What exactly is a binary file? Is a JPEG file (.jpeg) a binary file?

5

Sorry if this question seems simplistic or overly broad.

I would like to clarify what a binary file is. I know that a binary file is a binary encoded file.

Is a file format like JPEG classified as a binary file?

Wikipedia simply states that a binary file is any binary encoded file for computerized storage / processing and that anything wholly text based is regarded as a plain-text file, that is, not a binary file.

wulfgarpro

Posted 2011-07-09T12:08:08.280

Reputation: 173

Wikipedia actually states: A binary file is a computer file that is not a text file. – mouviciel – 2016-09-21T11:10:50.883

Yes, mine was a paraphrase. – wulfgarpro – 2016-09-21T12:43:54.360

Answers

7

Well, you understand that every file that has content is a binary file, every single one without exception, including a file with a .txt extension.

The one and only difference between a binary file with a .txt extension and one with a .jpg extension is really a meta difference: convention and historical practice tell us that we can make assumptions about the first file:

  1. it is to be interpreted as a collection of contiguous 8-bit fields;
  2. each such field represents an ASCII character; and
  3. most important, there are no control fields -- no counts, no state-change indicators, none of that.

Otherwise, there's no difference between what we -- only by convention -- call a text file and any other file.
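To make that concrete, here is a minimal Python sketch (the file name `note.txt` and the helper `raw_bytes` are made up for illustration). It writes two ASCII characters in text mode and then reads the same file back as raw bytes, showing that the "text" is stored as ordinary 8-bit values like any other file content.

```python
import os
import tempfile

def raw_bytes(path):
    """Return the file's contents as raw bytes, with no text decoding."""
    with open(path, "rb") as f:
        return f.read()

# Write two ASCII characters in text mode (hypothetical file name).
path = os.path.join(tempfile.mkdtemp(), "note.txt")
with open(path, "w", encoding="ascii", newline="") as f:
    f.write("Hi")

# Read the same file back in binary mode and show the stored bits.
data = raw_bytes(path)
bits = " ".join(f"{b:08b}" for b in data)
print(list(data))  # [72, 105]
print(bits)        # 01001000 01101001
```

Nothing in those two bytes says "text"; only the convention of reading them as ASCII does.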

Furthermore, there is no guaranteed way to know how a file should be interpreted just by looking at its contents. We have to depend upon something external to the file -- like its extension, say -- to give us a hint at what the thing is.

Pete Wilson

Posted 2011-07-09T12:08:08.280

Reputation: 186

Many text files have extensions other than .txt (e.g., .html) or no extension at all (e.g., README). – mouviciel – 2016-09-21T11:07:27.707

Text files encompass far more than single-byte ASCII-encoded characters. – kreemoweet – 2016-09-21T11:11:57.947

Many file formats include "in-band" identifying strings or "magic numbers" that can provide hints on the file type as well. So reading the file and looking for these hints can reveal the file type with a degree of probability. The only thing really enforcing file types based on the content within is the programs that read and write them, not the OS or filesystem. – LawrenceC – 2013-12-16T20:10:07.720
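As a rough illustration of the "magic number" idea in the comment above, the Python sketch below checks the first bytes of some data against a few well-known signatures. The names `SIGNATURES`, `sniff`, and `sniff_file` are invented for this example, and the table is deliberately tiny. A match is only a hint, since nothing stops other data from starting with the same bytes.

```python
# A few well-known leading byte signatures ("magic numbers").
SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"%PDF-": "pdf",
}

def sniff(data: bytes) -> str:
    """Guess a file type from its first bytes; 'unknown' if nothing matches."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"

def sniff_file(path: str) -> str:
    """Read only the first few bytes of a file and guess its type."""
    with open(path, "rb") as f:
        return sniff(f.read(16))

print(sniff(b"\xff\xd8\xff\xe0\x00\x10JFIF"))  # jpeg
print(sniff(b"plain old text"))                # unknown
```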

6

The way I would describe this to my mom (I hope neither of you takes offense at this) is that any file that shows gibberish when opened in Notepad is a binary file.

When I refer to binaries at work, they're typically outputs of the compiler. These may have readable text embedded inside, but they are still considered binaries.

A JPEG is a binary file.
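The "gibberish in Notepad" test can be approximated in code. The Python sketch below is one possible heuristic, not a standard definition: it treats data as binary if a leading sample contains a NUL byte or is not valid UTF-8. The function name `looks_binary` and the 8000-byte sample size are arbitrary choices for this example.

```python
import codecs

def looks_binary(data: bytes, sample_size: int = 8000) -> bool:
    """Heuristic: a NUL byte or invalid UTF-8 in the sample suggests binary."""
    sample = data[:sample_size]
    if b"\x00" in sample:
        return True
    # An incremental decoder tolerates a multi-byte character that was
    # cut off at the end of the sample, but still rejects invalid bytes.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return True
    return False

print(looks_binary(b"hello, world\n"))        # False
print(looks_binary(b"\xff\xd8\xff\xe0\x00"))  # True
```

Like the Notepad test, this can be fooled in both directions, so it is best read as a guess rather than a classification.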

UPDATE:

The distinction becomes more important with FTP, where you are in either ASCII or Binary transfer mode. This has to do with translating the line endings (LF versus CRLF) between systems. You wouldn't want the transfer to rewrite bytes in a JPEG that happen to match the newline code, as this would corrupt the image.
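A small Python sketch of why this matters. The function `ascii_mode_to_crlf` is a simplified stand-in for what an ASCII-mode transfer does to line endings, not real FTP code, and `fake_image` is a made-up byte string that merely happens to contain the value 0x0A.

```python
def ascii_mode_to_crlf(data: bytes) -> bytes:
    """Simplified stand-in for an ASCII-mode transfer: every LF becomes CRLF."""
    # Normalize first so existing CRLF pairs are not doubled up.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

text = b"line one\nline two\n"
# Made-up bytes standing in for image data; two of them equal 0x0A (LF).
fake_image = bytes([0xFF, 0xD8, 0xFF, 0x0A, 0x10, 0x0A])

print(ascii_mode_to_crlf(text))          # b'line one\r\nline two\r\n'
converted = ascii_mode_to_crlf(fake_image)
print(len(fake_image), len(converted))   # 6 8
```

For the text the change is harmless, but the "image" grew by two bytes, so every offset after the first 0x0A is now wrong.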

jglouie

Posted 2011-07-09T12:08:08.280

Reputation: 669

This question stems from storage of binary files in a mongodb collection. MongoDB supports storage of binary files - would you classify a JPEG as a binary file? – wulfgarpro – 2011-07-09T12:15:19.697

Yes. The JPEG spec details how to encode and decode the image. Even a simpler format though -- say a raw Bitmap -- is also a Binary file. In the Bitmap case, it has a small header and then a color code for every pixel in the image. The JPEG involves a few extras such as quantization tables. In the end they're both binary but the respective specs say how the binary should be interpreted to produce an image. – None – 2011-07-09T12:24:11.660
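To illustrate the "small header, then a color code per pixel" layout, here is a Python sketch that builds and then reads back a 1x1 image. It assumes the comment means the common Windows BMP format (14-byte file header plus 40-byte info header, 24 bits per pixel, uncompressed); the function names `make_bmp_1x1` and `describe_bmp` are invented for this example.

```python
import struct

HEADER_SIZE = 14 + 40  # file header + info header

def make_bmp_1x1(blue: int, green: int, red: int) -> bytes:
    """Build a 1x1, 24-bit uncompressed BMP in memory."""
    # Pixels are stored blue, green, red; each row is padded to 4 bytes.
    pixel_row = bytes([blue, green, red]) + b"\x00"
    file_header = struct.pack(
        "<2sIHHI", b"BM", HEADER_SIZE + len(pixel_row), 0, 0, HEADER_SIZE
    )
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40, 1, 1,            # header size, width, height
        1, 24, 0,            # planes, bits per pixel, no compression
        len(pixel_row),      # size of the pixel data
        2835, 2835, 0, 0,    # resolution (pixels per metre), palette info
    )
    return file_header + info_header + pixel_row

def describe_bmp(data: bytes) -> dict:
    """Interpret the leading bytes according to the BMP header layout."""
    magic, file_size, _, _, pixel_offset = struct.unpack_from("<2sIHHI", data, 0)
    _, width, height, _, bpp = struct.unpack_from("<IiiHH", data, 14)
    return {"magic": magic, "file_size": file_size, "pixel_offset": pixel_offset,
            "width": width, "height": height, "bits_per_pixel": bpp}

bmp = make_bmp_1x1(0, 0, 255)  # a single red pixel
info = describe_bmp(bmp)
print(info["width"], info["height"], info["bits_per_pixel"])  # 1 1 24
print(list(bmp[info["pixel_offset"]:info["pixel_offset"] + 3]))  # [0, 0, 255]
```

The bytes only become "an image" once they are read according to the format's layout, which is the point the comment makes about both BMP and JPEG.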

The JPEG format is a specification for encoding images as a series of bytes which don't tend to make sense when viewed with Notepad (or any text editor), as @jglouie said. – pavium – 2011-07-09T12:25:33.040

Thanks guys. This is what I understood - just wanted to clarify it. – wulfgarpro – 2011-07-09T12:27:44.963

Encrypted text is an exception to the "gibberish" rule. But I still like this description. :) – None – 2011-07-09T12:29:00.067