Why don't you see binary code when you open a binary file with a text editor?



Why don't you see binary code when you open a binary file with a text editor? For example, when I open an image with a text editor, I see some weird characters and also some human-readable characters, but the image should be encoded in binary.


Posted 2011-12-18T16:33:20.063

Reputation: 857

Look up hexdump. You don't view binary, you view hexadecimal. It's the nearest to human readable you will get. The text editor tries to display it in the nearest facsimile to human readable text. It's guessing so you get to view garbage. To view the contents, use a hex editor which shows the file in hexadecimal pairs (byte) and can allow you to edit the file. – Fiasco Labs – 2011-12-18T17:55:24.903

8 What is it you expected? How do you think this should be achieved? – Nikodemus RIP – 2011-12-18T21:40:42.433

2 I wonder why more editors don't offer to show the binary as raw ASCII 1/0 sequences. – Xeoncross – 2011-12-18T23:45:48.587

7 @Xeoncross: because the raw 0/1 sequence is useless. It's too unwieldy for manual decoding because it takes a huge amount of screen space; hex display is generally superior for manual decoding. And with some training, you can translate hex to binary and vice versa quickly and easily. – Lie Ryan – 2011-12-19T10:16:53.943

As Lie Ryan said. Anyone who's been involved in Machine Language Programming knows that the base unit is the byte, 8 bits, easily represented as two hex numbers. The only time binary becomes useful is if you're dealing with flag bits. The rest of the time, registers are transferring data in a minimum of 8 bit chunks, even if it is 64 bits. Binary becomes goofy and unwieldy at this point. – Fiasco Labs – 2011-12-23T03:27:41.607

3 @Fiasco Labs: Pedantry: one hex number with two digits - 00 to FF, which translates to decimal 0 - 255 (8 bits representing 2^8 = 256 possible states). – Piskvor left the building – 2012-01-30T15:09:48.803

1 @Piskvor - Thanks for putting it better than I did. deadbeef is a hex number with 8 digits, for the record. ;^) – Fiasco Labs – 2012-01-31T06:41:57.270



Binary and text data aren't separate kinds of thing: both are simply data. Whether a file counts as one or the other depends on how it is interpreted. If you open binary data (such as an image file) in a text editor, much of it won't make sense, because it does not fit your chosen interpretation (as text).

What you call text is a subset of the possible file contents: Data that in a given character set translates to readable characters.

For example, in ASCII, you can see that, of 128 "allowed" values, only about half (62) are letters and digits, 32 are punctuation, one is the space, and the remaining 33 are control characters. The latter group just isn't used a lot in text files, and they have no really good textual representation. Some of them are Tab and Newline characters, where text editors already need to get creative in displaying them.
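As a rough check of those proportions, here is a minimal Python sketch that sorts the 128 ASCII code points into categories. The category boundaries (Python's `string.punctuation`, and treating 0-31 plus 127 as control characters) are the usual convention, not something defined in this answer.

```python
import string

# Classify each of the 128 ASCII code points.
alnum = string.ascii_letters + string.digits
letters_digits = [c for c in range(128) if chr(c) in alnum]
punctuation = [c for c in range(128) if chr(c) in string.punctuation]
control = [c for c in range(128) if c < 32 or c == 127]
space = [c for c in range(128) if chr(c) == " "]

# prints: 62 32 33 1
print(len(letters_digits), len(punctuation), len(control), len(space))
```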

Some text editors have options to explicitly display whitespace. The whitespace characters are then drawn as visible symbols, in addition to their regular formatting behavior (which is also just an interpretation of these characters).

Pure ASCII only interprets 128 values. The bytes used to store this information have 256 possible values each, so half of the possible values aren't allowed in ASCII. Those are e.g. used in region-specific character sets, such as Latin 1, but in ASCII, they're undefined. They have no useful representation in a text viewer that can only handle ASCII.
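A small Python sketch of that difference, assuming three arbitrary example bytes (one ASCII letter and two values above 127). Strict ASCII decoding rejects the high bytes, while Latin-1 happens to assign a character to every byte value.

```python
data = bytes([0x41, 0xE9, 0xFF])  # 'A' followed by two bytes above 127

# Strict ASCII defines nothing for 128-255, so decoding raises an error.
try:
    data.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Latin-1 maps every byte value 0-255 to a character, so this always succeeds.
latin1_text = data.decode("latin-1")

print(ascii_ok, repr(latin1_text))
```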

Binary data is not usually interpreted as text, so all possible byte values are commonly found in these files. Restricting a binary format to readable values would be wasteful (which is also one reason text compresses so well). Image file formats are complicated, and you don't usually view them as text, so they don't need to be readable.

As there is no common data interpretation (character set) that maps all possible values to readable characters, and since that wouldn't make a lot of sense anyway (as it's not readable text), major parts are displayed as gibberish.

A hex editor chooses a different representation for the data: It displays each byte as two hexadecimal digits. It's just a different representation, and one with an easily human-readable character set: All 256 possible byte values can be represented as two hex digits.
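To make that representation concrete, here is a minimal sketch of a hex dump in Python. The helper name `hexdump` and its layout (offset, hex pairs, printable characters) are assumptions loosely modelled on common hex viewers, not the behaviour of any specific editor. The sample bytes are the 8-byte PNG file signature.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Render bytes as rows of: offset, two-digit hex pairs, printable ASCII."""
    pad = width * 3 - 1  # width of a full row of hex pairs with spaces
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Bytes outside the printable ASCII range are shown as '.'
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)


print(hexdump(b"\x89PNG\r\n\x1a\n"))
```

Every byte gets exactly two hex digits, so nothing is ambiguous, while the right-hand column shows how little of the data is readable text.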

Since there's an easy mapping of binary data to hex and vice versa (4 binary digits to/from one hexadecimal digit), and binary contains very little information per digit, hexadecimal is generally the preferred way for humans to read binary, unless there are specific reasons to prefer a different representation.
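The four-bits-per-hex-digit mapping can be checked with a short Python sketch. The byte value 0x7F is just an example; the variable names are illustrative.

```python
byte = 0x7F

bits = f"{byte:08b}"        # eight binary digits: '01111111'
hex_digits = f"{byte:02X}"  # two hex digits: '7F'

# Split the bits into two groups of four and pair each with its hex digit.
nibbles = [bits[:4], bits[4:]]
pairs = [(h, n, int(h, 16) == int(n, 2)) for h, n in zip(hex_digits, nibbles)]

for h, n, same in pairs:
    print(h, "<->", n, same)
```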

Some text editors might have a hex editor mode and some heuristic that tries to determine whether a file is text or binary, and automatically select one mode or the other. But this can be difficult to get right, because nothing in the file itself says whether it's one kind or the other.
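One possible heuristic of that kind, sketched in Python. This is an assumption for illustration only: the function name `looks_binary`, the 1024-byte sample and the 30% threshold are all made up here, and no particular editor is claimed to work this way.

```python
def looks_binary(data: bytes, sample_size: int = 1024, threshold: float = 0.30) -> bool:
    """Guess whether data is binary by inspecting its first bytes."""
    sample = data[:sample_size]
    if not sample:
        return False  # an empty file is treated as text
    if b"\x00" in sample:
        return True   # NUL bytes almost never occur in plain text
    # Printable ASCII plus tab, line feed and carriage return count as "text".
    text_bytes = set(range(32, 127)) | {9, 10, 13}
    odd = sum(1 for b in sample if b not in text_bytes)
    return odd / len(sample) > threshold


print(looks_binary(b"hello\nworld\n"))
print(looks_binary(b"\x89PNG\r\n\x1a\n\x00\x00"))
```

A guess like this misjudges some files, for example text in an encoding that uses many bytes above 127, which is why such detection is hard to get right.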

Some FTP clients ask you to specify which file extensions are used for text data. These programs will then change the file contents to match the OS of the machine you're connected to, as Windows uses a different line ending character sequence (CR/LF) than Linux and Unix (including Mac OS X), which use LF.
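A minimal Python sketch of that kind of line ending conversion, and of what it does to a file that is not text. The helpers `to_crlf` and `to_lf` are hypothetical names for illustration and do not reproduce any real FTP client's code.

```python
def to_crlf(data: bytes) -> bytes:
    """Convert line endings to CR/LF (Windows style)."""
    # Normalise to LF first so existing CR/LF pairs are not doubled.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def to_lf(data: bytes) -> bytes:
    """Convert CR/LF line endings to LF (Unix style)."""
    return data.replace(b"\r\n", b"\n")


text = b"first line\nsecond line\n"
print(to_crlf(text))

# The same rewrite applied to binary data silently changes it:
png_signature = b"\x89PNG\r\n\x1a\n"
print(len(png_signature), len(to_lf(png_signature)))
```

The second print shows the binary data losing a byte, which is why the client needs to know which files are text before converting anything.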

Daniel Beck

Posted 2011-12-18T16:33:20.063

Reputation: 98 421

4 Ughh, the LF has bitten me more times than I care to remember. – surfasb – 2011-12-18T19:20:15.477


Because you've opened it in a text editor, not a binary editor.

Ignacio Vazquez-Abrams

Posted 2011-12-18T16:33:20.063

Reputation: 100 516

22 As you've seen, text. – Ignacio Vazquez-Abrams – 2011-12-18T16:38:38.563

1 Text as a representation of hexadecimal numbers (0-f) arranged in pairs (bytes). If you want binary, convert the hex to binary in a senseless string of zeros and ones. Hex is more human readable and easier to make sense of. – Fiasco Labs – 2011-12-18T17:59:09.343

2 Gotta say that: someone should take the bold step to put out a real-binary editor, with Ones and Zeros, (and then maybe separate panes with related hex/char/dec transliterations) for the sole purpose of teaching this kind of stuff. I know they shouldn't, but popular media, and math teachers pretending to know computers, set all expectations wrong for eager kids willing to learn. – ZJR – 2011-12-19T00:55:36.870

@ZJR: No reason they shouldn't. Many hex editors do let you view file contents in binary. Programmers just don't generally find it as useful as the hexadecimal view, so you don't hear about it as much. – David Z – 2012-01-30T19:02:59.127


It's all to do with context and interpretation. What's in your computer is patterns of high and low voltage, or magnetised regions of a disk, that only gain meaning when we decide how we want to interpret them.

Under different circumstances, the pattern low-high-low-low-low-low-low-high might mean the number 65, a capital letter 'A', a sky-blue colour, that a customer ordered coffee, the date 'March 6th' or anything at all, really.
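A short Python sketch of reading that one pattern three ways. Only the number, ASCII character and flag-bit readings are shown; meanings such as a colour, an order or a date would depend on application-specific tables that are not given here.

```python
pattern = 0b01000001  # low-high-low-low-low-low-low-high

as_number = pattern                         # read as an unsigned integer
as_char = bytes([pattern]).decode("ascii")  # read as an ASCII character
# Read as eight independent on/off flags, least significant bit first.
as_flags = [bool((pattern >> i) & 1) for i in range(8)]

print(as_number, as_char, as_flags)
```

The stored bits never change; only the code that reads them decides what they mean.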

When you open your image file in a graphics program, it knows to interpret it as an image, knows which patterns indicate the image format, which patterns indicate the image size and so on.

When you open your image file in a text editor, it gets treated as text. This is a very simple format, much closer to what's really going on in the computer, but there is still some interpretation going on. Specifically, nearly every pattern gets interpreted as a particular character, some normal like A-Z, but also some weird characters. A few patterns don't show up as characters but instead are treated as basic formatting: newline, tab.

(The situation is slightly complicated by things such as Unicode and text encodings such as UTF-8 but I won't deal with those here for the sake of simplicity.)

When you have a binary file open in a text editor, take care not to make changes, because almost any change you make will completely disrupt the normal interpretation of the file's contents. In other words, it will ruin the file and make it unusable.

Andrew Turner

Posted 2011-12-18T16:33:20.063

Reputation: 316


As a simplified example, consider an image file opened with a text editor.

The image is a simple chess pattern, with squares 3 pixels wide and a 1-pixel gray border between squares: three black pixels, a gray border pixel, three white pixels, a gray border pixel, repeat.

The first row of pixels in that image would consist of the following sequence, repeated four times:

Black    Black    Black    Gray     White    White    White    Gray
0x000000 0x000000 0x000000 0x7F7F7F 0xFFFFFF 0xFFFFFF 0xFFFFFF 0x7F7F7F

(In Hex, rather than Binary - the string in Binary would be four times as long - 0x7F being replaced with 0b01111111)

If you load that string of data in a text editor, you would get the following text:


This is because 0x00 is the ASCII code for the Null character, and you need to write that 3 times to get the value for a black pixel (in a 24-bit BMP, anyway), and you have 3 black pixels. Then 0x7F is the ASCII code for Delete, and you need THAT three times to get a gray pixel. 0xFF isn't a valid ASCII code at all, since ASCII stops at 0x7F, so what it shows up as depends on which extended character set the editor assumes. You need to write it 9 times to get 3 white pixels. Finishing it off, you get three more Deletes to write a gray pixel.
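The same row can be built and decoded in a short Python sketch. Latin-1 is used for decoding purely as an assumption, because strict ASCII cannot decode 0xFF at all; a real editor may pick a different character set and show something else.

```python
black = b"\x00\x00\x00"
gray = b"\x7f\x7f\x7f"
white = b"\xff\xff\xff"

# One repeat of the pattern: 3 black, 1 gray, 3 white, 1 gray (8 pixels).
row = black * 3 + gray + white * 3 + gray

text = row.decode("latin-1")
# Null and Delete are control characters with nothing to draw.
visible = "".join(ch for ch in text if ch.isprintable())

print(len(row), repr(text))
print(repr(visible))
```

Of the 24 bytes, only the nine 0xFF bytes produce a visible glyph under this assumption; the rest are control characters the editor has to skip or replace with placeholders.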

A different way to show it, which might be more usefully explanatory, is the reverse example - what DO you have to write to a file in order to get zeroes and ones when opened in a text editor?

The ASCII codes for zero and one, of course! A zero in a text editor isn't stored as a single bit with value 0, it is stored as 8 bits with value 0b00110000, or in hex 0x30

The ASCII code for '0' is 0x30, and the ASCII code for '1' is 0x31, so if you want to store a chess pattern as zeroes and ones, your file will look like this:

Text editor:

10101010
01010101
10101010
01010101

Stored data (ASCII values for '1', '0' and the line break, here the carriage return 0x0D):
0x31 0x30 0x31 0x30 0x31 0x30 0x31 0x30 0x0D 0x30 0x31 0x30 0x31 0x30 0x31 0x30 0x31 0x0D 0x31 0x30 0x31 0x30 0x31 0x30 0x31 0x30 0x0D 0x30 0x31 0x30 0x31 0x30 0x31 0x30 0x31

There is a lot more to it than this - files have starts and stops and metadata and all other kinds of things, but the takehome lesson and answer to your question is:

Unless the first 8 bits of your file are 0b00110000, your text editor will not write '0', because that's the ASCII code for the character '0'. Unless the first 8 bits of your file are 0b00110001, your text editor will not write '1', because that's the ASCII code for the character '1'.
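The reverse example can be reproduced with a small Python sketch. It follows the byte listing above in using 0x0D (carriage return) as the line break; many systems use 0x0A or the pair 0x0D 0x0A instead. The variable names are illustrative.

```python
rows = ["10101010", "01010101", "10101010", "01010101"]

# Encode the visible characters as ASCII, with 0x0D between the rows.
stored = "\r".join(rows).encode("ascii")

hex_view = " ".join(f"0x{b:02X}" for b in stored)
first_byte_bits = f"{stored[0]:08b}"  # the bits behind the first '1'

print(hex_view)
print(first_byte_bits)
```

Each visible digit costs a full byte on disk, so the character '1' is stored as 0b00110001, not as a single bit.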


Posted 2011-12-18T16:33:20.063

Reputation: 143


The editor is not smart enough to figure whether some text makes sense or not, so it displays any file as text unless specifically told to do otherwise, if it has that feature. As others pointed out, some editors have the feature of displaying hex.

Emilio M Bumachar

Posted 2011-12-18T16:33:20.063

Reputation: 235

UltraEdit is smart enough - it switches to hex edit mode for such files. – Peter Mortensen – 2017-06-11T12:46:52.837