How do I compare binary files in Linux?

324

119

I need to compare two binary files and get the output in the form:

<fileoffset-hex> <file1-byte-hex> <file2-byte-hex>

for every different byte. So if file1.bin is

  00 90 00 11

in binary form and file2.bin is

  00 91 00 10

I want to get something like

  00000001 90 91
  00000003 11 10

Is there a way to do this in Linux? I know about cmp -l, but it uses decimal for offsets and octal for bytes, which I would like to avoid.

frustratedCmpNoLongerUser

Posted 2010-03-29T15:28:57.730


you're basically looking for "binary diff". i can imagine some reeeally ugly commandline one-liner with od... – quack quixote – 2010-03-29T15:36:42.660

@quack quixote: What's ugly about a one-liner? ;) – Bobby – 2010-03-29T16:50:23.907

Because you can't answer this question (as you're not a user), I'm voting to close. A binary diff as explicitly requested here isn't at all useful, and I'm inclined to think you want something useful. If you insert one byte at the start of the file, should all bytes be marked as being different? Without knowing that, this is simply too vague.

– Evan Carroll – 2018-11-09T04:02:18.550

Not to mention this is explicitly against the rules in multiple areas: it's about "programming and software development", and you're asking for a product or recommendation rather than how to use a specific product. – Evan Carroll – 2018-11-09T04:04:38.873

Also updated my answer with the radare method, but I still think this question is both off-topic and too vague.

– Evan Carroll – 2018-11-09T04:20:18.950

@EvanCarroll If you think the question is off topic why are you answering it? – DavidPostill – 2018-11-26T21:07:51.903

@DavidPostill I don't think it's off topic. I think it's a great question. I think it's poorly worded and that the admins here would cause undue problems if I otherwise tried to salvage it. See my answer for more information. Binary diff question? YES! Byte for byte diff? Well, that makes no sense in any use case I can imagine. – Evan Carroll – 2018-11-26T22:35:11.813

FreeBSD's cmp has an -x flag ("heXadecimal") which, in conjunction with -l, produces output formatted exactly as specified in the question: cmp -xl file1.bin file2.bin.

– Conrad Meyer – 2019-11-23T18:01:38.913

xdelta.org works quite well. Perhaps it'd be worth having a look at it. – thatjuan – 2013-10-10T05:31:23.123

Answers

182

This will print the offset and bytes in hex:

cmp -l file1.bin file2.bin | gawk '{printf "%08X %02X %02X\n", $1, strtonum(0$2), strtonum(0$3)}'

Or do $1-1 to have the first printed offset start at 0.

cmp -l file1.bin file2.bin | gawk '{printf "%08X %02X %02X\n", $1-1, strtonum(0$2), strtonum(0$3)}'

Unfortunately, strtonum() is specific to GAWK, so for other versions of awk—e.g., mawk—you will need to use an octal-to-decimal conversion function. For example,

cmp -l file1.bin file2.bin | mawk 'function oct2dec(oct,     dec) {for (i = 1; i <= length(oct); i++) {dec *= 8; dec += substr(oct, i, 1)}; return dec} {printf "%08X %02X %02X\n", $1, oct2dec($2), oct2dec($3)}'

Broken out for readability:

cmp -l file1.bin file2.bin |
    mawk 'function oct2dec(oct,    dec) {
              for (i = 1; i <= length(oct); i++) {
                  dec *= 8;
                  dec += substr(oct, i, 1)
              };
              return dec
          }
          {
              printf "%08X %02X %02X\n", $1, oct2dec($2), oct2dec($3)
          }'
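For shells without gawk, the octal-to-hex conversion can also be left to the shell's own printf, which reads a number with a leading 0 as octal. This is a minimal sketch, not part of the original answer: it assumes a POSIX sh such as bash or dash, the helper name hexcmp is hypothetical, and it recreates the question's two sample files so it can be run as-is:

```shell
#!/bin/sh
# Recreate the question's sample files; printf octal escapes write raw bytes
# (\220 = 0x90, \221 = 0x91, \021 = 0x11, \020 = 0x10).
printf '\000\220\000\021' > file1.bin
printf '\000\221\000\020' > file2.bin

# hexcmp is a hypothetical helper name. cmp -l prints one line per differing
# byte: <1-based decimal offset> <octal byte in file 1> <octal byte in file 2>.
hexcmp() {
  cmp -l "$1" "$2" | while read -r off a b; do
    # Prefixing 0 makes printf parse the byte as octal; off - 1 gives the
    # 0-based offset used in the question's example output.
    printf '%08X %02X %02X\n' "$((off - 1))" "0$a" "0$b"
  done
}

hexcmp file1.bin file2.bin
```

For the sample files this prints 00000001 90 91 and 00000003 11 10. Because the loop runs once per differing byte, it is likely slower than the awk versions on files with many differences.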

Paused until further notice.


Why not simply compare the sha256sum of both files? – Rodrigo – 2019-08-21T01:13:05.717

@Rodrigo: That and various other methods will just show whether the files differ. My answer meets the OP's requirement to actually show what the differences are. – Paused until further notice. – 2019-08-21T03:01:55.367

Of course! Sorry, I was so worried about MY problem that I barely read the OP's. Thank you. – Rodrigo – 2019-08-21T14:37:17.297

The advantage of cmp over the answer with xxd is that it's orders of magnitude faster on large files! – Ruslan – 2019-11-25T05:33:11.143

@gertvdijk: strtonum is specific to GAWK. I believe Ubuntu previously used GAWK as the default, but switched at some point to mawk. In any case, GAWK can be installed and set to the default (see also man update-alternatives). See my updated answer for a solution that doesn't require strtonum. – Paused until further notice. – 2013-07-04T18:08:49.700

174

As ~quack pointed out:

 % xxd b1 > b1.hex
 % xxd b2 > b2.hex

And then

 % diff b1.hex b2.hex

or

 % vimdiff b1.hex b2.hex

akira


This worked great for me (with opendiff on OS X instead of vimdiff) — the default view xxd provides keeps the diff engine on track comparing byte-by-byte. With plain (raw) hex simply column-fit with fold, diff would try to fold/group random stuff in the files I was comparing. – natevw – 2014-11-15T23:26:29.600

This command does not work well for byte additions or removals, as every line that follows will be misaligned and seen as modified by diff. The solution is to put 1 byte per line and remove the address column, as proposed by John Lawrence Aspden and me.

– Ciro Santilli 新疆改造中心法轮功六四事件 – 2015-04-04T20:38:25.557

In Bash: diff <(xxd b1) <(xxd b2), but the output format of this (or yours) is nowhere near what the OP asked for. – Paused until further notice. – 2010-03-29T16:33:20.733

With vimdiff it is; it will color the bytes in the lines where the two 'files' differ – akira – 2010-03-30T04:45:50.483

Aww, why didn't I think of that? And I'm sure I've used this technique in the past too. – njd – 2010-03-30T17:37:44.537

Nice. I'm on an embedded system that uses BusyBox and there is no cmp, but hexdump + diff works like a charm. – Robert Calhoun – 2013-03-22T03:47:18.533

112

diff + xxd

Try diff combined with zsh/bash process substitution:

diff -y <(xxd foo1.bin) <(xxd foo2.bin)

Where:

  • -y shows you differences side-by-side (optional).
  • xxd is a CLI tool that creates a hexdump of the binary file.
  • Add -W200 to diff for wider output (of 200 characters per line).
  • For colors, use colordiff as shown below.

colordiff + xxd

If you have colordiff installed, it can colorize the diff output, e.g.:

colordiff -y <(xxd foo1.bin) <(xxd foo2.bin)

Otherwise, install it via: sudo apt-get install colordiff.

Sample output:

[Screenshot: binary file output in terminal from diff -y <(xxd foo1.bin) <(xxd foo2.bin) | colordiff]

vimdiff + xxd

You can also use vimdiff, e.g.

vimdiff <(xxd foo1.bin) <(xxd foo2.bin)

Hints:

  • if the files are too big, add a length limit (e.g. -l1000) to each xxd

kenorb


Command can be simplified as colordiff -y <(xxd foo1.bin) <(xxd foo2.bin). – golem – 2015-11-17T22:46:01.683

If you don't have colordiff, this will do the same thing without colors: diff -y <(xxd foo1.bin) <(xxd foo2.bin) – Rock Lee – 2016-08-04T15:25:39.617

If you just want to know whether both files are actually the same, you can use the -q or --brief switch, which will only show output when the files differ. – Stefan van den Akker – 2016-10-08T11:14:55.107

Create a function xxddiff for this with: xxddiff() ( f() ( xxd "$1" ; ); diff -y <(f "$1") <(f "$2") | colordiff; ) – rubo77 – 2016-11-14T06:35:36.367

Great! Still, diff -u <(xxd tinga.tgz) <(xxd dec.out.tinga.tgz) | vim - will do a good enough job. – ribamar – 2018-06-01T14:27:57.637

My favorite solution, helped me a lot! With option --suppress-common-lines only different lines will be displayed – ololobus – 2019-06-13T11:20:48.280

Awesome! Extra kudos for the <( syntax. One never stops learning! – Kevin Keane – 2019-08-09T18:07:31.797

Caution! colordiff truncates the width of each column. On my system, this leads to subtly "wrong" output. Very annoying if you do binary comparison. Use the -W option with a sufficiently large number (140 was enough to work with xxd) to avoid that pitfall. – pasbi – 2020-01-07T15:32:52.940

60

There's a tool called DHEX which may do the job, and there's another tool called VBinDiff.

For a strictly command-line approach, try jojodiff.

njd


vbindiff lets us actually edit the file, thx! – Aquarius Power – 2014-09-16T18:28:24.380

+1 But why do they say in jdiff -h: "Do not use jdiff directly on compressed files, such as zip, gzip, rar, ..." ?? – 1111161171159459134 – 2015-05-22T13:20:29.870

@DanielBeauyat compressed files will be completely different after you encounter the first different byte. The output is not likely to be useful. – Mark Ransom – 2015-08-05T03:12:37.113

@1111161171159459134 jdiff is part of a "suite" of programs to sync and patch the differences found by jdiff. But, as Mark Ransom said, that would be generally not wise on compressed files; the exception is "synchronizable" compressed formats (like that produced by gzip --rsyncable), in which small differences in the uncompressed files should have a limited effect on the compressed file. – hmijail mourns resignees – 2016-02-13T11:12:08.747

I was happy to find vbindiff in the Debian repos. – Jonathon Reinhart – 2017-12-15T16:09:14.350

vbindiff can't add or delete bytes; it can only edit existing ones. – 林果皞 – 2018-06-12T17:18:04.537

DHEX is awesome if comparing binaries is what you want to do. Feed it two files and it takes you right to a comparative view, highlighting the differences, with the ability to easily move to the next difference. It's also able to work with large terminals, which is very useful on widescreen monitors. – Marcin – 2011-09-08T00:08:15.810

I prefer VBinDiff. DHEX is using CPU even when idling; I think it's redrawing all the time or something. VBinDiff doesn't work with wide terminals, though. But the addresses become weird with wide terminals anyway, since you have more than 16 bytes per row. – Janus Troelsen – 2012-10-17T14:22:34.443

28

Method that works for byte addition / deletion

diff <(od -An -tx1 -w1 -v file1) \
     <(od -An -tx1 -w1 -v file2)

Generate a test case with a single removal of byte 64:

for i in `seq 128`; do printf "%02x" "$i"; done | xxd -r -p > file1
for i in `seq 128`; do if [ "$i" -ne 64 ]; then printf "%02x" $i; fi; done | xxd -r -p > file2

Output:

64d63
<  40

If you also want to see the ASCII version of the character:

bdiff() (
  f() (
    od -An -tx1c -w1 -v "$1" | paste -d '' - -
  )
  diff <(f "$1") <(f "$2")
)

bdiff file1 file2

Output:

64d63
<   40   @

Tested on Ubuntu 16.04.

I prefer od over xxd because:

  • it is POSIX, while xxd is not (it comes with Vim)
  • it has -An to remove the address column without awk.

Command explanation:

  • -An removes the address column. This is important; otherwise, all lines would differ after a byte addition / removal.
  • -w1 puts one byte per line, so that diff can consume it. It is crucial to have one byte per line, or else every line after a deletion would become out of phase and differ. Unfortunately, this is not POSIX, but it is available in GNU od.
  • -tx1 is the representation you want; change it to any possible value, as long as you keep 1 byte per line.
  • -v prevents od from abbreviating repeated lines with an asterisk *, which might interfere with the diff.
  • paste -d '' - - joins every two lines. We need it because the hex and ASCII go into separate adjacent lines. Taken from: https://stackoverflow.com/questions/8987257/concatenating-every-other-line-with-the-next
  • we use parentheses () to define bdiff instead of {} to limit the scope of the inner function f, see also: https://stackoverflow.com/questions/8426077/how-to-define-a-function-inside-another-function-in-bash
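Process substitution (<(...)) is a bash/zsh feature, so bdiff will not run under a plain POSIX sh such as dash. A minimal sketch of the same idea using temporary files instead; bdiff_sh is a hypothetical name, mktemp is assumed to be available (it ships with GNU and BSD systems but is not specified by POSIX), and -w1 still requires GNU od:

```shell
#!/bin/sh
# bdiff_sh: hypothetical POSIX-sh variant of bdiff that writes the dumps to
# temporary files instead of using process substitution.
bdiff_sh() {
  t1=$(mktemp) || return 2
  t2=$(mktemp) || { rm -f "$t1"; return 2; }
  # One byte per line (GNU -w1), no address column, no '*' for repeated lines.
  od -An -tx1 -w1 -v "$1" > "$t1"
  od -An -tx1 -w1 -v "$2" > "$t2"
  diff "$t1" "$t2"
  status=$?      # diff's convention: 0 same, 1 different, 2 trouble
  rm -f "$t1" "$t2"
  return "$status"
}

# Small test case: the byte 'B' (0x42) removed from the middle.
printf 'ABC' > file1
printf 'AC'  > file2
bdiff_sh file1 file2 || true   # prints "2d1" and "<  42"
```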

See also:

Ciro Santilli 新疆改造中心法轮功六四事件


14

Short answer

vimdiff <(xxd -c1 -p first.bin) <(xxd -c1 -p second.bin)

When you compare binary files with hexdumps and a text diff (especially with xxd), added or removed bytes shift the addressing, which can make the actual changes difficult to see. This method tells xxd not to output addresses and to output only one byte per line, which in turn shows exactly which bytes were changed, added, or removed. You can find the addresses later by searching for the interesting sequences of bytes in a more "normal" hexdump (the output of xxd first.bin).
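Going from a diff line number back to a byte offset is simple arithmetic: with one byte per line and no address column, line N of the dump holds the byte at 0-based offset N - 1. A minimal sketch of that lookup; the show_at helper is hypothetical, and it uses the POSIX od options -A, -j, -N and -t for the "normal" hexdump rather than xxd:

```shell
#!/bin/sh
# show_at FILE LINE: hypothetical helper. Line LINE of a one-byte-per-line
# dump (xxd -c1 -p, or od -An -tx1 -w1 -v) holds the byte at offset LINE - 1.
show_at() {
  offset=$(( $2 - 1 ))
  printf 'offset 0x%08X\n' "$offset"
  # Regular hexdump with hex addresses: 16 bytes starting at that offset.
  od -A x -t x1 -j "$offset" -N 16 "$1"
}

# Sample input: bytes 0x01..0x64 (awk writes each value as one byte).
awk 'BEGIN { for (i = 1; i <= 100; i++) printf "%c", i }' > first.bin

# Suppose diff reported a change on line 64 of first.bin's dump.
show_at first.bin 64   # first line: offset 0x0000003F
```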

Evgeny


(Of course, one may use diff instead of vimdiff.) – VasyaNovikov – 2015-12-15T17:35:36.597

11

I'd recommend hexdump for dumping binary files to textual format and kdiff3 for diff viewing.

hexdump myfile1.bin > myfile1.hex
hexdump myfile2.bin > myfile2.hex
kdiff3 myfile1.hex myfile2.hex

BugoK


Even simpler in bash: kdiff3 <(hexdump myfile1.bin) <(hexdump myfile2.bin), with no need to create the files myfile1.hex and myfile2.hex. – Hastur – 2016-01-25T14:34:14.817

6

hexdiff is a program designed to do exactly what you're looking for.

Usage:

hexdiff file1 file2

It displays the hex (and 7-bit ASCII) of the two files one above the other, with any differences highlighted. Look at man hexdiff for the commands to move around in the file, and a simple q will quit.

Mick


But it does a pretty bad job when it comes to the comparing part. If you insert some bytes into a file, it will mark all bytes afterwards as changed. – Murmel – 2016-04-27T19:43:49.960

And hexdiff is not available via apt-get on Ubuntu 16.04. – rubo77 – 2016-11-14T06:38:12.377

@Murmel while I agree, isn't that what's being asked here? – Evan Carroll – 2018-11-09T04:00:45.823

@EvanCarroll true, and hence I left a comment (only) and did not downvote – Murmel – 2018-11-09T17:52:53.407

I also didn't downvote Mick, but I agree with you and answered here https://superuser.com/a/1373977/11116 because it seems likely that this bad question will get reformed or closed.

– Evan Carroll – 2018-11-09T19:06:07.317

4

It may not strictly answer the question, but I use this for diffing binaries:

gvim -d <(xxd -c 1 ~/file1.bin | awk '{print $2, $3}') <(xxd -c 1 ~/file2.bin | awk '{print $2, $3}')

It prints both files out as hex and ASCII values, one byte per line, and then uses Vim's diff facility to render them visually.

John Lawrence Aspden


1

You can use the gvimdiff tool, which is included in the vim-gui-common package:

sudo apt-get update

sudo apt-get install vim-gui-common

Then you can compare two hex files using the following command:

ubuntu> gvimdiff <hex-file1> <hex-file2>

That's all. Hope that helps!

craken


1

The firmware analysis tool binwalk also offers this feature through its -W/--hexdump command-line option, with further options such as showing only the differing bytes:

    -W, --hexdump                Perform a hexdump / diff of a file or files
    -G, --green                  Only show lines containing bytes that are the same among all files
    -i, --red                    Only show lines containing bytes that are different among all files
    -U, --blue                   Only show lines containing bytes that are different among some files
    -w, --terse                  Diff all files, but only display a hex dump of the first file

For the OP's example, the command is:

binwalk -W file1.bin file2.bin

phk


0

dhex http://www.dettus.net/dhex/

DHEX is more than just another hex editor: It includes a diff mode, which can be used to easily and conveniently compare two binary files. Since it is based on ncurses and is themeable, it can run on any number of systems and scenarios. With its utilization of search logs, it is possible to track changes in different iterations of files easily.

Vincent Vega


Welcome to SuperUser! Although this software looks like it could solve the OP's problem, pure advertisement is strongly frowned upon on the Stack Exchange network. If you are affiliated with this software's editor, please disclose this fact. And try to rewrite your post so that it looks less like a commercial. Thank you. – Nathan.Eilisha Shiraini – 2017-08-18T13:31:56.987

I am not affiliated with dhex in any way. I copied the author's description into the post because there is minimum post length limit – Vincent Vega – 2017-08-19T13:59:44.507

Already mentioned at: https://superuser.com/a/125390/128124

– Ciro Santilli 新疆改造中心法轮功六四事件 – 2017-09-07T08:36:46.180

-1

https://security.googleblog.com/2016/03/bindiff-now-available-for-free.html

BinDiff is a great UI tool for comparing binary files that was recently made available for free.

Evgeny


Can it be used on arbitrary binary files, though? That page seems to indicate that it's only useful for comparing executables that have been disassembled by Hex-Rays IDA Pro. – eswald – 2016-04-29T22:57:46.117

-1

The go-to open-source product on Linux (and everything else) is Radare, which provides radiff2 explicitly for this purpose. I voted to close this because others and I have the same question. In the question you ask

for every different byte

That's insane, though: as asked, if you insert one byte at the start of the file, every subsequent byte would be different, so the diff would repeat the whole file for an actual difference of one byte.
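The shifting effect is easy to reproduce. A minimal sketch with made-up file names that inserts one byte at the front of an 8-byte file and compares the two files both ways: cmp -l reports every overlapping position, while a diff of one-byte-per-line od dumps (GNU od for -w1) reports a single insertion:

```shell
#!/bin/sh
# Hypothetical sample files: shifted.bin is orig.bin with one byte ('X')
# inserted at the very front.
printf 'ABCDEFGH'  > orig.bin
printf 'XABCDEFGH' > shifted.bin

# Byte-for-byte comparison: all 8 overlapping positions differ.
# (cmp also prints an EOF notice on stderr, hidden here.)
cmp -l orig.bin shifted.bin 2>/dev/null | wc -l

# Alignment-aware comparison: one byte per line and no addresses,
# so diff can line the bytes up again.
od -An -tx1 -w1 -v orig.bin    > orig.hex
od -An -tx1 -w1 -v shifted.bin > shifted.hex
diff orig.hex shifted.hex || true   # prints "0a1" and ">  58"; diff exits 1 on differences
```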

Slightly more practical is radiff2 -O. The -O option is for "Do code diffing with all bytes instead of just the fixed opcode bytes":

0x000000a4 0c01 => 3802 0x000000a4
0x000000a8 1401 => 3802 0x000000a8
0x000000ac 06 => 05 0x000000ac
0x000000b4 02 => 01 0x000000b4
0x000000b8 4c05 => 0020 0x000000b8
0x000000bc 4c95 => 00a0 0x000000bc
0x000000c0 4c95 => 00a0 0x000000c0

Like IDA Pro, Radare is primarily a tool for binary analysis. You can also show delta diffing with -d, or display the disassembled bytes instead of hex with -D.

If you're asking these kinds of questions, though, check out

Evan Carroll
