I went down a rabbit-hole after the other answers failed me, and managed to figure out that my version of tar (1.27.1 from the openSUSE 42.3 OSS repo) was using the non-deterministic pax
archival format by default, which means that even without compression, (and even setting the mtime explicitly) archives created with tar from the same files would differ:
$ echo hi > test.file
$ tar --create --to-stdout test.file # long form of `tar cO test.file`
./PaxHeaders.13067/test.file0000644000000000000000000000013213427447703012603 xustar0030 mtime=1549684675.835011178
30 atime=1549684726.410510251
30 ctime=1549684675.835011178
test.file0000644000175000001440000000000313427447703013057 0ustar00hartusers00000000000000hi
$ tar --create --to-stdout test.file
./PaxHeaders.13096/test.file0000644000000000000000000000013213427447703012605 xustar0030 mtime=1549684675.835011178
30 atime=1549684726.410510251
30 ctime=1549684675.835011178
test.file0000644000175000001440000000000313427447703013057 0ustar00hartusers00000000000000hi
Note that the output above differs, even though no compression is being used; the uncompressed archive contents (generated by running tar twice on the same contents) are different, so the compressed content will also differ even when using GZIP=-n
as other answers suggest
In order to get around this, you can specify --format gnu
:
$ tar --create --format gnu --to-stdout test.file
test.file0000644000175000001440000000000313427447703011557 0ustar hartusershi
$ tar --create --format gnu --to-stdout test.file
test.file0000644000175000001440000000000313427447703011557 0ustar hartusershi
This works with the suggestion about gzip above:
# gzip refuses to write to stdout, so we'll use the `-f` option to create a file
$ GZIP=-n tar --format gnu -czf test.file.tgz test.file && md5sum test.file.tgz
0d8c7b3bdbe8066b516e3d3af60ade75 test.file.tgz
$ GZIP=-n tar --format gnu -czf test.file.tgz test.file && md5sum test.file.tgz
0d8c7b3bdbe8066b516e3d3af60ade75 test.file.tgz
# without GZIP=-n we see a different hash
$ tar --format gnu -czf test.file.tgz test.file && md5sum test.file.tgz
682ce0c8267b90f4103b4c29903c5a8d test.file.tgz
However, in addition to valid reasons to prefer better compression formats to gzip, you might want to consider using xz instead (which tar also supports with the --xz
or -J
flags instead of -z
), because it saves you a step here; the default behaviour of xz
is to generate the same compressed output when the uncompressed contents are the same, so there's no need to specify an option like GZIP=-n
:
$ tar --format gnu --xz -cf test.file.txz test.file && md5sum test.file.txz
dea99037d4b0ee4565b3639e93ac0930 test.file.txz
$ tar --format gnu --xz -cf test.file.txz test.file && md5sum test.file.txz
dea99037d4b0ee4565b3639e93ac0930 test.file.txz