7

I frequently need to do a backup of a group of files, with many subfolders which contain several large, identical files.

Is there a compression scheme (.zip, .7z, etc) which can automatically detect this and not store identical files more than once?

Warpin

4 Answers

6

I just went through this too.

If you compress your files into a tarball, 7z's LZMA compression may or may not recognise the duplicates if they are too far apart in the tarball (it depends on the dictionary size and a few other settings).
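For illustration, a minimal sketch (assuming you already have an Example.tar and a 64-bit 7-Zip with enough free RAM): the dictionary size is the main knob that controls how far apart two copies can be and still compress against each other.

rem Sketch only: -md sets the LZMA2 dictionary size; identical blocks further
rem apart than this window cannot be matched against each other.
7z a -t7z -m0=lzma2 -mx=9 -md=512m "Example.7z" "Example.tar"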

7z has a WIM format which stores duplicate files only once; you can then apply normal LZMA compression to that. Windows command-line example:

rem Step 1: pack everything into a WIM archive, which stores identical files only once
7z a -twim "Example.wim" *
rem Step 2: compress the WIM with LZMA for the actual size reduction
7z a -t7z -m0=lzma -mx=9 -mfb=64 -md=32m -ms=on -mhc=on "Example.7z" "Example.wim"
rem Step 3: remove the intermediate WIM
del "Example.wim"

It works well; give it a go.

ALM865
3

I suggest three options that I've tried (on Windows):

  1. 7zip LZMA2 compression with a dictionary size of 1536 MB
  2. WinRar "solid" archive
  3. 7zip WIM file

I had 10 folders with different versions of a website (with files such as .php, .html, .js, .css, .jpeg, .sql, etc.) with a total size of 1 GB (100 MB per folder on average). While standard 7zip or WinRar compression gave me a file of about 400-500 MB, these options gave me files of (1) 80 MB, (2) 100 MB and (3) 170 MB respectively.
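For reference, a minimal sketch of what these look like on the command line (the archive names and the C:\sites path are made-up examples; adjust the switches for your own 7-Zip/WinRar versions):

rem Option 1 (sketch): LZMA2 with a 1536 MB dictionary (needs 64-bit 7-Zip and plenty of RAM)
7z a -t7z -m0=lzma2 -mx=9 -md=1536m "site-backup.7z" "C:\sites"
rem Option 2 (sketch): WinRar solid archive from the rar command line (-s = solid, -r = recurse)
rar a -s -m5 -r "site-backup.rar" "C:\sites"
rem Option 3 (sketch): WIM container, which stores identical files only once
7z a -twim "site-backup.wim" "C:\sites"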

0

You can use FastPack for this: https://github.com/QuanosSolutions/FastPack

Scordo
0

Yes, it's possible: https://superuser.com/questions/479074/why-doesnt-gzip-compression-eliminate-duplicate-chunks-of-data

Here's an example I came up with:

[jay test]$ tree .
.
`-- compressme
    |-- a
    |   `-- largefile (10MB)
    `-- b
        `-- largefile (10MB, identical to ../a/largefile)

3 directories, 2 files
[jay test]$ du -sh compressme/
21M compressme/
[jay test]$ tar -cf compressme.tar compressme/
[jay test]$ du -sh compressme.tar 
21M compressme.tar
[jay test]$ lzma -9 compressme.tar
[jay test]$ du -sh compressme.tar.lzma 
11M compressme.tar.lzma
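This works because both copies of largefile fall within LZMA's dictionary, so the second copy compresses away almost entirely. If the duplicates were further apart than the dictionary, you could raise it explicitly; a sketch along the same lines (assuming xz, which provides the same LZMA2 compressor, is installed):

# Sketch only: enlarge the dictionary so both copies of the file fall inside it
tar -cf compressme.tar compressme/
xz --lzma2=preset=9,dict=192MiB compressme.tar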
Jay