What are the best options to use when compressing files using 7-Zip?

I often have to gather log files and upload them to a central server (owned by another company). The central server has a file size limit, so I am trying to create the smallest file possible that is still in the zip format.

What are the best settings to use when compressing a text file to the zip format, when my only goal is a small file size?

[Screenshot: 7-Zip options]

I've done the obvious and chosen ultra compression, and I have noticed that LZMA does a better job than deflate, but there are far too many other permutations of options for me to test them all.
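The Deflate-versus-LZMA comparison can be repeated on your own logs without clicking through the GUI. Below is a minimal Python sketch, assuming Python 3.9+; it uses the standard `zipfile` module, whose Deflate encoder is zlib rather than 7-Zip's own, so the absolute sizes will not match what 7-Zip produces. The sample log text and the `zipped_size` helper are hypothetical stand-ins.

```python
import io
import zipfile

# Hypothetical sample "log" text; substitute the contents of a real log file.
LOG = "".join(
    f"2011-05-10 14:{i % 60:02d}:00 INFO request {i} served in {i % 97} ms\n"
    for i in range(5000)
).encode("ascii")

# Compression methods that the standard-library zipfile module can write.
METHODS = {
    "Deflate": zipfile.ZIP_DEFLATED,
    "BZip2": zipfile.ZIP_BZIP2,
    "LZMA": zipfile.ZIP_LZMA,
}


def zipped_size(data: bytes, method: int) -> int:
    """Return the size in bytes of a one-member .zip archive built in memory."""
    buf = io.BytesIO()
    # compresslevel=9 applies to Deflate and BZip2; zipfile ignores it for LZMA.
    with zipfile.ZipFile(buf, "w", compression=method, compresslevel=9) as zf:
        zf.writestr("app.log", data)
    return len(buf.getvalue())


sizes = {name: zipped_size(LOG, method) for name, method in METHODS.items()}

if __name__ == "__main__":
    print(f"original: {len(LOG)} bytes")
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1]):
        print(f"{name:8s} {size} bytes")
```

Keep in mind that a smaller BZip2 or LZMA result is only useful if the recipient's unzip tool supports that method; Deflate is the safe choice for a "plain" zip.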

jjnguy

Posted 2011-05-10T14:19:13.153

Reputation: 1 349

Define "normal zip tools". Most "normal zip tools" nowadays, like 7-Zip and WinRAR, can extract 7z files. – phuclv – 2016-05-29T11:04:28.873

Is splitting the zip into multiple files an option? – JaredMcAteer – 2011-05-10T14:21:29.297

@Original, I don't think so. (Is that what the 'split to volumes' option is for?) I'd rather keep it simple and have just one file. If I really need to, I can split the original file (which I have done in the past), but my goal is to keep it in one file. – jjnguy – 2011-05-10T14:30:45.587

Oh, and I saw this question: http://superuser.com/questions/178111/what-settings-to-use-when-making-7zip-files-in-order-to-get-maximum-compression-w but it really doesn't answer my question at all.

– jjnguy – 2011-05-10T14:32:11.233

I think the exact question you asked isn't answerable. Some text files compress better with different algorithms. Sometimes zip is better, sometimes gzip; sometimes, compression level makes a difference, and sometimes not. It all depends on the file. Therefore, instead of answering the precise question, I've addressed the motivating example, which deals with maximum allowed sizes. Even if you have the best possible algorithm, you're still limited by size, and a particularly large log might not be able to be compressed below that threshold, so you'll need splitting anyway. – Rob Kennedy – 2011-05-10T14:42:19.707

@Rob, ok. Makes sense. I know that the input data is very important in determining the size of a resulting zip file. I wasn't sure if there was a canonical set of settings that usually work best. – jjnguy – 2011-05-10T14:44:22.090

As soon as you pick anything but the Deflate method, it's not a "normal" .zip file anymore, but an "extended" zip file, pioneered by WinZip. They originally kept the extension as .zip, to much consternation (since most normal zip-handling tools can't deal with them), but most archivers now use .zipx to distinguish them from traditional .zip files. If you can use LZMA, switch to .7z and pick PPMd; it should compress better (and faster!) for text files. – afrazier – 2011-05-20T16:04:40.747

@afra, hmmmm. Thanks for the info. I need to keep it in a format that most normal zip tools can unzip. Otherwise I'd be using the 7z format already. – jjnguy – 2011-05-20T16:52:59.613

@Justin: That sucks. Can you use a self-extracting archive? – afrazier – 2011-05-20T18:55:25.887

@afrazier, I'm sending these files to a 3rd party vendor, and they expect to get 'regular' zip files. (Or files they can unzip using the 'standard' method.) – jjnguy – 2011-05-20T19:48:03.437

@afrazier: "The .ZIP File Format Specification documents the following compression methods: stored (no compression), Shrunk, Reduced (methods 1-4), Imploded, Tokenizing, Deflated, Deflate64, bzip2, LZMA (EFS), WavPack, PPMd." https://en.wikipedia.org/wiki/Zip_%28file_format%29#Compression_methods

– endolith – 2013-12-13T22:26:29.843

@endolith: bzip2, lzma, wv, and ppmd are all very recent additions to the file format. It's not even safe to assume that your recipient can handle deflate64, much less anything newer. – afrazier – 2013-12-13T22:33:39.190
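One way to check what you are actually sending is to read the compression method ID of each member from the archive's central directory. Below is a small Python sketch, assuming that Stored (0) and Deflate (8) are the only methods the recipient can be relied on to handle; the method numbers come from the PKWARE .ZIP File Format Specification (APPNOTE.TXT), and `risky_members` and `build_demo` are hypothetical helper names. Python's `zipfile` should be able to list such entries even for methods it cannot decompress, such as PPMd.

```python
import io
import zipfile

# Method IDs from the PKWARE .ZIP specification (APPNOTE.TXT).
METHOD_NAMES = {0: "Stored", 8: "Deflate", 9: "Deflate64", 12: "BZip2",
                14: "LZMA", 97: "WavPack", 98: "PPMd"}
# Assumption: only Stored and Deflate are safe for an unknown recipient.
WIDELY_SUPPORTED = {0, 8}


def risky_members(zip_bytes: bytes) -> list[tuple[str, str]]:
    """List (member name, method name) for entries not using Stored or Deflate."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return [
            (info.filename,
             METHOD_NAMES.get(info.compress_type, f"method {info.compress_type}"))
            for info in zf.infolist()
            if info.compress_type not in WIDELY_SUPPORTED
        ]


def build_demo() -> bytes:
    """Build an in-memory zip with one Deflate member and one LZMA member."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("deflated.log", b"x" * 1000, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("lzma.log", b"x" * 1000, compress_type=zipfile.ZIP_LZMA)
    return buf.getvalue()


if __name__ == "__main__":
    print(risky_members(build_demo()))  # [('lzma.log', 'LZMA')]
```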

Answers

To create the smallest standard ZIP file that 7-Zip can create, try:

7z a -mm=Deflate -mfb=258 -mpass=15 -r foo.zip C:\Path\To\Files\*

Source: How can I achieve the best, standard ZIP compression?

Otherwise, if you don't care about the ZIP standard, use the following Ultra settings:

7z a -t7z -m0=lzma -mx=9 -mfb=64 -md=32m -ms=on archive.7z dir1

Which are:

-t7z      7z archive
-m0=lzma  LZMA method
-mx=9     level of compression = 9 (Ultra)
-mfb=64   number of fast bytes for LZMA = 64
-md=32m   dictionary size = 32 megabytes
-ms=on    solid archive = on
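If you run these from a script, passing the switches as an argument list avoids shell-quoting surprises. A minimal Python sketch, assuming a `7z` executable on `PATH`; the switches are copied verbatim from the two commands above, and the function names are hypothetical.

```python
import shutil
import subprocess


def standard_zip_cmd(archive: str, source: str) -> list[str]:
    """Smallest standard (Deflate) ZIP, using the switches from the first command."""
    return ["7z", "a", "-mm=Deflate", "-mfb=258", "-mpass=15", "-r", archive, source]


def ultra_7z_cmd(archive: str, source: str) -> list[str]:
    """.7z archive with the LZMA 'ultra' switches from the second command."""
    return ["7z", "a", "-t7z", "-m0=lzma", "-mx=9", "-mfb=64", "-md=32m",
            "-ms=on", archive, source]


def run(cmd: list[str]) -> int:
    """Run cmd if its executable is on PATH and return 7z's exit code."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} was not found on PATH")
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    cmd = standard_zip_cmd("foo.zip", r"C:\Path\To\Files\*")
    print(" ".join(cmd))
    # run(cmd)  # uncomment to actually create foo.zip
```

A non-zero exit code from `run` means 7-Zip reported a warning or an error.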

kenorb

Reputation: 16 795

@Tek: Why? It's not a good one. The question was about using the "standard ZIP format", so the answer shouldn't be specifying LZMA. -ms=on is for .7z, not standard zip files. -md is related to BZip2, so I don't expect it to affect ZIP (or even LZMA). -mfb=64 is an unoptimized value: -mfb=258 makes smaller zip files. And this answer doesn't even mention -mpass=15 which can affect zip files. This is a nicely formatted answer which is, unfortunately, wrong in multiple ways. – TOOGAM – 2015-11-08T12:32:47.207

I would use LZMA2 – Lance Badger – 2016-07-07T15:22:21.007

If you look at the 7-Zip FAQ, it states that newer versions of 7z may have worse performance than older versions in some circumstances. Read the FAQ for more detail, but in short, enter 'qs' in the Parameters field in the GUI, or use -mqs in the command-line version, to use the old sort-by-file-extension method. https://www.7-zip.org/faq.html

– drojf – 2019-05-15T14:21:01.680

If you can use .7z format rather than just .zip, I would simply use PPMD with the following options and leave everything else as set by the Compression Level:

  • Archive Format: 7z
  • Compression Method: PPMD
  • Compression Level: Ultra

I regularly compress server/text logs (60MB+) using these options and they usually come out at 1-2% of the original size.

Umber Ferrule

Reputation: 3 149

Maybe it works better for text files, but it didn't work better for me when compressing a C# project (text + DLLs). – Riga – 2015-03-25T09:36:17.870

Why is PPMd a superior compression method for text files? – user598527 – 2017-02-27T19:58:43.713

LZMA2 gives better results for text files than PPMD. – T3rm1 – 2018-11-22T10:38:36.280

For text such as log files, ppmd is definitely the way to go. However, the question mentioned that it needed to stay in the zip format, which may not work with PPMD. – Brian Minton – 2013-12-19T16:31:30.863

Just tried zip with PPMD, and Windows Explorer opens the contents up without complaint here on Windows 7. – Umber Ferrule – 2013-12-20T16:44:09.583

I noticed that too. It opens the contents up just fine. However, when I actually tried to view one of the files inside the zip file, it failed. – Brian Minton – 2013-12-23T16:58:58.033

After much experimentation, digging into the detailed 7-Zip documentation, and reading some of the 7z source code for the advanced LZMA2 parameters, I found the better method below. On some 1 GB real-world test files, it did more than 2 to 4 times better than the previously accepted solutions posted here, or even the example in the 7z man page.

7z a -t7z -mx=9 -mfb=273 -ms -md=31 -myx=9 -mtm=- -mmt -mmtf -md=1536m -mmf=bt3 -mmc=10000 -mpb=0 -mlc=0 archive.7z inputfileordir

LZMA2 compression is assumed here, but you might be able to get even better results in 7-Zip by passing advanced LZMA2 options like -m0=LZMA2:27, or -m0=LZMA2:d25, or an array of parameters like

-m0=BCJ2 -m1=LZMA:d25 -m2=LZMA:d19 -m3=LZMA:d19 -mb0:1

Such parameters didn't seem to be respected by the 7z versions I tested, but you may want to explore further or patch the 7z code to properly parse them. Or maybe it is supposed to work and is just broken in the builds that were tested.
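Python's `lzma` module (a wrapper around liblzma from XZ Utils) exposes similar LZMA knobs, which makes it easy to test whether this kind of tuning helps on your own data before committing to a long 7-Zip run. The sketch below is an assumed mapping, not 7-Zip itself: `-mfb` is treated as liblzma's `nice_len`, `-mmf=bt3` as `MF_BT3`, and `-mlc`/`-mpb` as `lc`/`pb`; `-mmc`, `-myx` and the other switches have no direct counterpart here and are left out. The dictionary is scaled down from 1536 MiB to 8 MiB, and the sample data is hypothetical.

```python
import lzma

# Hypothetical sample data standing in for a real log file.
SAMPLE = "".join(
    f"2011-05-10 14:{i % 60:02d}:00 WARN worker-{i % 8} queue depth {i % 500}\n"
    for i in range(5000)
).encode("ascii")

# Assumed liblzma counterparts of the answer's 7-Zip switches:
#   -mfb=273 -> nice_len=273, -mmf=bt3 -> mf=MF_BT3, -mlc=0 -> lc=0, -mpb=0 -> pb=0.
# The dictionary is 8 MiB instead of 1536 MiB to keep memory use modest.
TUNED = [{
    "id": lzma.FILTER_LZMA2,
    "preset": 6,  # source of defaults for any option not set below
    "dict_size": 8 * 1024 * 1024,
    "nice_len": 273,
    "mf": lzma.MF_BT3,
    "lc": 0,
    "lp": 0,
    "pb": 0,
}]


def xz_size(data: bytes, filters=None, preset=None) -> int:
    """Size of `data` compressed to .xz with either a preset or a filter chain."""
    # lzma.compress rejects a preset and a filter chain given together.
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset, filters=filters))


if __name__ == "__main__":
    print("original:", len(SAMPLE))
    print("preset 6:", xz_size(SAMPLE, preset=6))
    print("tuned:   ", xz_size(SAMPLE, filters=TUNED))
```

Whether the tuned chain beats the plain preset depends on the input, which is the point of measuring it first.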

91735472

Reputation: 81

wow, this made a really big difference. For my archive, I experimented with a lot of other suggestions, including other answers here, and the best result I got was 99MB, vs 85MB using these settings. – user9399 – 2019-08-25T23:42:39.573

How would you call this on Windows 10 in command line? I get "The parameter is incorrect" on version 19.00 2019-02-21 – user1306322 – 2019-12-04T09:11:06.203

I compared settings on db.fdb, 1.2 GB (1236598784 B), on Ubuntu Server 14.04.3 with p7zip [64] 9.20 in a VM:

1. 7z a -mx=9 1.7z db.fdb
2. 7z a -t7z -m0=lzma -mx=9 -mfb=64 -md=32m -ms=on 2.7z db.fdb
3. 7z a -t7z -m0=lzma -mx=9 -mfb=258 -md=32m -ms=on 3.7z db.fdb
4. 7z a -t7z -m0=lzma -mx=9 -mfb=258 -md=32m -ms=on -pass=15 4.7z db.fdb
5. 7z a -mx=9 -mmt=on 5.7z db.fdb
6. 7z a -t7z -m0=lzma -mx=9 -mfb=258 -md=32m -ms=on -mmt=on 6.7z db.fdb

and got these results:

1.7z  96 MB (100108731 B)  6 min 25 s
2.7z  95 MB ( 99520375 B)  5 min 18 s
3.7z  93 MB ( 97512311 B)  9 min 19 s
4.7z  93 MB ( 97512345 B)  9 min 40 s
5.7z  96 MB (100108731 B)  5 min 26 s
6.7z  93 MB ( 97512311 B)  9 min 09 s

I think the second method works well: (almost) the best compression with the best time. The first method is the easiest to read and remember, and it is fine for small files where maximum compression doesn't matter. Going from method 2 to method 3 gives only a slightly smaller 7z (about 2%) but takes almost twice as long to compress. Everyone can decide for themselves.
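The same size-versus-time trade-off can be measured for any input. A rough Python sketch, using the standard `zlib` and `lzma` modules as stand-ins for 7-Zip's Deflate and LZMA encoders (so the numbers will not match the table above); the sample data is hypothetical, timings vary between machines, and xz preset 9 is left out because it needs several hundred MiB of memory.

```python
import lzma
import time
import zlib

# Hypothetical sample data; replace with the bytes of the file you care about.
SAMPLE = "".join(
    f"2015-11-08 12:{i % 60:02d}:{i % 59:02d} txn={i} status={'OK' if i % 7 else 'RETRY'}\n"
    for i in range(5000)
).encode("ascii")

# Stand-ins for 7-Zip's encoders: zlib for Deflate, liblzma (.xz) for LZMA.
CANDIDATES = {
    "zlib level 6": lambda d: zlib.compress(d, 6),
    "zlib level 9": lambda d: zlib.compress(d, 9),
    "xz preset 1": lambda d: lzma.compress(d, preset=1),
    "xz preset 6": lambda d: lzma.compress(d, preset=6),
    "xz preset 6e": lambda d: lzma.compress(d, preset=6 | lzma.PRESET_EXTREME),
}


def benchmark(data: bytes) -> list[tuple[str, int, float]]:
    """Return (label, compressed size in bytes, wall-clock seconds) per candidate."""
    rows = []
    for label, compress in CANDIDATES.items():
        start = time.perf_counter()
        size = len(compress(data))
        rows.append((label, size, time.perf_counter() - start))
    return rows


if __name__ == "__main__":
    for label, size, seconds in benchmark(SAMPLE):
        print(f"{label:14s} {size:>9d} B  {seconds:.3f} s")
```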

SULIMa

Reputation: 71

I decided to run some experiments to find the optimal compression parameters empirically.

The tool I have used was 7-ZIP finetuner. This tool hunts for the optimal parameters by simply repeating the compression with varying parameters looking for the optimal combination. A run for one file may sometimes take more than an hour even on a fast computer.

The parameters that it tries are:

LC : number of Literal Context bits
LP : number of Literal Pos bits
PB : number of Pos Bits
YX : level of file analysis
FB : number of Fast Bytes

I left the default parameters for dictionary size (512 MB) and solid block (On). The tool uses the LZMA method.

The best combinations of parameters on several types of files were as follows:

[Image: best parameter combinations per file type]

I note that the best values were not constant even for files of the same type.

Conclusion: There are no best options, as each file may have its own unique best combination. One may drive all parameters up to their limits, but an improvement is not at all guaranteed.

The most common combination seems to be:

LC : 8
LP : 0
PB : 1
YX : 5
FB : 273
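The finetuner's approach is essentially an exhaustive grid search: compress with every combination and keep the smallest result. A minimal Python sketch of that idea, using the standard `lzma` module in LZMA1 ("alone") mode rather than 7-Zip itself; `lzma1_size` and `grid_search` are hypothetical names and the sample data is made up. liblzma rejects `lc + lp > 4`, so this grid cannot reproduce the LC : 8 setting above, and YX (a 7-Zip-specific analysis level) is not covered.

```python
import itertools
import lzma

# Hypothetical sample data standing in for one of the test files.
SAMPLE = "".join(
    f"host{i % 13} GET /api/items/{i % 250} 200 {i % 1000}ms\n" for i in range(2000)
).encode("ascii")


def lzma1_size(data: bytes, lc: int, lp: int, pb: int, nice_len: int) -> int:
    """Compressed size of `data` in the legacy .lzma format with the given options."""
    filters = [{
        "id": lzma.FILTER_LZMA1,
        "preset": 6,           # defaults for options not set explicitly
        "dict_size": 1 << 20,  # 1 MiB is plenty for this small sample
        "lc": lc, "lp": lp, "pb": pb, "nice_len": nice_len,
    }]
    return len(lzma.compress(data, format=lzma.FORMAT_ALONE, filters=filters))


def grid_search(data: bytes) -> tuple[int, dict]:
    """Try every (lc, lp, pb, nice_len) combination and keep the smallest output."""
    best = None
    for lc, lp, pb, nice_len in itertools.product(range(5), range(3), range(3), (64, 273)):
        if lc + lp > 4:  # liblzma rejects lc + lp > 4
            continue
        size = lzma1_size(data, lc, lp, pb, nice_len)
        if best is None or size < best[0]:
            best = (size, {"lc": lc, "lp": lp, "pb": pb, "nice_len": nice_len})
    return best


if __name__ == "__main__":
    size, params = grid_search(SAMPLE)
    print(size, params)
```

As the answer concludes, the winning combination is data-dependent, so the search has to be rerun for each new kind of file.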

harrymc

Reputation: 306 093

Set the "split to volume, bytes" field to the server's maximum allowed file size (in bytes, I think, although it looks like it accepts common abbreviations like "KB" and "MB"). If the zip file exceeds that size, 7-zip will split it into multiple files automatically, such as integration_serviceLog.zip.001, integration_serviceLog.zip.002, etc. (Way back when, PK Zip used this to span zip files across multiple floppy disks.) You'll need all the files to be present to unzip them. Use that instead of worrying about the absolute best compression settings to use for any particular set of files, because what's best for one file may be different for another file, and you don't want to have to go through this every time you need to copy logs.
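As far as I know, 7-Zip's `.001`, `.002` volumes (the command-line equivalent is the `-v` switch, for example `-v10m`) are plain byte-level pieces of the archive. That is why all of them are needed, and why concatenating them in order should restore an ordinary `.zip` for a recipient whose tool cannot open split volumes. A minimal Python sketch of both directions, assuming that byte-split layout; `split_file`, `join_files` and `demo` are hypothetical names, and the demo's "archive" is just stand-in bytes.

```python
import tempfile
from pathlib import Path


def split_file(path: Path, max_bytes: int) -> list[Path]:
    """Write path.001, path.002, ... each at most max_bytes long; return them in order."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    parts = []
    with path.open("rb") as src:
        while chunk := src.read(max_bytes):
            part = path.with_name(f"{path.name}.{len(parts) + 1:03d}")
            part.write_bytes(chunk)
            parts.append(part)
    return parts


def join_files(parts: list[Path], dest: Path) -> None:
    """Concatenate the volumes in order to rebuild the original file."""
    with dest.open("wb") as out:
        for part in parts:
            out.write(part.read_bytes())


def demo() -> tuple[list[int], bool]:
    """Split a 2500-byte stand-in 'archive' into 1000-byte volumes and rejoin it."""
    with tempfile.TemporaryDirectory() as tmp:
        original = Path(tmp) / "integration_serviceLog.zip"
        original.write_bytes(bytes(range(256)) * 9 + bytes(196))
        parts = split_file(original, 1000)
        rebuilt = Path(tmp) / "rebuilt.zip"
        join_files(parts, rebuilt)
        return [p.stat().st_size for p in parts], rebuilt.read_bytes() == original.read_bytes()


if __name__ == "__main__":
    print(demo())  # ([1000, 1000, 500], True)
```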

Rob Kennedy

Reputation: 185

I'm worried about how the people on the other side will uncompress the files. I need it to be as simple as possible for them. Do you know if you can unzip the split volumes using the built-in Windows zip, or gzip? – jjnguy – 2011-05-10T14:40:28.773

Apparently, no, the built-in Windows zip-folder feature doesn't do spanned zip files. That's too bad, since it's been a standard feature of the format since before Windows 3. I'd be very surprised if gzip couldn't do it, though. WinZip definitely can. – Rob Kennedy – 2011-05-10T14:47:36.720