What settings to use when making 7zip files in order to get maximum compression when compressing PDFs?

15

What settings should I use when making 7zip files in order to get maximum compression? I'm compressing PDF documents containing scanned images. I'm thinking about using LZMA2, but I don't know what to set for dictionary size, word size, etc. Also, would LZMA or PPMd be a better option?

I need to transfer some files (~200 MiB) over the net, and upload speeds here are very slow, so I want to compress the data as much as possible. CPU time consumed is not very important.

EDIT

Here's what I got after testing various compression methods:

Uncompressed size was 25,462,686 B.

My processor is an Intel Core 2 Duo T8100 and I have 4 GiB of RAM.

Best compression was with PeaZip using the PAQ8O algorithm. The resulting file size was 19,994,325 B, with compression level set to maximum. Unfortunately, compression speed was around 5 KiB/s, so it took more than one hour to compress the data.

Next was the experimental PAQ9O compressor. Using it, I got 20,132,660 B in about 3 minutes of compression. Unfortunately, the program is command-line only, and not many other programs use that compression algorithm. It also uses around 1.5 GiB of RAM with the settings I used (a -9 -c).

After that was 7-Zip 9.15 beta (2010-06-20) using LZMA2. Using it, I got 20,518,802 B in about 3 minutes. Settings used were word size 273, dictionary size 64 MB, and 2 threads for compression.

Now back to my original question: in my case, solid block size didn't produce any noticeable difference. Increasing word size did help: the difference between the highest and smallest word size was 115,260 B. I believe such savings justify the effort of making the two clicks necessary to change the word size.
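For reference, the GUI settings described above can also be expressed on the 7-Zip command line. This is only a sketch assuming the `7z` executable from p7zip; the archive name and input path are placeholders:

```shell
# LZMA2, maximum level, 64 MB dictionary, word size (fast bytes) 273,
# 2 threads, solid archive -- mirrors the GUI settings described above.
7z a -t7z -m0=lzma2 -mx=9 -md=64m -mfb=273 -mmt=2 -ms=on archive.7z docs/*.pdf
```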

I tried the other compression algorithms supported by 7-Zip and PeaZip, and they produced files ranging from 19.8 MiB to 21.5 MiB.

In the end, my conclusion is that when compressing PDF documents containing mostly images, the effort needed to use exotic compression algorithms isn't justified. LZMA2 compression in 7-Zip produced quite acceptable results in the least amount of time.

AndrejaKo

Posted 2010-08-19T23:48:08.827

Reputation: 16 459

What's different about using PeaZip? It's just a GUI wrapper around 7zip and many other tools – Cole Johnson – 2013-08-21T22:42:58.107

@Cole "Cole9" Johnson Well, the difference is that in my case I used some of those "other" tools from PeaZip that did not have a GUI at the time. If I remember correctly, back then only PeaZip offered the PAQ8O algorithm with a GUI. – AndrejaKo – 2013-08-22T07:32:10.190

Answers

7

The content of the PDFs (text and images) is probably already compressed, so there's not going to be much to gain by trying to compress them again.

afrazier

Posted 2010-08-19T23:48:08.827

Reputation: 21 316

Well, no. I did a little bit of testing: I took 24 MiB of PDFs and compressed them using default settings. The result was a 19 MiB file. Those 5 MiB do matter in my case. – AndrejaKo – 2010-08-20T09:02:29.440

Looks like you're right. I couldn't produce results significantly better than the 7-Zip defaults no matter what I did. I'm still convinced that some compression is better than none. – AndrejaKo – 2010-08-20T10:48:38.923

If you could save that much space, then there's probably work that could be done with the PDFs themselves to save almost all of that space without 7-Zip. A trip through Acrobat's PDF Optimizer can work wonders. – afrazier – 2010-08-21T19:17:01.497

See usr's answer: the compression used inside PDFs (zlib) can be undone to compress them further (and applied again on reconstruction). This often results in a ~50% size reduction. – schnaader – 2019-11-20T10:05:18.460

@schnaader: That's really interesting. I've seen and used tools like Acrobat's PDF Optimizer and MuPDF to modify the PDFs while keeping them viewable, but being able to losslessly transform them like that is also very valuable and can be used to great advantage. – afrazier – 2019-11-20T15:37:28.730

8

Try precomp: it first decompresses the already-compressed data inside your PDFs, so 7z can then do its magic on uncompressed data.
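A sketch of that workflow, assuming the `precomp` and `7z` command-line tools are installed (exact switches vary between precomp versions, so check `precomp` without arguments for your version's usage):

```shell
# 1. Expand the Flate streams inside the PDF into a .pcf file
#    (-cn stores the expanded data without recompressing it).
precomp -cn document.pdf           # writes document.pcf
# 2. Let 7z compress the now-uncompressed data at maximum level.
7z a -t7z -mx=9 document.7z document.pcf
# After extracting on the other side, restore the original PDF:
precomp -r document.pcf            # reconstructs document.pdf losslessly
```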

Also try NanoZip, which I have verified to be very effective yet very efficient (400 KB/s at compression ratios comparable to PAQ algorithms).

usr

Posted 2010-08-19T23:48:08.827

Reputation: 2 040

2

7za a -t7z -mx=9 -mfb=258 -mpass=15 filename.7z subdir

Adjust the first word as necessary for the name of your command-line executable, and adjust the parts after "-mpass=15" to customize the archive name and what it should include.

This answer is not specific to PDF documents.
This uses LZMA, not PPMd. I've stayed away from PPMd because there are too many variations that are not compatible with one another. LZMA looks more stable, with compatibility more widely supported. So I've stayed away from PPMd precisely because my opinion was, as you've stated, "the effort needed to use exotic compression algorithms isn't justified."

TOOGAM

Posted 2010-08-19T23:48:08.827

Reputation: 12 651

LZMA2 is significantly better than LZMA but is for (effective) use only on 64-bit systems. – O.M.Y. – 2016-06-21T19:09:27.353

-3

LZMA compression is the best because you can make an SFX file or an MSI package with a high compression ratio. In your case you are not compressing a big file, so the difference is quite small, especially if the files are in already-compressed formats such as MP3 or PNG.

Try WinArc; it is free and gives a great compression ratio.

nader

Posted 2010-08-19T23:48:08.827

Reputation: 1