Multithreaded support in 7za

19

4

(I posted this first on serverfault, but then I realized it probably belongs here.)

I'm trying to compress a very large text file using 7za (p7zip) 9.20. The -mmt option doesn't seem to have any effect. I've tried both -mmt=on and -mmt=2. This is an 8-core machine. One person suggested adding -m0=lzma2 as an argument, but that just gives me E_INVALIDARG. Does anybody know how to make this work?

This has no effect:

7za a -mx=9 -mmt=2 -p myarchive.zip bigfile.txt

And this fails with an error:

7za a -m0=lzma2 -mx=9 -mmt=2 -p myarchive.zip bigfile.txt


7-Zip (A) [64] 9.20  Copyright (c) 1999-2010 Igor Pavlov  2010-11-18
p7zip Version 9.20 (locale=en_US.UTF-8,Utf16=on,HugeFiles=on,8 CPUs)
Scanning

Creating archive release_20120601-1-http.zip



System error:
E_INVALIDARG

Brian L

Posted 2012-06-07T17:13:12.290

Reputation: 293

I believe the option is simply -mmt, not -mmt=2. Also I believe the proper syntax is -mx9, although both might work. – Breakthrough – 2012-06-07T17:20:20.837

Thanks, but -mmt without an option still uses only one thread. According to http://docs.bugaco.com/7zip/MANUAL/switches/method.htm#ZipMultiThread, you can specify the number of threads to use with -mmt=N.

– Brian L – 2012-06-07T17:30:02.890

I would still recommend using LZMA/Deflate even though it's only single threaded. While you might get an increased compression speed with BZip2, it's less efficient when compressing plain text, and the single-threaded variants are slower than the LZMA/Deflate equivalents.

– Breakthrough – 2012-06-07T18:09:11.473

@Breakthrough: BZip2 usually achieves better compression than DEFLATE, as shown in your link. It's also much, much faster than LZMA (when compressing). – Dennis – 2012-06-07T18:17:43.530

Answers

25

According to the ZipMultiThread section of the 7-Zip manual's page on the -m (Set compression Method) switch, mt defaults to on, so there's no need to specify it at all.

However, 7zip's implementation of the DEFLATE algorithm doesn't support multi-threading!

As you have already discovered,

7za a archive.zip bigfile

only uses one core.

But .zip archives compress every file individually, so when compressing several files, the multi-threading option compresses several files at once, one per core.

Try it and you'll see that

7za a archive.zip bigfile1 ... bigfileN

will use all available N cores.

If you want to speed up the compression of a single file, you have two choices:

  1. Split up bigfile in chunks.

  2. Use a different compression algorithm.

    For example, 7zip's implementation of the BZip2 algorithm supports multi-threading.

    The syntax is:

    7za a -mm=BZip2 archive.zip bigfile
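
For the first option (splitting bigfile into chunks), a minimal sketch could look like the following. It assumes GNU coreutils split, which provides -n l/N to split on line boundaries; the function name and the chunk prefix are hypothetical, and the 7za call is left commented out because it needs 7za on the PATH.

```shell
#!/usr/bin/env bash
# Sketch: split one large text file into N line-aligned chunks so that
# 7za can compress the chunks in parallel inside a single .zip archive.
# Assumes GNU coreutils split; the helper name and prefix are hypothetical.

split_into_chunks() {
    local input=$1 chunks=$2 prefix=$3
    # -n l/N : produce N pieces without breaking any line in the middle
    # -d     : numeric suffixes (00, 01, ...), so a shell glob sorts them in order
    split -n "l/$chunks" -d "$input" "$prefix"
}

# Example usage (uncomment to run):
# split_into_chunks bigfile.txt 8 bigfile.part.
# 7za a archive.zip bigfile.part.*
# After extracting, restore the original with:
# cat bigfile.part.* > bigfile.txt
```

The trade-off is that the archive then contains the chunks rather than the original file, so whoever extracts it has to concatenate them again.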
    

Also, the E_INVALIDARG error is caused by your attempt to use LZMA2 (-m0=lzma2) with a .zip container. That's not possible.

The possible algorithms for .zip containers are DEFLATE(64), BZip2 and no compression.

If you want to use LZMA or LZMA2, use a .7z container. This container also handles the following algorithms: PPMd, BZip2, DEFLATE, BCJ, BCJ2 and no compression.
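
Since the valid method switch depends on the container, one way to avoid mixing them up is a small helper that picks the switches from the archive name. This is only a sketch: the helper name is hypothetical, and it covers just the two combinations discussed here (-mm=BZip2 for .zip, -m0=lzma2 for .7z).

```shell
#!/usr/bin/env bash
# Sketch: choose method switches that match the archive container.
# The helper name is hypothetical; the switches are the ones from this thread.

method_switches() {
    case "$1" in
        *.zip) echo "-tzip -mm=BZip2" ;;   # zip container: BZip2 can use several threads
        *.7z)  echo "-t7z -m0=lzma2" ;;    # 7z container: LZMA2 is accepted here
        *)     echo "unsupported container: $1" >&2; return 1 ;;
    esac
}

# Example usage (uncomment to run; requires 7za on the PATH):
# 7za a $(method_switches myarchive.7z) -mx=9 -mmt=8 -p myarchive.7z bigfile.txt
```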

Dennis

Posted 2012-06-07T17:13:12.290

Reputation: 42 934

@Dennis I thought the OP was using LZMA(2), for which the documentation says, "LZMA compression uses only 2 threads." Although I agree, intuitively (due to the way Lempel-Ziv encoding works), it would be very difficult to multithread LZMA or Deflate (which is just LZ77 with Huffman coding).

– Breakthrough – 2012-06-07T18:03:59.267

@Breakthrough: At first, so did I. (Check out the revisions of my answer.) That's what the syntax error was about. You can't use LZMA compression with a .zip container. – Dennis – 2012-06-07T18:05:39.197

@Dennis ah, thank you for clearing that up. Didn't see that the OP was using a .ZIP container. – Breakthrough – 2012-06-07T18:06:23.823

Wait, so I'll get a different result if I just change the file extension of the container to .7z? – Brian L – 2012-06-07T18:08:01.287

Apparently, yes. I didn't realize that the extension of the destination archive affected the program's behavior. Changing it to .7z solved both problems above. Thanks again for your help. – Brian L – 2012-06-07T18:10:39.317

@BrianL there's a "thanks" button built in. It looks like an arrow facing upwards ;) – nhinkle – 2012-06-07T18:36:00.787

5

This is an old question, and this doesn't answer the specific question, but it does answer the spirit of it (using all cores to compress to the zip format).

pigz (parallel gzip with .zip option)

pigz -K -k bigfile.txt

(pigz names its output after the input file, so this writes bigfile.txt.zip.)

This gives you a zip-compatible file about 7x faster at the same compression level.
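
If you want to control how many cores pigz uses, a sketch like the one below passes an explicit thread count with -p. The wrapper name is hypothetical; -K (zip format), -k (keep the input) and -p (number of threads) are pigz options, and the output is written next to the input with a .zip suffix.

```shell
#!/usr/bin/env bash
# Sketch: compress one file to a single-entry .zip with pigz,
# using an explicit number of threads. The wrapper name is hypothetical.

pigz_zip() {
    local input=$1 threads=$2
    # -K : write zip format instead of gzip
    # -k : keep the original file
    # -p : number of compression threads
    pigz -K -k -p "$threads" "$input"
}

# Example usage (uncomment to run; requires pigz on the PATH):
# pigz_zip bigfile.txt 8    # writes bigfile.txt.zip
```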

A quick comparison of zip-compatible and non-zip compressors using single and multiple cores.

Wall times on an i7-2600K to compress a 1.0 GB text file on Fedora 20:

67s (120 MB) 7za (zip, 1 thread)
15s (141 MB) 7za -mx=4 (zip, 1 thread)
17s (132 MB) zip (zip, 1 thread)
 5s (131 MB) pigz -K -k (zip, 8 threads)
 9s (106 MB) bsc (libbsc.com) (not zip, 8 threads)
 5s (130 MB) zhuff -c2 (not zip, 8 threads)
 2s (149 MB) zhuff (not zip, 8 threads)

Wall times to decompress:

4.2s unzip -t
2.0s pigz -t
5.1s bsc d
0.5s zhuff -d

tgeorge

Posted 2012-06-07T17:13:12.290

Reputation: 51

gzip is much, much faster than bzip2, so the extra compression isn't always worth it. – jesjimher – 2017-03-13T12:27:51.737

why pigz when you can pbzip2 or pixz? – nod – 2014-04-30T21:31:26.947

0

Just use -mmt[N+1]

For example: -mmt2 is for one thread, -mmt9 is for eight threads

acubed

Posted 2012-06-07T17:13:12.290

Reputation: 1

-1

Verified and tested: to use multithreading with 7za, the parameter must be "-mmt#", not "-mmt=#". With the equal sign, the setting is ignored.

How did I discover this? Running 7z without any parameters prints the usage information, and the switches list says "-mmt[N]", not "-mmt=[N]".

So, if I understand correctly, the "-mmt=2" you are typing is miswritten and should be "-mmt2", without the equal sign.

I'm not sure I understood everything correctly; my English is really poor.

By the way, why do you use "7za" instead of just "7z"?

To test the parameter, I ran a set of benchmark commands, and they confirmed that some of the documentation has a typo: the parameter must be typed without the equal sign.

Command to do a benchmark with 7z with only one thread: 7z b -mmt1

Command to do a benchmark with 7z with only two threads: 7z b -mmt2

Command to do a benchmark with 7za with only two threads: 7za b -mmt2

Command to do a benchmark with 7za with only one thread: 7za b -mmt1

There is no equal sign in the parameter "-mmt#", for either 7z or 7za.
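
To compare several thread counts in one go, the benchmark commands above can be wrapped in a loop. This is only a sketch: the function name and the DRY_RUN variable are hypothetical, and by default it just prints the commands instead of running them, because each benchmark run takes a while.

```shell
#!/usr/bin/env bash
# Sketch: run (or just print) the 7za benchmark for several thread counts.
# The function name and the DRY_RUN variable are hypothetical.

run_benchmarks() {
    local n
    for n in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default: only show the command that would be executed
            echo "7za b -mmt$n"
        else
            7za b "-mmt$n"
        fi
    done
}

# Example usage:
# run_benchmarks 1 2 4 8             # prints the four commands
# DRY_RUN=0 run_benchmarks 1 2 4 8   # actually runs them (requires 7za)
```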

Laura

Posted 2012-06-07T17:13:12.290

Reputation: 1