
Can I make unzip or a similar program work on a pipe, reading the archive from another program's standard output? I'm downloading a zip file that is supposed to be unzipped on the fly.

Related issue: How do I pipe a downloaded file to standard output in bash?

Alex
  • This seemed like it should be doable, but it looks like it's only possible to extract a zip and pipe the file to another command if the zip contains only a single file. I wanted to extract a specific file from a multi-file zip. Instead of piping, I switched to chaining multiple commands: 'unzip file.zip /path/file && dostuff /path/file && rm -rf /path'. While this doesn't answer the original question and creates temporary files, it satisfied my need. – Stan Kurdziel Jul 18 '13 at 21:13
  • Check out pigz. We use it in a pipe. http://andrew.tumblr.com/post/2316602611 – dmourati Nov 14 '13 at 23:18

12 Answers


While a zip file is in fact a container format, there's no reason why it can't be read from a pipe (stdin) if the file can fit into memory easily enough. Here's a Python script that reads a zip file from standard input and extracts its contents to the current directory, or to the directory given as the first argument.

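import io
import sys
import zipfile

# Buffer the whole archive in memory: zipfile needs a seekable file object,
# because a zip's central directory (index) sits at the end of the archive.
data = io.BytesIO(sys.stdin.buffer.read())
z = zipfile.ZipFile(data)
# Extract to the directory given as the first argument, else the current one.
dest = sys.argv[1] if len(sys.argv) == 2 else '.'
z.extractall(dest)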

This script can be condensed into one line and defined as an alias:

alias unzip-stdin="python3 -c \"import io,sys,zipfile;zipfile.ZipFile(io.BytesIO(sys.stdin.buffer.read())).extractall(sys.argv[1] if len(sys.argv) == 2 else '.')\""

Now unzip the output of wget easily.

wget http://your.domain.com/your/file.zip -O - | unzip-stdin target_dir
Jason R. Coombs
  • You and python rock!!! – Farid Nouri Neshat May 27 '12 at 19:54
  • Nice one-liner, and +1 for mentioning that the file has to fit into memory. (There is unfortunately no way to unzip a pkzip file due to the file format structure). – lxgr Jun 09 '12 at 21:28
  • keep in mind this buffers everything in memory before extracting – William Casarin Jun 21 '14 at 05:19
  • *there's no reason why it can't be read as a stream if the file can fit into memory easily enough* isn't really accurate. The reason why you're forced to buffer the whole zip archive in memory before you extract the contents is specifically because it can't be read as a stream. Of course, it can still be useful to avoid writing the zip archive to a file. – Håkan Lindqvist Mar 05 '16 at 16:23
  • This is **not** a stream, you are reading the whole file in memory by using the `.read()` method – Romuald Brunet Mar 13 '18 at 10:39
  • I can see how the word stream is misleading. I think 'input' or 'stdin' may be more accurate. – Jason R. Coombs Mar 14 '18 at 13:19

This is unlikely to work the way you expect. Zip is not just a compression format but also a container format: it rolls the jobs of both tar and gzip/bzip2 into one. Having said that, if your zip contains a single file, you can use unzip -p to extract it to stdout. If it contains more than one file, the members are written to stdout back to back, and there's no way to tell where one stops and the next starts.

As for reading from stdin, the unzip man page has this sentence:

Archives read from standard input are not yet supported, except with funzip (and then only the first member of the archive can be extracted).

You might have some luck with funzip.

David Pashley
  • If the zip has multiple files inside, -p can print a single file when given its name as a parameter: unzip -p temp.zip file-inside-zip – Taavi Ilves Dec 23 '15 at 11:51

I like to use curl because it is usually installed by default (-L makes curl follow redirects, which are common):

curl -L http://example.com/file.zip | bsdtar -xvf - -C /path/to/directory/

However, bsdtar is not installed by default, and I could not get funzip to work.

Todd Partridge
  • Also works fine with multiple files – Jon Nordby Jul 15 '18 at 01:01
  • I wanted to list the files, and the other answers were very unhelpful mostly dismissing the problem. – Pysis Feb 13 '20 at 21:47
  • bsdtar is magical and supports features like `--strip-components` or `-s` (substitute). Users coming from `tar` will find this tool like magic for zip files. – Jason R. Coombs Jan 10 '21 at 17:05
  • The bsdtar available on Windows has a bug that occasionally makes some files in a zip disappear during extraction when it is used in a pipe like this (only zips with multiple files are affected; tar.gz is safe with multiple files). Use `curl` to save to disk and then unzip the `.zip` with bsdtar from there. – eri0o May 28 '21 at 02:55
  • On Debian and derivatives (like Ubuntu), `bsdtar` is part of the `libarchive-tools` package. – pabouk - Ukraine stay strong Sep 19 '22 at 17:46

This is a repost of my answer to a similar question:

The ZIP file format includes a directory (index) at the end of the archive. This directory says where within the archive each file is located and thus allows quick, random access without reading the entire archive.

This would appear to pose a problem when attempting to read a ZIP archive through a pipe, in that the index is not accessed until the very end and so individual members cannot be correctly extracted until after the file has been entirely read and is no longer available. As such it appears unsurprising that most ZIP decompressors simply fail when the archive is supplied through a pipe.

The directory at the end of the archive is not the only location where file meta information is stored in the archive. In addition, individual entries also include this information in a local file header, for redundancy purposes.

Although not every ZIP decompressor will use local file headers when the index is unavailable, the tar and cpio front ends to libarchive (a.k.a. bsdtar and bsdcpio) can and will do so when reading through a pipe, meaning that the following is possible:

wget -qO- http://example.org/file.zip | bsdtar -xvf-
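To illustrate the local-header approach, here is a minimal Python sketch (not how bsdtar is implemented) that walks a ZIP archive strictly forward using only the local file headers and stops when it reaches the central directory. It assumes no encryption and no Zip64, supports only the Stored and Deflate methods, and holds each member in memory while yielding it; the names `ForwardReader` and `iter_zip_stream` are hypothetical.

```python
import struct
import zlib

LOCAL_HEADER_SIG = b"PK\x03\x04"
DATA_DESCRIPTOR_SIG = b"PK\x07\x08"


class ForwardReader:
    """Wraps a non-seekable stream and lets us push back over-read bytes."""

    def __init__(self, stream):
        self.stream = stream
        self.pushed = b""

    def read(self, n):
        # Serve pushed-back bytes first, then fall through to the stream.
        if self.pushed:
            out, self.pushed = self.pushed[:n], self.pushed[n:]
            return out
        return self.stream.read(n)

    def read_exact(self, n):
        out = b""
        while len(out) < n:
            chunk = self.read(n - len(out))
            if not chunk:
                raise EOFError("truncated zip stream")
            out += chunk
        return out

    def unread(self, data):
        self.pushed = data + self.pushed


def iter_zip_stream(stream, chunk_size=65536):
    """Yield (name, data) per member, reading only local file headers.

    Assumptions: no encryption, no Zip64, only Stored (0) and Deflate (8),
    and Stored entries must not defer their sizes to a data descriptor.
    """
    r = ForwardReader(stream)
    while True:
        if r.read_exact(4) != LOCAL_HEADER_SIG:
            return  # central directory reached: no more local entries
        (_version, flags, method, _time, _date, crc, csize, _usize,
         name_len, extra_len) = struct.unpack("<5H3I2H", r.read_exact(26))
        name = r.read_exact(name_len).decode("utf-8" if flags & 0x800 else "cp437")
        r.read_exact(extra_len)  # skip the extra field
        if method == 8:
            d = zlib.decompressobj(-15)  # raw Deflate, no zlib header
            parts = []
            while not d.eof:  # Deflate marks its own end, sizes not needed
                chunk = r.read(chunk_size)
                if not chunk:
                    raise EOFError("truncated Deflate data")
                parts.append(d.decompress(chunk))
            r.unread(d.unused_data)  # bytes past the end of this member
            data = b"".join(parts)
        elif method == 0 and not flags & 0x08:
            data = r.read_exact(csize)
        else:
            raise NotImplementedError(f"{name}: method {method}, flags {flags:#x}")
        if flags & 0x08:
            # Data descriptor: optional signature, CRC-32, two 32-bit sizes.
            field = r.read_exact(4)
            if field == DATA_DESCRIPTOR_SIG:
                field = r.read_exact(4)
            crc = struct.unpack("<I", field)[0]
            r.read_exact(8)
        if zlib.crc32(data) != crc:
            raise ValueError(f"CRC mismatch in {name}")
        yield name, data
```

Under those assumptions, `for name, data in iter_zip_stream(sys.stdin.buffer): ...` would process a piped archive; writing members to disk (with path sanitizing) is left out. Deflate entries work even when their sizes are deferred to a trailing data descriptor because the Deflate stream marks its own end, whereas a Stored entry with a data descriptor has no such marker, which is one reason streaming readers cannot handle every ZIP file.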
ruario

What you want is to make unzip read a zipped file from its standard input rather than taking it as an argument. Tools like gzip and tar usually support this easily with a - argument, but the standard unzip does not (though it does support extracting to a pipe). However, all is not lost...

Look at the funzip manual page:

funzip without a file argument acts as a filter; that is, it assumes that a ZIP archive (or a gzip'd file) is being piped into standard input, and it extracts the first member from the archive to stdout. When stdin comes from a tty device, funzip assumes that this cannot be a stream of (binary) compressed data and shows a short help text, instead. If there is a file argument, then input is read from the specified file instead of from stdin.

Given the limitation on single-member extraction, funzip is most useful in conjunction with a secondary archiver program such as tar(1). The following section includes an example illustrating this usage in the case of disk backups to tape.

This fits well with the fact that most Linux archives are tarred and then compressed in some way (gzip, bzip2, et al.). It will work for you if you have a tar archive wrapped in a single-member zip.


It is worth noting that funzip was written by Mark Adler, one of the original Info-ZIP authors. He writes in the funzip man page:

this functionality should be incorporated into unzip itself (future release).

However, no such update has appeared. I suspect Mark found it unnecessary since other archiving methods worked easily with tar.

nik
  • Just a comment: some people would like Python or another language as an option for unzipping. A prime example is Heroku, which does not include tar or unzip on its system. A workaround is to use jar by installing Java, which is allowed. – Nick Nov 11 '14 at 20:37
  • There's more about dealing with limitations of funzip and similar tools (in particular only being capable of showing the first member of an archive) in this answer: http://unix.stackexchange.com/a/211286/77539 – Joshua Goldberg Mar 11 '17 at 21:54
  • FYI funzip can extract only the first member file of a ZIP archive. – pts Jan 28 '21 at 13:41

Repost of my answer:

BusyBox's unzip can take stdin and extract all the files.

wget -qO- http://downloads.wordpress.org/plugin/akismet.2.5.3.zip | busybox unzip -

The dash after unzip is to use stdin as input.

You can even do:

cat file.zip | busybox unzip -

But that's just a redundant way of writing unzip file.zip.

If your distro uses BusyBox by default (e.g. Alpine), just run unzip -.

Saftever

The simplest commonly available utility that does this is jar, which reads from stdin if you don't give it a file argument. It also takes arguments similar to those of the tar program.

For example, to list the contents of an archive:

curl https://my.example.com/file.zip | jar t

While Java is not always installed, on those machines where it is, jar is definitely the most convenient method of doing this.

Adrian
  • To download & extract: `curl https://my.example.com/file.zip | jar xv` – Noam Manos Jan 15 '20 at 13:47
  • `jar` in JDK (and OpenJDK) does work, but on some Linux systems the `jar` command is installed from the *fastjar* package, and that doesn't support many ZIP features including Zip64. – pts Jan 28 '21 at 13:50

It's not possible with Info-ZIP, which is the most common open-source implementation. More importantly, though, it's not recommended because of how ZIP archives are structured: the index of members is stored at the end of the file.

If a change of format is viable to you then consider using tar(1) instead. It is quite happy with streamed input/output and, in fact, expects it by default.

Additionally, you can often tell whether an application accepts streamed input/output by specifying "-" as a filename. Info-ZIP, as you can imagine, doesn't treat this as a valid argument.
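As a small illustration of tar's streaming-friendly design, Python's tarfile module has explicit stream modes: "r|gz" reads a gzip-compressed tar strictly forward, so it works on a non-seekable pipe. This sketch only lists member names; the function name `list_tar_stream` is hypothetical.

```python
import tarfile


def list_tar_stream(stream):
    """Return member names of a gzip-compressed tar read strictly forward.

    The "|" in mode "r|gz" selects tarfile's stream mode: no seeking, so
    `stream` can be a pipe such as sys.stdin.buffer.
    """
    names = []
    with tarfile.open(fileobj=stream, mode="r|gz") as tf:
        for member in tf:
            names.append(member.name)
    return names
```

Calling it with `sys.stdin.buffer` would list a .tar.gz piped in from curl or wget; inside the loop, `tf.extractfile(member)` gives access to the current member's contents while the stream is positioned at it.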

Dan Carley

In zsh, you can do the following:

unzip =( curl http://example.com/someZipFile.zip )
Ian Robertson
  • Please note that this command downloads the entire .zip archive before it starts extracting the first member file. Thus it doesn't do streaming extraction. – pts Jan 28 '21 at 13:52

I wrote a Python (2.x) script that does streaming extraction of ZIP archives and uses a constant amount of memory no matter how large the ZIP file is. You can get it from here: https://raw.githubusercontent.com/pts/unzip_scan/master/unzip_scan.py

Usage: cat file.zip | sh unzip_scan.py -

The scan_zip function implements a streaming parser (and decompressor) for the ZIP (and Zip64) file format, including a few extensions (so that it supports member files larger than 4 GiB, and it also extracts the last-modification time). It uses zlib.decompressobj (part of the Python standard library, heavy lifting implemented in C) for the actual Deflate decompression.

pts
  • Rather than just linking to a file, could you explain and include at least the core snippets of code? – SEoF Feb 01 '21 at 09:46
  • @SEoF: Unfortunately it's not possible to reuse a small part of that file without the rest because of the internal dependencies, so the core snippet of code is the entire file. I've added some description of the code to my answer, to make it easier for readers to decide whether it is a good fit for their use case. – pts Feb 01 '21 at 14:15

I actually needed something a little more complex: extract a specific file if it exists. The difficulty was that the input stream might not be a zip file, in which case I needed it to pass through the pipe unchanged. Here is my solution (mostly based on Jason R. Coombs's answer):

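python3 -c "import io, sys, zipfile
data = sys.stdin.buffer.read()  # buffer all input so it can be passed through
try:
    z = zipfile.ZipFile(io.BytesIO(data))
    # Write the member named by the first argument to stdout
    sys.stdout.buffer.write(z.read(sys.argv[1]))
except (KeyError, RuntimeError, zipfile.BadZipFile):
    # Not a zip, no such member, or encrypted: pass the input through unchanged
    sys.stdout.buffer.write(data)" "$1"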

I saved this as a script named "effpoptp" (not a simple name) in the /bin folder on my machine, so testing it looks like this:

cat defaultModel.mwb|effpoptp "document.mwb.xml"

The purpose is to version control MySQL Workbench files, where the file could be the xml file named as the workbench file, or the complete workbench file.

SEoF
  • Please note that this command reads the entire .zip archive to memory before it starts extracting the first member file. Thus it doesn't do streaming extraction. – pts Jan 28 '21 at 13:53

More recently, I had a similar use-case where I wished to selectively extract content from a large zip file in the cloud, and I found another technique that boils down to:

  • Mount the remote zip file into the file system (technique might vary, depending on the characteristics of the remote file). Important - this mounting technique must allow for random access of the remote file (seeking).
  • Rely on standard zip tools (like unzip) to parse the file and perform operations (including extraction of the files as they arrive over the pipe).

This approach still requires making changes to the local file system (creating a mount) but could be used to unzip files as they stream over the network.

In theory, it should be possible to implement something similar using HTTP Range requests to perform incremental or selective zip operations on an HTTP-hosted zipfile.
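The HTTP Range idea could look roughly like the following Python sketch: a seekable, read-only file object that turns each read into a ranged fetch, which `zipfile.ZipFile` can then use like a local file. It assumes the server answers Range requests with 206 Partial Content and reports Content-Length on a HEAD request; `RangeFile`, `http_range_fetcher`, and `http_content_length` are illustrative names, not an existing library.

```python
import io
import urllib.request


class RangeFile(io.RawIOBase):
    """Read-only, seekable file object that fetches byte ranges on demand.

    `fetch(start, end)` must return bytes [start, end) of the remote object;
    `size` is its total length. Both are supplied by the caller.
    """

    def __init__(self, size, fetch):
        super().__init__()
        self.size = size
        self.fetch = fetch
        self.pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self.pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            new = offset
        elif whence == io.SEEK_CUR:
            new = self.pos + offset
        elif whence == io.SEEK_END:
            new = self.size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if new < 0:
            raise OSError("negative seek position")
        self.pos = new
        return self.pos

    def readinto(self, buf):
        end = min(self.pos + len(buf), self.size)
        if self.pos >= end:
            return 0  # EOF
        data = self.fetch(self.pos, end)  # one ranged request per read
        buf[:len(data)] = data
        self.pos += len(data)
        return len(data)


def http_range_fetcher(url):
    """Return a fetch(start, end) callable that issues HTTP Range requests."""
    def fetch(start, end):
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end - 1}"})
        with urllib.request.urlopen(req) as resp:
            if resp.status != 206:  # 206 Partial Content
                raise OSError("server did not honor the Range request")
            return resp.read()
    return fetch


def http_content_length(url):
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return int(resp.headers["Content-Length"])
```

Hypothetical usage with a placeholder URL: `raw = RangeFile(http_content_length(url), http_range_fetcher(url))`, then `zipfile.ZipFile(io.BufferedReader(raw))`. The buffered wrapper batches small reads into fewer requests; zipfile first reads the end-of-archive record and central directory, then only the byte ranges of the members you open.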

Jason R. Coombs