What makes a tar archive seekable?

Whether a tar archive is seekable seems to make a large difference when listing or extracting just a few files. Unfortunately, the man page says very little about it. It seems that compressed archives are not seekable [1], but that post provides no evidence. Is there a more reliable source of information to read up on this issue?

[1] https://serverfault.com/questions/59795/is-there-a-smarter-tar-or-cpio-out-there-for-efficiently-retrieving-a-file-store

Peltier

Posted 2017-07-28T10:19:09.683

Reputation: 4 834

1

What is wrong with this answer?

– DavidPostill – 2017-07-28T10:26:38.280

@DavidPostill: There's nothing wrong with that answer, it's just an answer to a different question. – Peltier – 2017-07-28T12:31:01.257

Really? So "GNU tar creates "seekable" archives by default." and "Compressed archives are not "seekable" because current (1.26) GNU tar offloads compression to external program" doesn't answer your question? – DavidPostill – 2017-07-28T12:33:12.647

That was not the original question, and it provides no supporting evidence. I agree it's a good start, though. – Peltier – 2017-07-28T12:57:24.773

The supporting evidence is the source code. – DavidPostill – 2017-07-28T12:58:39.963

I was hoping to get a better explanation than "read the source code". But go ahead and close my question if that is what you want to do. – Peltier – 2017-07-28T12:59:54.803

I'm not going to close it. Someone else may have an acceptable answer for you – DavidPostill – 2017-07-28T13:00:55.733

Answers

1

The file header for each file includes its size in the archive. This allows the file content to be skipped if not needed. Tar just seeks to the next header that follows the file content. There is documentation on the header format.
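To make the header-skipping idea concrete, here is a minimal Python sketch. It assumes a plain ustar archive with short file names (the name in the first 100 bytes of each 512-byte header, the size as an octal string at offset 124) and ignores long-name and pax extensions. The helper names `list_members` and `build_demo_archive` are made up for this illustration.

```python
import io
import tarfile

BLOCK = 512  # tar headers and content padding use 512-byte blocks


def list_members(fileobj):
    """Walk an uncompressed tar stream header by header.

    Returns (name, size) pairs. File contents are skipped with seek()
    and never read. Simplified: handles only basic ustar headers.
    """
    members = []
    while True:
        header = fileobj.read(BLOCK)
        # A short read or an all-zero block marks the end of the archive.
        if len(header) < BLOCK or header == b"\0" * BLOCK:
            break
        name = header[0:100].rstrip(b"\0").decode("utf-8")
        # The size field is 12 bytes of octal ASCII at offset 124.
        size_field = header[124:136].rstrip(b"\0 ").decode("ascii")
        size = int(size_field or "0", 8)
        members.append((name, size))
        # Content is padded up to a whole number of blocks; jump over it.
        padded = (size + BLOCK - 1) // BLOCK * BLOCK
        fileobj.seek(padded, io.SEEK_CUR)
    return members


def build_demo_archive():
    """Build a small in-memory ustar archive to try the walker on."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, payload in [("a.txt", b"hello"), ("b.bin", b"x" * 1300)]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buf.seek(0)
    return buf


if __name__ == "__main__":
    for name, size in list_members(build_demo_archive()):
        print(name, size)
```

The `seek()` call is the whole point: on a regular file it costs almost nothing, while on a pipe or a compressed stream the same bytes would have to be read and thrown away.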

Compressed tar files are just that: ordinary tar files run through a compressor. You can freely switch between the uncompressed and compressed forms by using the appropriate decompression program (often gunzip) or compression program (gzip). With some tar programs this is the only option. The tar data itself keeps its seekable layout even when it is compressed.

What is not seekable is the compressed format. Compression works by finding a relatively small number of bytes to represent the data being compressed. Blocks of data with relatively few byte values or repeated byte strings compress well. Blocks of data with lots of different byte values and few repeated byte strings do not compress well, if at all. For some data, compression can actually increase the size of the file. The compression ratio for blocks within the file varies. The variance can be extreme for a tar file, which may consist of both very compressible files and relatively non-compressible files.
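A quick way to see how much the ratio can vary is to compress two inputs of the same length with Python's `zlib` module (the same DEFLATE algorithm gzip uses). This is only an illustration; the sample data and the `ratio` helper are made up here.

```python
import os
import zlib


def ratio(data):
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data)) / len(data)


# Highly repetitive input: a short string repeated many times.
repetitive = b"abcabcabc" * 10_000
# Random bytes of the same length: essentially incompressible.
random_like = os.urandom(len(repetitive))

if __name__ == "__main__":
    print("repetitive:", ratio(repetitive))
    print("random:    ", ratio(random_like))
```

Because the ratio differs this much from one stretch of data to the next, an offset in the uncompressed tar file cannot be converted into an offset in the compressed file by arithmetic alone.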

There is no mechanism within the compressed data to seek to some position in the uncompressed data. While some compression programs allow seeking to an individual file within a compressed archive, the only file such an archive would contain is the tar file. Tar files are rarely compressed with such tools, although compressed or uncompressed tar files may be included when archiving sets of files.
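In practice, "seeking" in a gzip stream means decompressing everything before the position you want and discarding it. A rough sketch with Python's standard `gzip` module follows; the function name `read_at` is hypothetical and only meant to show the cost.

```python
import gzip
import io


def read_at(gz_bytes, offset, length):
    """Return `length` bytes starting at uncompressed position `offset`.

    There is no shortcut: everything before `offset` is decompressed
    and thrown away first.
    """
    with gzip.GzipFile(fileobj=io.BytesIO(gz_bytes)) as stream:
        remaining = offset
        while remaining > 0:
            chunk = stream.read(min(remaining, 64 * 1024))
            if not chunk:
                return b""  # offset lies past the end of the data
            remaining -= len(chunk)
        return stream.read(length)


if __name__ == "__main__":
    data = bytes(range(256)) * 1000
    compressed = gzip.compress(data)
    print(read_at(compressed, 100_000, 16))
```

The work grows with the offset, which is why pulling a file from near the end of a large .tar.gz takes about as long as reading the whole archive.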

BillThor

Posted 2017-07-28T10:19:09.683

Reputation: 9 384