Linux - Is there a way to convert .bz2 files to .tar.bz2 files using pipes?

5

2

Is there a way to convert a .bz2 file to a .tar.bz2 file without decompressing the entire thing to disk and then re-compressing? The decompressed size is larger than my drive. Since bz2 operates on blocks, it would seem like you could just decompress a block, pipe it, re-compress it, remove the decompressed block from memory, etc.

I asked this on Ubuntu Forums and didn't find an answer.

endolith

Posted 2009-10-11T18:30:13.177

Reputation: 6 626

Why would you want to do this? – matpie – 2009-10-30T09:28:17.223

Originally it was to work with very large data files without uncompressing them. archivemount lets you mount a .tar.bz2 (because it has a "filesystem" inside), but not a plain .bz2. – endolith – 2009-10-30T23:47:55.007

Answers

3

Update: My original answer doesn't work at all, sorry. tar won't accept a data stream from STDIN as input, so the first command fails.

The only way I can think of to accomplish what you want is to write your own program to add the required tar headers and such around your data stream. Then you could write:

$ bzcat foo.bz2 | stream-to-tar | bzip2 > foo.tar.bz2

... and (assuming your program gets the tar format right) you could decompress it with a standard tar xf foo.tar.bz2.
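For what it's worth, here is a minimal sketch of what such a program could look like, using Python's standard `bz2` and `tarfile` modules instead of a hand-rolled header writer. The function names (`decompressed_size`, `bz2_to_tar_bz2`) and the member name are made up for illustration. The catch is that a tar header records the file size before the data, so this sketch assumes you can afford to decompress the input twice: once to count bytes, once to copy them. Neither pass writes the decompressed data to disk.

```python
import bz2
import tarfile


def decompressed_size(src_path, chunk_size=1 << 20):
    """First pass: count the decompressed bytes without storing them."""
    total = 0
    with bz2.open(src_path, "rb") as src:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def bz2_to_tar_bz2(src_path, dst_path, member_name):
    """Second pass: stream the decompressed data into a one-member .tar.bz2."""
    # The tar header must carry the size, which is why the first pass exists.
    info = tarfile.TarInfo(name=member_name)
    info.size = decompressed_size(src_path)

    # "w|bz2" opens the archive as a compressed, non-seekable output stream;
    # addfile() copies info.size bytes from src in chunks.
    with bz2.open(src_path, "rb") as src, \
            tarfile.open(dst_path, mode="w|bz2") as tar:
        tar.addfile(info, fileobj=src)
```

Calling `bz2_to_tar_bz2("foo.bz2", "foo.tar.bz2", "foo")` should produce an archive with a single member named `foo`. The member's timestamp and ownership are left at `TarInfo` defaults here; set `info.mtime` and friends if they matter to you.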


Original answer (does not work, kept for reference): This probably isn't how you want to do it, since it doesn't provide any of the usual advantages of tar'ing the file in the first place.

$ bzcat foo.bz2 | tar cjf foo.tar.bz2 -

Now, the problem is that the archive contains no filesystem information, because all we've given tar is a decompressed data stream. That means you need to decompress/untar it like this:

$ tar --to-stdout -xjf foo.tar.bz2 > foo

quack quixote

Posted 2009-10-11T18:30:13.177

Reputation: 37 382

what version of tar is this? doesn't work with GNU tar 1.16.1. – goldPseudo – 2009-10-30T00:49:09.220

GNU tar 1.20 on debian, 1.21 on cygwin. hmm. you're right, the first command doesn't seem to work. tar sez tar: -: Cannot stat: No such file or directory. it doesn't seem to like STDIN. the second command would work, assuming the first one did. – quack quixote – 2009-10-30T02:19:45.447

tar doesn't accept a data stream from STDIN, it must be a list of files. – matpie – 2009-10-30T09:41:21.643

@sirlancelot: correct, thx. fixed my answer to reflect this. – quack quixote – 2009-10-30T18:42:36.683

0

I think you'll find that the answer is: You don't do this. The compression gained from a .tbz2 file vs. a .bz2 file is pretty minimal if you compressed it with --best. Here is an example over an httpd error log:

 39M ./httpd-error.log
904K ./httpd-error.log.bz2
904K ./httpd-error.log.tbz2

Otherwise, you'll have to write the decompressed data to the hard drive as an intermediate step.

Jack M.

Posted 2009-10-11T18:30:13.177

Reputation: 3 133

I wasn't doing it for the compression. I was doing it because a .tar.bz2 can be mounted without uncompressing it, but a .bz2 can't. For very large compressed files (OSM maps and Wikipedia dumps are both extremely large XML files stored as .bz2, for instance), you really don't want to decompress the entire thing to your drive in order to use it. – endolith – 2009-10-30T23:44:18.837

Then I guess I would need to know what you want to do with this file? Are you looking to parse with a programming language, search through it, etc? – Jack M. – 2009-11-06T15:56:09.630

Both of those, yes. – endolith – 2009-12-17T18:39:05.933