Write a compressing util for gzip files

11

The task of this challenge is as follows:

Write a program that reads a file of reasonable size (let's say <16 MB) from stdin or anywhere else (however you like, but the source must not be hardcoded) and puts the compressed output onto stdout. The output must be a valid gzip-compressed file, and running it through gunzip must yield exactly the same file as before.

Rules

  • The programming language used must be known before this competition started
  • The score of your program is the number of characters of the source code or the assembled program (whichever is shorter)
  • You're not allowed to use any kind of existing compression libraries.
  • Have fun!

FUZxxl

Posted 2011-01-28T14:26:28.763

Reputation: 9 656

2Is the use of built-in libraries allowed? – hallvabo – 2011-01-28T14:43:10.970

@hallvabo: Nope. Forgot this. Thx – FUZxxl – 2011-01-28T15:08:04.703

2Probably the best way to do this is just to pad the input with the "the following block is uncompressed" markers at the start of every block. – Anon. – 2011-01-28T16:05:57.057

gzip is a programming language. Not a Turing complete one though. – Alexandru – 2011-01-28T16:13:11.357

1This is pretty much identical to the Guns and Zips problem. Why anyone would post their answers here rather than at codegolf.com is beyond me, unless they want to solve it in a language not supported by codegolf.com (e.g., GolfScript). – Chris Jester-Young – 2011-01-29T02:20:58.260

The difference is that they expect a rather complete implementation of gzip there. In my exercise you may try to cheat instead, as decompression isn't needed. – FUZxxl – 2011-01-29T10:25:38.183

Could you be more specific as to what constitutes a "valid gzipped file"? 100% RFC compliant? Whatever goes through gunzip unharmed? – J B – 2011-04-03T17:52:40.490

@J B: Your second choice is right. A valid gzipped representation of a file f is any file g that is equal to f if passed through gunzip. – FUZxxl – 2011-04-03T18:09:48.013

Pushing the boundaries even further... is zcat (gzip -cd) enough? IOW, how many warnings/errors are we allowed at decompression time? – J B – 2011-04-03T18:13:59.337

J B: I only count what goes to stdout. Warnings aren't important in codegolf. – FUZxxl – 2011-04-03T18:17:10.300

@FUZxxl: what I mean is: I've got a file that's not valid enough for gunzip with no arguments. But if you get it through zcat 2>/dev/null, you do get the uncompressed contents back. Is that good enough? – J B – 2011-04-03T18:26:09.780

9 bytes: gzip -c -. I didn't use any compression libraries. – nyuszika7h – 2014-07-02T17:52:56.123

Answers

10

C# (534 characters)

using System.IO;using B=System.Byte;class X{static void Main(string[]a){var f=File.ReadAllBytes(a[0]);int l=f.Length,i=0,j;var p=new uint[256];for(uint k=0,r=0;k<256;r=++k){for(j=0;j<8;j++)r=r>>1^(r&1)*0xedb88320;p[k]=r;}uint c=~(uint)0,n=c;using(var o=File.Open(a[0]+".gz",FileMode.Create)){o.Write(new B[]{31,139,8,0,0,0,0,0,4,11},0,10);for(;i<l;i++){o.Write(new B[]{(B)(i<l-1?0:1),1,0,254,255,f[i]},0,6);c=p[(c^f[i])&0xFF]^c>>8;}c^=n;o.Write(new[]{(B)c,(B)(c>>8),(B)(c>>16),(B)(c>>24),(B)l,(B)(l>>8),(B)(l>>16),(B)(l>>24)},0,8);}}}

Much more readable:

using System.IO;
using B = System.Byte;
class X
{
    static void Main(string[] a)
    {
        // Read file contents
        var f = File.ReadAllBytes(a[0]);
        int l = f.Length, i = 0, j;

        // Build the CRC-32 lookup table (reflected polynomial 0xEDB88320)
        var p = new uint[256];
        for (uint k = 0, r = 0; k < 256; r = ++k)
        {
            for (j = 0; j < 8; j++)
                r = r >> 1 ^ (r & 1) * 0xedb88320;
            p[k] = r;
        }

        uint c = ~(uint) 0, n = c;

        // Write the output file
        using (var o = File.Open(a[0] + ".gz", FileMode.Create))
        {
            // gzip header: magic 31 139, CM=8 (deflate), FLG=0, MTIME=0 (4 bytes), XFL=4, OS=11
            o.Write(new B[] { 31, 139, 8, 0, 0, 0, 0, 0, 4, 11 }, 0, 10);
            for (; i < l; i++)
            {
                // stored (uncompressed) deflate block: BFINAL flag, LEN=1, NLEN=~LEN, then one byte of payload
                o.Write(new B[] { (B) (i < l - 1 ? 0 : 1), 1, 0, 254, 255, f[i] }, 0, 6);
                // Compute CRC checksum
                c = p[(c ^ f[i]) & 0xFF] ^ c >> 8;
            }
            c ^= n;
            o.Write(new[] {
                // CRC checksum
                (B) c, (B) (c >> 8), (B) (c >> 16), (B) (c >> 24),
                // original file size
                (B) l, (B) (l >> 8), (B) (l >> 16), (B) (l >> 24)
            }, 0, 8);
        }
    }
}
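
The CRC part of the program above can be checked on its own. Below is a minimal sketch of the same table-driven CRC-32, pulled out into a helper so it can be compared against the well-known check value for the ASCII string "123456789". The names Crc32Sketch, BuildTable and Compute are hypothetical and not part of the golfed answer.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of the golfed answer.
static class Crc32Sketch
{
    // Builds the 256-entry table for the reflected CRC-32 polynomial 0xEDB88320.
    public static uint[] BuildTable()
    {
        var table = new uint[256];
        for (uint n = 0; n < 256; n++)
        {
            uint r = n;
            for (int bit = 0; bit < 8; bit++)
                r = (r & 1) != 0 ? (r >> 1) ^ 0xEDB88320u : r >> 1;
            table[n] = r;
        }
        return table;
    }

    // Starts from all ones, folds in one byte per step, and inverts at the end,
    // which is what `c ^= n` does in the answer.
    public static uint Compute(byte[] data)
    {
        var table = BuildTable();
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in data)
            crc = table[(crc ^ b) & 0xFF] ^ (crc >> 8);
        return crc ^ 0xFFFFFFFFu;
    }

    static void Main()
    {
        byte[] check = Encoding.ASCII.GetBytes("123456789");
        Console.WriteLine(Compute(check).ToString("X8")); // prints CBF43926
    }
}
```

The golfed version computes the table entry with `(r & 1) * 0xedb88320` instead of a conditional; both forms give the same table.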

Comments:

  • Expects path to file as first command-line argument.

  • The output is written to the input file's path with .gz appended.

  • I am not using any libraries to do the gzip, deflate or CRC32. It’s all in there.

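  • This “compressor” increases the filesize by a factor of about 6: every input byte is wrapped in its own 6-byte stored block, plus 18 bytes of gzip header and trailer. But it’s in valid gzip format!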

  • Tested using GNU gunzip and WinRAR.
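
The factor of 6 comes from using one stored block per input byte. A stored block's LEN field is 16 bits wide, so a block can carry up to 65535 bytes, which is the approach suggested in the comments on the question. The following is a rough, ungolfed sketch of that variant, assuming input on stdin and output on stdout as the challenge describes. The names StoredGzip, Wrap and Crc32 are hypothetical, and the CRC is computed bit by bit to keep the example short.

```csharp
using System;
using System.IO;

// Hypothetical ungolfed variant; not the answer's code.
static class StoredGzip
{
    // Bitwise CRC-32 (reflected polynomial 0xEDB88320), no lookup table.
    static uint Crc32(byte[] data)
    {
        uint crc = 0xFFFFFFFFu;
        foreach (byte b in data)
        {
            crc ^= b;
            for (int k = 0; k < 8; k++)
                crc = (crc & 1) != 0 ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
        }
        return ~crc;
    }

    // Wraps the input in a gzip container using only stored deflate blocks.
    public static byte[] Wrap(byte[] input)
    {
        const int MaxBlock = 65535; // LEN is a 16-bit field
        var o = new MemoryStream();
        // Same 10-byte gzip header as in the answer
        o.Write(new byte[] { 31, 139, 8, 0, 0, 0, 0, 0, 4, 11 }, 0, 10);

        int pos = 0;
        do // do/while so that empty input still gets one (empty) final block
        {
            int len = Math.Min(MaxBlock, input.Length - pos);
            int nlen = ~len & 0xFFFF;                    // one's complement of LEN
            bool last = pos + len == input.Length;
            o.WriteByte((byte)(last ? 1 : 0));           // BFINAL bit, BTYPE=00
            o.WriteByte((byte)(len & 0xFF));             // LEN, little-endian
            o.WriteByte((byte)(len >> 8));
            o.WriteByte((byte)(nlen & 0xFF));            // NLEN, little-endian
            o.WriteByte((byte)(nlen >> 8));
            o.Write(input, pos, len);
            pos += len;
        } while (pos < input.Length);

        // Trailer: CRC-32 and input size, both little-endian
        uint crc = Crc32(input);
        uint size = (uint)input.Length;
        for (int s = 0; s < 32; s += 8) o.WriteByte((byte)((crc >> s) & 0xFF));
        for (int s = 0; s < 32; s += 8) o.WriteByte((byte)((size >> s) & 0xFF));
        return o.ToArray();
    }

    static void Main()
    {
        using (var stdin = Console.OpenStandardInput())
        using (var stdout = Console.OpenStandardOutput())
        {
            var buf = new MemoryStream();
            stdin.CopyTo(buf);
            byte[] gz = Wrap(buf.ToArray());
            stdout.Write(gz, 0, gz.Length);
        }
    }
}
```

With this layout the overhead is 5 bytes per block of up to 65535 bytes, plus the same 18 bytes of header and trailer, so the output is only slightly larger than the input. It is longer in source characters than the one-byte-per-block trick, so it would not score better in this challenge.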

Timwi

Posted 2011-01-28T14:26:28.763

Reputation: 12 158