Elias omega coding

Elias ω coding, or Elias omega coding, is a universal code for the positive integers, developed by Peter Elias. Like Elias gamma coding and Elias delta coding, it works by prefixing the integer with a representation of its order of magnitude in a universal code. Unlike those two codes, however, Elias omega coding encodes that prefix recursively; for this reason, omega codes are sometimes known as recursive Elias codes.

Omega coding is used in applications where the largest encoded value is not known ahead of time, or to compress data in which small values are much more frequent than large values.

To code a number N:

  1. Place a "0" at the end of the code.
  2. If N = 1, stop; encoding is complete.
  3. Prepend the binary representation of N to the beginning of the code. This will be at least two bits, the first bit of which is a 1.
  4. Let N equal the number of bits just prepended, minus one.
  5. Return to step 2 to prepend the encoding of the new N.
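
The encoding steps above can be sketched as a small self-contained function. This is a minimal illustration rather than a reference implementation: the name omegaEncode and the choice to return the code as a string of '0' and '1' characters are assumptions made here for readability, not part of the original description.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: returns the Elias omega code of n as a string of
// '0'/'1' characters. Only positive integers have an omega code.
std::string omegaEncode(std::uint64_t n) {
    assert(n >= 1 && "omega codes are defined only for positive integers");
    std::string code = "0";                      // step 1: terminating zero
    while (n > 1) {                              // step 2: stop once N is 1
        std::string group;                       // binary form of N, MSB first
        for (std::uint64_t t = n; t > 0; t >>= 1)
            group.insert(group.begin(), static_cast<char>('0' + (t & 1)));
        code.insert(0, group);                   // step 3: prepend the group
        n = group.size() - 1;                    // step 4: bits prepended, minus one
    }                                            // step 5: loop back to step 2
    return code;
}
```

With this sketch, omegaEncode(4) yields "101000" (the groups 10, 100 and the final 0), and omegaEncode(100) yields "1011011001000".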

To decode an Elias omega-coded integer:

  1. Start with a variable N, set to a value of 1.
  2. If the next bit is a "0", stop. The decoded number is N.
  3. If the next bit is a "1", then read it plus N more bits, and use that binary number as the new value of N. Go back to step 2.
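
The decoding steps can be sketched in the same style. The function name omegaDecode, the string-of-bits input, and the pos cursor are assumptions made for this illustration. The sketch also rejects truncated input and groups longer than 64 bits, which the steps above do not discuss.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes one omega code starting at bits[pos] and
// advances pos to the first bit after it. The input holds '0'/'1' characters.
std::uint64_t omegaDecode(const std::string& bits, std::size_t& pos) {
    std::uint64_t n = 1;                                   // step 1
    for (;;) {
        if (pos >= bits.size())
            throw std::runtime_error("truncated omega code");
        if (bits[pos] == '0') {                            // step 2: terminator
            ++pos;
            return n;
        }
        // Step 3: the group is the leading 1 plus n more bits.
        if (n > 63)
            throw std::runtime_error("group does not fit in 64 bits");
        if (bits.size() - pos < n + 1)
            throw std::runtime_error("truncated omega code");
        std::uint64_t value = 0;
        for (std::uint64_t i = 0; i <= n; ++i)
            value = (value << 1) | static_cast<std::uint64_t>(bits[pos++] == '1');
        assert((value >> n) == 1);                         // group began with a 1 bit
        n = value;
    }
}
```

Because every code ends with its own terminating 0, codes can be concatenated without separators: calling omegaDecode repeatedly on "1010000" with a shared cursor returns 4 and then 1.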

Examples

Omega codes can be thought of as a number of "groups". A group is either a single 0 bit, which terminates the code, or two or more bits beginning with 1, which is followed by another group.
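
To make the group structure concrete, the following sketch splits a single omega code into its groups, separated by spaces. The name splitGroups is hypothetical, and the function assumes the input is one well-formed code of modest length given as '0'/'1' characters.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: inserts a space between the groups of one omega code.
std::string splitGroups(const std::string& code) {
    std::string out;
    std::size_t pos = 0;
    std::size_t n = 1;                        // the next group has n + 1 bits
    while (pos < code.size() && code[pos] == '1') {
        if (code.size() - pos <= n)           // fewer than n + 1 bits remain
            throw std::runtime_error("truncated omega code");
        std::size_t value = 0;
        for (std::size_t i = 0; i <= n; ++i)
            value = value * 2 + static_cast<std::size_t>(code[pos + i] == '1');
        out += code.substr(pos, n + 1);
        out += ' ';
        pos += n + 1;
        n = value;                            // the group's value sizes the next group
    }
    if (pos >= code.size())
        throw std::runtime_error("missing terminating 0");
    assert(code[pos] == '0');
    out += '0';                               // the single-bit terminating group
    return out;
}
```

For example, splitGroups("101000") gives "10 100 0", the code for 4.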

The first few codes are shown below. Included is the so-called implied distribution, describing the distribution of values for which this coding yields a minimum-size code; see Relationship of universal codes to practical compression for details.

Value      Code                                 Implied probability
1          0                                    1/2
2          10 0                                 1/8
3          11 0                                 1/8
4          10 100 0                             1/64
5          10 101 0                             1/64
6          10 110 0                             1/64
7          10 111 0                             1/64
8          11 1000 0                            1/128
9          11 1001 0                            1/128
10         11 1010 0                            1/128
11         11 1011 0                            1/128
12         11 1100 0                            1/128
13         11 1101 0                            1/128
14         11 1110 0                            1/128
15         11 1111 0                            1/128
16         10 100 10000 0                       1/2048
17         10 100 10001 0                       1/2048
...
100        10 110 1100100 0                     1/8192
1000       11 1001 1111101000 0                 1/131,072
10,000     11 1101 10011100010000 0             1/2,097,152
100,000    10 100 10000 11000011010100000 0     1/268,435,456
1,000,000  10 100 10011 11110100001001000000 0  1/2,147,483,648
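
Each implied probability in the table is 2 raised to minus the code length in bits. The code length can be computed without building the code, as in the sketch below. The names omegaLength and impliedProbability are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: length in bits of the omega code of n (n >= 1).
int omegaLength(std::uint64_t n) {
    assert(n >= 1);
    int length = 1;                           // the terminating 0
    while (n > 1) {
        int bits = 0;
        for (std::uint64_t t = n; t > 0; t >>= 1)
            ++bits;                           // bits = 1 + floor(log2(n))
        length += bits;                       // this group is the binary form of n
        n = static_cast<std::uint64_t>(bits - 1);
    }
    return length;
}

// Implied probability 2^(-length) of the value n.
double impliedProbability(std::uint64_t n) {
    return std::ldexp(1.0, -omegaLength(n));
}
```

For instance, omegaLength(8) is 7, so impliedProbability(8) is 1/128, in agreement with the table.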

The encoding for 1 googol, 10^100, is 11 1000 101001100 (15 bits of length header) followed by the 333-bit binary representation of 1 googol, which is 10010 01001001 10101101 00100101 10010100 11000011 01111100 11101011 00001011 00100111 10000100 11000100 11001110 00001011 11110011 10001010 11001110 01000000 10001110 00100001 00011010 01111100 10101010 10110010 01000011 00001000 10101000 00101110 10001111 00010000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 and a trailing 0, for a total of 349 bits.

A googol to the hundredth power (10^10000) is a 33,220-bit binary number. Its omega encoding is 33,243 bits long: 11 1111 1000000111000011 (22 bits of length header, the last group being 33,219, the bit length minus one), followed by the 33,220 bits of the value and a trailing 0. Under Elias delta coding, the same number is 33,250 bits long: 000000000000000 1000000111000100 (31 bits, encoding the bit length 33,220) followed by the remaining 33,219 bits of the value. Since log2(10^10000) ≈ 33219.28, omega and delta coding are, in this instance, only about 0.07% and 0.09% longer than optimal, respectively.
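
The totals quoted for very large values depend only on the bit length of the value, so they can be checked without big-integer arithmetic. The sketch below does this for both codes. The function names are hypothetical. The delta formula used here (a gamma-coded bit length followed by the value without its leading 1) is the standard definition of Elias delta coding, not something stated in this section.

```cpp
#include <cassert>
#include <cstdint>

// Number of bits in the binary representation of n (n >= 1).
static std::uint64_t bitLength(std::uint64_t n) {
    std::uint64_t bits = 0;
    for (; n > 0; n >>= 1)
        ++bits;
    return bits;
}

// Omega code length of any value whose binary form has valueBits bits.
std::uint64_t omegaTotalBits(std::uint64_t valueBits) {
    assert(valueBits >= 1);
    std::uint64_t total = 1;                  // the terminating 0
    if (valueBits == 1)
        return total;                         // the value is 1, coded as "0"
    total += valueBits;                       // the group holding the value itself
    std::uint64_t n = valueBits - 1;          // each earlier group codes length - 1
    while (n > 1) {
        std::uint64_t b = bitLength(n);
        total += b;
        n = b - 1;
    }
    return total;
}

// Delta code length: gamma code of the bit length, then the value's
// remaining bits after its leading 1.
std::uint64_t deltaTotalBits(std::uint64_t valueBits) {
    assert(valueBits >= 1);
    std::uint64_t lengthBits = bitLength(valueBits);
    return (lengthBits - 1) + lengthBits + (valueBits - 1);
}
```

Under these assumptions, omegaTotalBits(333) is 349, omegaTotalBits(33220) is 33243, and deltaTotalBits(33220) is 33250.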

Example code

Encoding

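// IntReader, BitWriter and BitStack are helper classes assumed by this example; they are not defined here.
void eliasOmegaEncode(char* source, char* dest)
{
    IntReader intreader(source);
    BitWriter bitwriter(dest);
    while (intreader.hasLeft())
    {
        int num = intreader.getInt();                   // must be >= 1; zero and negative values have no omega code
        BitStack bits;                                  // groups are produced last-to-first, so buffer them on a stack
        while (num > 1) {
            int len = 0;
            for (int temp = num; temp > 0; temp >>= 1)  // calculate 1+floor(log2(num))
                len++;
            for (int i = 0; i < len; i++)               // push the least significant bit first so it pops last
                bits.pushBit((num >> i) & 1);
            num = len - 1;                              // the next group encodes the length minus one
        }
        while (bits.length() > 0)
            bitwriter.putBit(bits.popBit());
        bitwriter.putBit(false);                        // write one zero
    }
    bitwriter.close();
    intreader.close();
}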

Decoding

// BitReader and IntWriter are helper classes assumed by this example; they are not defined here.
void eliasOmegaDecode(char* source, char* dest) {
    BitReader bitreader(source);
    IntWriter intwriter(dest);
    while (bitreader.hasLeft())
    {
        int num = 1;
        while (bitreader.inputBit())     // unchecked: malformed input can overflow num or read past the end
        {
            int len = num;               // the group is the 1 bit just read plus len more bits
            num = 1;                     // start from the leading 1 bit
            for (int i = 0; i < len; ++i)
            {
                num <<= 1;
                if (bitreader.inputBit())
                    num |= 1;
            }
        }
        intwriter.putInt(num);           // write out the value
    }
    bitreader.close();
    intwriter.close();
}

Generalizations

Elias omega coding does not code zero or negative integers. One way to code all non-negative integers is to add 1 before coding and subtract 1 after decoding. One way to code all integers is to set up a bijection that maps the integers (0, 1, -1, 2, -2, 3, -3, ...) to the strictly positive integers (1, 2, 3, 4, 5, 6, 7, ...) before coding.
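
The bijection for signed integers can be written as a pair of mapping functions, applied before encoding and after decoding. The names integerToPositive and positiveToInteger are hypothetical. The 64-bit types are an assumption of this sketch, which is why the most negative 64-bit value is excluded.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Maps 0, 1, -1, 2, -2, ... to 1, 2, 3, 4, 5, ...
std::uint64_t integerToPositive(std::int64_t k) {
    // The most negative value would map to 2^64 + 1, which does not fit.
    assert(k != std::numeric_limits<std::int64_t>::min());
    if (k > 0)
        return 2 * static_cast<std::uint64_t>(k);          // positive -> even
    return 2 * static_cast<std::uint64_t>(-k) + 1;         // zero, negative -> odd
}

// Inverse mapping: 1, 2, 3, 4, 5, ... back to 0, 1, -1, 2, -2, ...
std::int64_t positiveToInteger(std::uint64_t m) {
    assert(m >= 1);
    if (m % 2 == 0)
        return static_cast<std::int64_t>(m / 2);
    return -static_cast<std::int64_t>(m / 2);              // m odd: (m - 1) / 2 == m / 2
}
```

For example, integerToPositive(-2) is 5, and positiveToInteger(5) returns -2.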

References

    Further reading

    • Elias, Peter (March 1975). "Universal codeword sets and representations of the integers". IEEE Transactions on Information Theory. 21 (2): 194–203. doi:10.1109/tit.1975.1055349.
    • Fenwick, Peter (2003). "Universal Codes". In Sayood, Khalid (ed.). Lossless Compression Handbook. New York, NY, USA: Academic Press. pp. 55–78. doi:10.1016/B978-012620861-0/50004-8. ISBN 978-0123907547.
    This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.