Thwart Lepton compression

17

4

Dropbox recently released Lepton (GitHub), a method that losslessly compresses JPEG images round-trip, saving an average of 22%.

Because of the pigeonhole principle, any general compression algorithm cannot be guaranteed to result in a smaller file (general because it doesn't apply to inputs constrained to a specific format). Lepton exploits common characteristics about JPEGs, which if subverted, might pigeonhole it to produce a file larger than the source.

Requirements

Write a program that generates:

  • A valid JPEG/JFIF image,
  • with a size between 0.5 MB and 1 MB,
  • no smaller than 256 × 256 px,
  • no larger than 4096 × 4096 px,
  • recognizable by Lepton (it can successfully "compress" to a .lep image), and
  • decompresses to an identical .jpg (as the input).
  • APPx, COM, and other metadata, non-graphical marker sections are restricted in the JPEG (injecting arbitrary amounts of random bytes into the image to asymptotically approach 1:1 compression is lame.)
    • an APP0 JFIF marker is permitted but no thumbnail is allowed (should be exactly 16 bytes)
    • tl;dr If you're not intentionally shoving metadata into an EXIF segment and you disable any sort of thumbnail your language library of choice wants to put into the image, that should be OK.

Post the code and image.

If you want to write a program that produces a Lepton image that when converted yields a JPEG meeting the criteria, that's fine. It must remain identical across arbitrarily many JPEG → Lepton → JPEG → ... cycles.

Scoring

The byte size of the Lepton image divided by the source JPEG image. Higher (worse Lepton compression) is better. Run Lepton with default flags and switches.


Getting Lepton

A 5-second crash-course to build Lepton:

git clone https://github.com/dropbox/lepton.git
cd lepton
./autogen.sh && ./configure && make

# fish shell: ./autogen.sh ;and ./configure ;and make

Then ./lepton --help should tell you things.

Nick T

Posted 2016-07-15T00:06:53.367

Reputation: 3 197

I think this could be retooled to a code golf challenge where you write code that generates an image that fails compression by at least some constant. Actually, it may just suffice to just put an upper bound on the code size that's much smaller than the size to hardcode the jpeg, but big enough for an reasonable program. – xnor – 2016-07-15T08:55:26.687

3Is there any reason to expect that uniformly random pixels aren't the best answer? – feersum – 2016-07-15T14:19:41.157

@feersum you mean just like my example? – Nick T – 2016-07-15T15:26:17.150

1Also, because JPEG is a lossy format, there are many, many ways (e.g. "quality" among other things) to compress a given image. Each JPEG file includes a couple tables that dictate how the rest of the image is decoded, and those tables can basically be whatever. If you save a BMP image in different programs, it'll probably be identical. If you save a JPG in different programs, unless they use the same back-end library, probably not. – Nick T – 2016-07-15T15:38:59.387

2@feersum uniformly random input to a JPEG compressor doesn't result in uniformly random output, and that output is what lepton works on. If you can come up with an input that causes a JPEG compressor to produce uniformly random output, that would probably be useful here and elsewhere. – Sparr – 2016-07-26T01:05:39.533

@Sparr Why would you have to use compression? I don't know anything about JPEG format but I would think there is some tool that allows saving in it without trying to compress it. – feersum – 2016-07-26T01:22:18.420

@feersum most image file formats include a form of compression (PPM and some BMP files being counterexamples to a degree, audio: WAV). Some can be lossless (PNG, GIF, audio: FLAC) or lossy (JPG, audio: MP3). – Nick T – 2016-07-26T03:04:05.567

@feersum you're suggesting saving the jpeg with enough quality/detail that it's lossless? that's doable, but lepton isn't particularly applicable to that situation. – Sparr – 2016-07-26T03:38:24.227

@Sparr that might be good for the challenge. Intentionally inflating the quality is an interesting tack. – Nick T – 2016-07-26T03:44:17.780

Sure you're not missing ./configure in your steps? ./autogen.sh && ./configure && make? – Zeta – 2016-07-26T08:45:22.443

BTW, mozjpeg already did implement arithmetic coding long time ago. – Display Name – 2016-08-22T09:38:48.463

Answers

4

Python 3 + mozjpeg + /dev/urandom, 720×720: avg. score 102%

Depends on the mozjpeg package, the code assumes it's installed into /usr/local/opt/mozjpeg. (on OS X it's trivial to install, just run brew install mozjpeg)

Also it depends on /dev/urandom special file, it's used for generating random data.

The code simply feeds random data to mozjpeg compressor (in TGA format, because cjpeg understands it and it has very simple header), and lets it create an optimized jpeg file. Quality is set to the maximum because it makes DCT coefficients the least compressable, and it doesn't matter much what algorithm is used to compress uncompressable data.

I checked that the jpeg->lepton->jpeg cycle is lossless — it is true.

import subprocess
from subprocess import PIPE

c_mozjpeg_path = '/usr/local/opt/mozjpeg/bin/cjpeg'
cjpeg_params = '-quality 100 -dc-scan-opt 2 -dct float -targa'
image_size = 720


def write_random_tga_image(w, h, of, rf):
    def wb(value, size):
        of.write(int.to_bytes(value, size, 'little'))

    wb(0, 2)
    wb(3, 1)
    wb(0, 9)
    wb(w, 2)
    wb(h, 2)
    wb(8, 1)
    wb(0, 1)

    data_size = w * h
    while data_size > 0:
        data_size -= of.write(rf.read(data_size))


def main():
    with open('/dev/urandom', 'rb') as rf:
        with open('oops.jpg', 'wb') as f:
            p = subprocess.Popen((c_mozjpeg_path,) + tuple(cjpeg_params.split(' ')), stdin=PIPE, stdout=f)
            write_random_tga_image(image_size, image_size, p.stdin, rf)
            p.communicate()


if __name__ == '__main__':
    main()

Code is not golfed, obviously.

Example image:

Fun fact: generated JPEG file is larger than the source uncompressed TGA image, even though JPEG uses lossy compression.

Fun fact 2: Imgur (the default image hosting for SO) does very bad job at this file — for some reason it recompresses it to lower quality, even though it's less than 1MB. So I used Github to upload the example image.

Fun fact 3: In general, mozjpeg indeed does better JPEG compression while staying compatible with existing JPEG decoders. And it also has tool to losslessly optimize JPEG files, too — jpegtran.

Display Name

Posted 2016-07-15T00:06:53.367

Reputation: 654

I could use a cross-platform RNG (SystemRandom class for example) but I were too lazy. It's trivial and it should give similar results. – Display Name – 2016-08-24T05:32:55.603

1

Naive Noise, 1024 × 1024: score 85.55%

A compliant example in Python to get the ball rolling. Not optimized in any way; probable shortcomings:

  • No idea what the default quality setting is.
  • Each 8x8 block has practically the exact same average value (~50%) to the one adjacent to it: Lepton says that they use that information to save space.
  • Totally default quantization and Huffman tables (whatever the library decides to use).

import numpy as np
from PIL import Image

np.random.seed(0) # make sure it's repeatable.

size = 1024

imgr = np.random.randint(0, 0xFF, (size, size, 3)).astype('uint8')
pimg = Image.fromarray(imgr)
pimg.save('noise.jpg')

noise

Then some bash to do the thing:

./lepton noise.jpg noise.lep 2>\dev\null # vomits out a lot of technobabble
./lepton noise.lep noise-out.jpg 2>\dev\null

diff -qs noise.jpg noise-out.jpg

SIZE1=$(stat -f "%z" noise.jpg) # http://superuser.com/a/570920/18931
SIZE2=$(stat -f "%z" noise.lep)
RATIO=$(bc <<< "scale=4; $SIZE2/$SIZE1")
echo "$SIZE2/$SIZE1 = $RATIO"

# Files noise.jpg and noise-out.jpg are identical
# 538817/629769 = .8555

Nick T

Posted 2016-07-15T00:06:53.367

Reputation: 3 197