Goal
Create a program or pair of programs that collectively disrupt and fix files with the intent of preventing LZMA2 from working effectively. The disrupt and fix routines must be reciprocal, so you can recover the original file exactly.
Targets
- The collected works of Shakespeare in plain UTF-8 (5,589,891 bytes)
- Wikimedia Commons 2013 Picture of the Year at full resolution (1,659,874 bytes)
Compression Methods
- Ubuntu/related:
xz -kz5 <infile>
- Windows:
7z.exe a -txz -mx5 <outfile> <infile>
- Other: Use an LZMA2 compressor with compression level 5 that compresses the works of Shakespeare to 1570550 bytes ± 100 bytes.
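For the "Other" route, one possible stand-in is Python 3's standard `lzma` module, which writes the .xz container using LZMA2. This is a minimal sketch, not the reference compressor: the helper name `xz5_size` is made up here, and its output may differ by a few bytes from `xz -kz5`, so check it against the Shakespeare target size before relying on it.

```python
import lzma

def xz5_size(data: bytes) -> int:
    """Return the size in bytes of `data` compressed as .xz at preset 5.

    Assumption: preset 5 of Python's lzma module is close enough to
    `xz -kz5`; verify against the 1570550 +/- 100 byte target.
    """
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, preset=5))
```

To use it, read the modified file in binary mode and pass the bytes in, for example `xz5_size(open('pg100.txt~', 'rb').read())`.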
Scoring: sum of the following (everything is in bytes; measure with ls -l or dir):
- Program(s) size (whatever it takes collectively to reversibly "break"/fix the file)
- Difference in size (absolute) between:
- Raw collected works of Shakespeare and your modified (uncompressed) copy.
- Raw photo and your modified (uncompressed) copy.
- Difference in size or 0, whichever is greater between:
- Raw collected works of Shakespeare minus your modified, LZMA2 compressed copy.
- Raw photo minus your modified, LZMA2 compressed copy.
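The scoring rules above can be written out as a small function. This is only a sketch of one reading of the rules in Python 3; the name `score` and the tuple layout are assumptions made for illustration.

```python
def score(program_size, files):
    """Total score in bytes (lower is better).

    files: iterable of (raw, modified, modified_xz) sizes in bytes, where
    `modified` is the uncompressed modified copy and `modified_xz` is that
    copy after LZMA2 compression.
    """
    total = program_size
    for raw, modified, modified_xz in files:
        total += abs(raw - modified)        # size change of the uncompressed copy
        total += max(raw - modified_xz, 0)  # bytes the compressor still saved, floored at 0
    return total
```

Fed the sizes from the example further down, `score(194, [(5589891, 5589891, 3014136), (1659874, 1659874, 1646556)])` returns 2589267.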
Example
Poorly scoring, lazily-golfed, but compliant Python 2.x example:
import sys
x = 7919 if sys.argv[1] == 'b' else -7919
i = bytearray(open(sys.argv[2], 'rb').read())
for n in range(len(i)):
    i[n] = (i[n] + x*n) % 256
o = open(sys.argv[2]+'~', 'wb').write(i)
Running...
$ python break.py b pg100.txt
$ python break.py f pg100.txt~
$ diff -s pg100.txt pg100.txt~~
Files pg100.txt and pg100.txt~~ are identical
$ python break.py b Glühwendel_brennt_durch.jpg
$ python break.py f Glühwendel_brennt_durch.jpg~
$ diff -s Glühwendel_brennt_durch.jpg Glühwendel_brennt_durch.jpg~~
Files Glühwendel_brennt_durch.jpg and Glühwendel_brennt_durch.jpg~~ are identical
$ xz -kz5 pg100.txt~
$ xz -kz5 Glühwendel_brennt_durch.jpg~
$ ls -ln
-rw-rw-r-- 1 2092 2092 194 May 23 17:37 break.py
-rw-rw-r-- 1 2092 2092 1659874 May 23 16:20 Glühwendel_brennt_durch.jpg
-rw-rw-r-- 1 2092 2092 1659874 May 23 17:39 Glühwendel_brennt_durch.jpg~
-rw-rw-r-- 1 2092 2092 1659874 May 23 17:39 Glühwendel_brennt_durch.jpg~~
-rw-rw-r-- 1 2092 2092 1646556 May 23 17:39 Glühwendel_brennt_durch.jpg~.xz
-rw-rw-r-- 1 2092 2092 5589891 May 23 17:24 pg100.txt
-rw-rw-r-- 1 2092 2092 5589891 May 23 17:39 pg100.txt~
-rw-rw-r-- 1 2092 2092 5589891 May 23 17:39 pg100.txt~~
-rw-rw-r-- 1 2092 2092 3014136 May 23 17:39 pg100.txt~.xz
Score
- = 194 + abs(5589891 − 5589891) + max(5589891 − 3014136, 0) + abs(1659874 − 1659874) + max(1659874 − 1646556, 0)
- = 194 + 0 + 2575755 + 0 + 13318
- = 2,589,267 bytes. Bad, but doing nothing to the files yields a score of 4,635,153 bytes.
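The reciprocity requirement can also be checked in memory before touching the real files. The following is a Python 3 port of the example's transform, offered as a sketch only: `transform` and `KEY` are names introduced here for illustration, and the sample data is arbitrary.

```python
KEY = 7919  # multiplier taken from the Python 2 example

def transform(data: bytes, mode: str) -> bytes:
    """Add (mode 'b') or subtract (mode 'f') KEY*n from byte n, modulo 256."""
    x = KEY if mode == "b" else -KEY
    return bytes((b + x * n) % 256 for n, b in enumerate(data))

sample = bytes(range(256)) * 4
broken = transform(sample, "b")
assert broken != sample                   # the file really was changed
assert len(broken) == len(sample)         # no size penalty for the uncompressed copy
assert transform(broken, "f") == sample   # fixing restores the original exactly
```

Because each byte is shifted by an amount that depends only on its position, subtracting the same amount undoes the change, which is why one script can serve as both breaker and fixer.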
Clarification
This is golf, so you are trying to minimize your score. I'm not sure if the comments are pointing out a legitimate hole in my scoring or if they are there because I made it too complicated. In any case, you want the SMALLEST:
- source code
- difference between the uncompressed modified file and original file (e.g. if you modify it by appending a trillion 0's on the end, your score just went up a trillion bytes)
- difference between the compressed modified file and original file (e.g. the more incompressible the files become, the higher your score). A perfectly incompressible file that grows slightly or not at all will score 0.
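The zero floor in the last bullet can be illustrated with random bytes standing in for a "perfectly incompressible" file. This is a sketch under assumptions: it uses Python 3's `lzma` module at preset 5 in place of the `xz` command line, and `compression_penalty` is a hypothetical helper name.

```python
import lzma
import os

def compression_penalty(raw_size: int, modified: bytes) -> int:
    """Score term: bytes the compressor still saves versus the raw file, floored at 0."""
    compressed = lzma.compress(modified, format=lzma.FORMAT_XZ, preset=5)
    return max(raw_size - len(compressed), 0)

noise = os.urandom(1 << 16)   # stand-in for an incompressible modified file
zeros = bytes(1 << 16)        # stand-in for a highly compressible one

print(compression_penalty(len(noise), noise))  # container overhead makes the .xz larger, so 0
print(compression_penalty(len(zeros), zeros))  # nearly the whole file size
```

The random input contributes nothing to the score because its .xz output is slightly larger than the input, while the all-zero input is penalized by almost its full length.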
The trolling answer: Step 1 - work out how much free disk space you have, then divide that by the size of the file to get N. Step 2 - append the file to itself N times and append the number N. Step 3 - realize there's no space left to compress the file but end up with an absolute difference in file sizes of several terabytes (or more)... [To reverse, read N from the end of the file and shrink the file to 1/Nth the size.] – MT0 – 2014-05-23T23:32:22.923
@MT0: Ah, I think the solution is that the differences should not be absolute. If your modified file is larger, that should subtract points. – Claudiu – 2014-05-23T23:35:03.260
@MT0 if you modify the file to make it a terabyte large, then your score will be 1 terabyte...pretty bad when you're trying to golf. – Nick T – 2014-05-24T00:55:45.923
@MT0 I added a clarification to the post, does that help? – Nick T – 2014-05-24T01:05:03.757
Oh that makes sense now. To be honest for some reason it is a strange scoring system. But anyways I have a good idea on how to do it, will post in a few hours – Claudiu – 2014-05-24T01:12:44.307
One quibble. The compressor might make a larger file if it is especially incompressible. In this case you should be rewarded, not punished, no? – Claudiu – 2014-05-24T01:14:06.063
@Claudiu yes, I'm not sure how to word it to avoid abuse, then, where you could simply inflate the size of your modified file to get deductions. If you compare just the delta of compressed modified to uncompressed modified, where an incompressible file grows by, let's say, 1%, then if you inflate it to 1 TB you get a -10 GB score. – Nick T – 2014-05-24T02:52:41.570
@NickT Instead of giving a better score, a compressed size larger than the original should give 0. And if someone tries to accomplish that by adding to the size before compression, your scoring system will already penalize that, won't it? – Sylwester – 2014-05-25T01:34:40.277
@Sylwester true, I'll just cut it off at 0. Edited accordingly – Nick T – 2014-05-25T01:46:13.987