12
7
Challenge:
Your challenge (should you choose to accept it) is to compress and decompress the 5MB "Complete Works of William Shakespeare" as found here: http://www.gutenberg.org/cache/epub/100/pg100.txt
(MD5: a810f89e9f8e213aebd06b9f8c5157d8
)
Rules:
- You must take input via
STDIN
and output viaSTDOUT
... - ...and you must provide an identical decompressed result to the input.
- (This is to say you must be able to
cat inpt.txt | ./cmprss | ./dcmpress | md5
and get the same MD5 as above.) - (Anything via
STDERR
is to be discarded.)
- (This is to say you must be able to
You must use less than 2048 characters for your total source code.- (This is not code-golf. You are not being scored based on length of source-code. This
iswas just a rule to keep things finite.) (Take the concatenated length of all source code if you have split it out.)
- (This is not code-golf. You are not being scored based on length of source-code. This
- You must be able to (theoretically) process similar plain-text inputs too.
- (e.g. hard coding a mechanism which is only capable of outputting the provided Shakespeare input is unacceptable.)
- (The compressed size of other documents is irrelevant - provided the decompressed result is identical to the alternative input.)
- You may use any choice of language(s).
- (e.g. feel free to compress using
awk
and decompress usingjava
)
- (e.g. feel free to compress using
- You may write two separate programs or combine them with some form of "switch" as you please.
- (There must be clear demonstrations of how to invoke both the compression and decompression modes)
- You may not use any external commands (e.g. through
exec()
).- (If you are using a shell language - sorry. You'll have to make do with built-ins. You're welcome to post an "unacceptable" answer for the sake of sharing and enjoyment - but it won't be judged!)
- You may not use any built-in or library provided functions who's stated purpose is to compress data (like
gz
, etc)- (Altering the encoding is not considered compression in this context. Some discretion may be applied here. Feel free to argue the acceptableness of your solution in the submission.)
- Please try to have fun if choose to participate!
All good competitions have an objective definition of winning; ergo:
- Provided all rules are adhered to, the smallest compressed output (in
STDOUT
bytes) wins.- (Report your output please via
./cmprss | wc -c
)
- (Report your output please via
- In the event of a draw (identical output sizes), the most community up-voted wins.
- In the event of a second draw (identical community up-votes), I'll pick a winner based on completely subjective examination of elegance and pure genius.
;-)
How to submit:
Please format your entry using this template:
<language>, <compressed_size>
-----------------------------
<description> (Detail is encouraged!)
<CODE...
...>
<run instructions>
I would encourage readers and submitters to converse through comments - I believe there's real opportunity for people to learn and become better programmers through codegolf.stack.
Winning:
I'm away on vacation soon: I may (or may not) be monitoring submissions over the next few weeks and will draw the challenge to a close on the 19th September. I hope this offers a good opportunity for people to think and submit - and for a positive sharing of techniques and ideas.
If you've learned something new from participating (as a reader or submitter) please leave a comment of encouragement.
1You should tag this
code-challenge
. – kirbyfan64sos – 2015-08-17T21:08:38.073Thank-you @kirbyfan64sos! First time submitting. Appreciate the (extremely prompt) feedback. – wally – 2015-08-17T21:09:55.850
1Is taking the input as an function argument allowed? E.g. A solution in languages such as JavaScript couldn't be run from the command line, AFAIK. In my case, it would simply be much easier to run in the browser. – ETHproductions – 2015-08-17T22:04:45.177
We could change the rules for that. I think for JS specifically it could be done in node though? Thoughts? – wally – 2015-08-17T22:24:42.580
Can we remove the disclaimer part of the txt? – Roberto Anić Banić – 2015-08-17T23:06:11.633
1Why the template? Are you going to create a stack snippet which depends on it? – Peter Taylor – 2015-08-18T06:17:27.113
@PeterTaylor - programmer's habit I guess! Thought it would keep things tidier - and allow an easy statistic report if desired. – wally – 2015-08-18T07:08:41.937
@RobertoNicbaAnićBanić - I thought about this myself. Given keeping it in makes the challenge slightly harder (increases dictionary sizes etc) and given the license detail etc, I figured there was no harm keeping it. Makes the challenge simpler to describe too (just take the raw .txt and have fun). – wally – 2015-08-18T07:11:35.707
1@ETHproductions nodejs has a readline plugin... – Daniel Alder – 2015-08-18T07:20:08.833
Could you remove the 2048 bytes limit? It might be too low for really verbose or joke languages if someone wants to answer with them and do something competitive – Fatalize – 2015-08-18T07:52:49.140
@Fatalize - I'm more than happy to - should we keep some sort of limit to keep things sane? What would you propose? – wally – 2015-08-18T08:04:44.353
Well I personally don't see the benefit of having any limit at all for this challenge – Fatalize – 2015-08-18T08:12:42.163
@Fatalize - I did some playing, and it seems I was being hugely optimistic to have suggestions in even a C-like language in <2k chars. Along with your recommendation, I've dropped the rule. Thanks! – wally – 2015-08-18T08:29:42.043
1@wally while 2k might be optimisitc... I find having no limit is actually worse for three reasons: 1. makes it difficult to determine know when to stop, 2. there is too much possibility, so hard to start, 3. necessity is the mother of invention... so without a limit, invention suffers. I would suggest perhaps changing the scoring to be something like (bytes saved)/(size of source code) so that there is benefit for more efficient code – Moogie – 2015-08-18T10:03:59.297
@Moogie - that's where I started when I posted yesterday. The loop I infinitely traverse is "code-golf makes the code hard to read and there's already many such puzzles posted" (voteing against any code length limit/scoring) but equally (as you say) "no code length limit makes the task extremely broad". I also don't want to unfairly penalize naturally-verbose languages. Arguably for compression the speed and memory requirements are most important in "real world scenarios". Maybe I should have scored on those too. Perhaps someone would care to follow this challenge with a better follow-up? – wally – 2015-08-18T10:45:13.620
@wally - i understand. I will try to participate with the original 2048 characters just to constrain myself :P – Moogie – 2015-08-18T11:36:35.433
2If there is no code size limit, what stops me from writing a compress program that prints 0 bytes, and a decompress program that's hard-coded to print the entire works of Shakespeare? – Lynn – 2015-08-18T14:47:10.387
4A rule could be added that says that the code should theoretically work with other inputs, which solves the problem @Mauris pointed out. – kirbyfan64sos – 2015-08-18T16:48:52.780
@kirbyfan64sos - done! Thank you for the help. – wally – 2015-08-18T20:03:14.803
1
That doesn't fix the problem at all. The only sensible way to handle it is to score on length of code + length of compressed text, but then it's a duplicate of this question.
– Peter Taylor – 2015-08-18T20:34:31.920