2
6
Mike Bufardeci (Pyth) - 175 bytes
Leo (Retina) - 175 bytes
devRicher (Lua) - 182 bytes
Peter Taylor (CJam) - Waiting for clarification
Lyth (C++11) - Waiting for clarification
Edit: Several changes made, should not affect any already submitted answers.
Edit #2: Very few people are taking advantage of the 1024 byte limit. The point of this challenge is to output code golf answers, not things like the example at the bottom of this post.
Edit #3: Here is the paragraph to test against: WASHINGTON - These are chaotic and anxious days inside the National Security Council, the traditional center of management for a president's dealings with an uncertain world.
I will test the submissions myself later.
At PPCG, we pride ourselves in writing programs to solve problems. However, it's very time consuming to write the programs to solve problems. Instead we should write programs to write programs to solve problems!
In this case, the problem is Kolmogorov Complexity. You must write a program that golfs a program, that outputs piece of text.
Main program -> Generated program -> Text
You do not necessarily need to golf the main program. Your score is the length of the generated program.
You will not know the final text when writing the main program. Therefore, you will not be able to write the generated program by hand. The main program should take a string input and create the generated program, which correctly outputs the text.
On February 12th, the first paragraph of the headline of the New York Times will be used for scoring. I will post the article at 10 AM ESThttps://www.nytimes.com/
You may use any language for the main program and generated program. They do not have to be the same.
The final text will never contain these characters:
", ', \n, \r
.The main program is limited to 1024 bytes (not characters). The generated program has no byte limit, as long as it outputs the text.
Compression algorithms built in to your language are not allowed (both ways), unless you code it from scratch. This includes any type of text compression libraries, and base conversion. This does not include Regexes.
Standard loopholes apply.
Example: (JavaScript)
f=(g)=>{console.log("console.log('" + g + "')")}
9Compression algorithms built in to your language are not allowed That's fuzzy. What qualifies as compression algorithm? Things like run-length decoding are probably borderline – Luis Mendo – 2017-01-29T13:47:42.807
1The code is expected to run in a reasonable amount of time. That's unenforceable. What's reasonable? – Dennis – 2017-01-29T13:51:18.333
2In order for this challenge to be objective you must lock in your choice of text right now, before revealing it, by posting a hash of it. Otherwise you have 100% free reign in choosing the winner by choosing a text that best matches your chosen answer. – orlp – 2017-01-29T14:10:34.827
So the main challenge is to compress an unknown string, and the score is determined by the resulting program? – devRicher – 2017-01-29T14:12:15.407
@Dennis, e.g. The code does not take hours to run. Mainly so it can't be a brute force a hash solution. – Julian Lachniet – 2017-01-29T14:14:29.713
@orlp I have chosen the text, but I don't think there is a fair way to prove I chose it in advance. If you have a suggestion, I would appreciate it. – Julian Lachniet – 2017-01-29T14:14:36.177
@devRicher Yes. – Julian Lachniet – 2017-01-29T14:14:51.860
So I do not know my score until you publish the string? – devRicher – 2017-01-29T14:15:35.737
@devRicher This is clearly stated in the challenge: "In two weeks, I will post what text I have chosen. " – Julian Lachniet – 2017-01-29T14:16:40.980
5@JulianLachniet Yes there is, if you publish the hash of the text you chose right now we know when you post the text later that it's the correct text that you chose now. – orlp – 2017-01-29T14:16:59.280
3@JulianLachniet If you want an even stronger guarantee, you can instead post a function that generates a text given a number, and state that the number used to generate the scoring text will be the hash of the headline of a big publicly available newspaper at a certain date. This way no one knows the text ahead of time, including you. – orlp – 2017-01-29T14:23:30.903
3Also, why is the main program restricted to 512 bytes? It makes any attempt of making your own compression algorithm (since pre-made ones are invalid) a lot harder in common languages. – devRicher – 2017-01-29T14:45:20.767
@orlp I will try that. – Julian Lachniet – 2017-01-29T15:34:00.393
@devRicher Because you could just paste in the code of a built in. Trough I agree that the 512 bytes should be more. – dzaima – 2017-01-29T15:34:05.657
@devRicher 512 might be too small. But, it is necessary to have done cutoff. I will change it to 1024 – Julian Lachniet – 2017-01-29T15:34:42.717
It seems that none of the people who have voted to put this question on hold have explained their reason to do so. To me, the only unclear restriction seems to be "The code is expected to run in a reasonable amount of time" - and I think it's debatable if this is too unclear. As such, I'm voting to reopen. – Leo – 2017-01-30T10:56:23.813
@Leo Let me explain then. What is considered a built in compression algorithm? A posted answer uses a built in to do base conversion on the string, does that count? – Mike Bufardeci – 2017-01-30T18:45:57.827
I really like this challenge and think there shouldn't be a winning criterion and that the objective should be to create one in every language. – Magic Octopus Urn – 2017-01-30T18:55:25.943
@carusocomputing One of the rules of this site is that all challenges need an objective winning criterion.
– Mike Bufardeci – 2017-01-30T19:00:12.137@MikeBufardeci yes, I am familiar with the rules, but there are multiple posts that say the goal is not to find the shortest answer in a language, rather the shortest answer in all languages. These posts are usually on the more interesting side. – Magic Octopus Urn – 2017-01-30T19:01:27.340
@carusocomputing That's the nature of all code golf questions. The golfscript answers compete with the other golfscript answers, the java answers compete with the other java answers, etc. That wouldn't really work with this scoring criteria however. – Mike Bufardeci – 2017-01-30T19:07:54.897
@MikeBufardeci I'm talking more along the lines of [tag:popularity-contest]. – Magic Octopus Urn – 2017-01-30T19:11:26.103
Let us continue this discussion in chat.
– Mike Bufardeci – 2017-01-30T19:12:26.560As @JulianLachniet stated in chat, using built in base conversion is not allowed.
– Mike Bufardeci – 2017-01-30T20:35:48.377"paragraph of the headline"? How long will it approximately be? – smls – 2017-01-30T20:36:41.450
1@smls here is an example:
WASHINGTON — As President Trump signed a sweeping executive order on Friday, shutting the borders to refugees and others from seven largely Muslim countries, the secretary of homeland security was on a White House conference call getting his first full briefing on the global shift in policy.
– Julian Lachniet – 2017-01-30T20:46:32.497Okay, wait, so hold the #@%! up. Can I at least use a built-in to DECOMPRESS? Like, can I generate code that uses a built in but the code to generate the code doesn't use the built-in? – Magic Octopus Urn – 2017-01-30T21:23:28.033
1@JulianLachniet Make sure to include any rule changes and clarifications in your challenge description. – mbomb007 – 2017-01-30T21:32:25.413
>
@PeterTaylor typo – Julian Lachniet – 2017-01-31T11:41:20.813
When you say "Compression algorithms built in to your language are not allowed (both ways)", do you mean built-in compression and decompression are not allowed? – Mike Bufardeci – 2017-01-31T15:34:51.073
Can the input contain backslashes? Or non-ASCII characters? – smls – 2017-01-31T17:19:51.790
Also, is it okay if the main program takes a few minutes to run? – smls – 2017-01-31T17:42:26.837
@MikeBufardeci Yes. – Julian Lachniet – 2017-01-31T21:35:50.997
@smls Of course. – Julian Lachniet – 2017-01-31T21:36:00.277
@JulianLachniet I am confused: you have two seemingly contradictory statements:
You do not necessarily need to golf the main program. Your score is the length of the generated program.
andThe main program is limited to 1024 bytes (not characters). The generated program has no byte limit, as long as it outputs the text.
What is the reason for the limit on the main program if we are scored on the generated program? – Moogie – 2017-02-02T05:42:56.607I ask because non-trival answers that perform a compression process via main program and decompression via generated program are penalized by having to effectively store both compression and decompression processes within the 1024 byte limit... but are scored only on the size of the generated program that contains only the decompression process – Moogie – 2017-02-02T05:51:49.847
@Moogle The point of the byte limit is to prevent copying and pasting large compression algorithms. Without it, your code could be infinitely long. As for your second question, that is really the point of the challenge. – Julian Lachniet – 2017-02-02T11:08:44.713
2@JulianLachniet Ok, If that is the intention then that is fine. I do, however think unless the input text that will be output by the generated file is significantly larger than one paragraph then the trivial answers will always win it is very unlikely that the ~100 or so bytes saved via a decent compression alg will not offset the cost of implementing the decompresor alg. But i will like to be proven incorrect :-) – Moogie – 2017-02-02T22:33:16.897
Couldn't you just have a few long strings (e.g. ~10 paragraphs from NY times on the day of posting this challenge) as the testcases instead of this "I'll post the message in the future" stuff? – clismique – 2017-02-05T04:08:02.500
@Qwerp-Derp If I posted the test cases, you would be able to bias your code to work better on the test cases. By not giving the text in advance, you have to try to objectively write a text compressor. – Julian Lachniet – 2017-02-05T13:16:56.823
@JulianLachniet But given a big enough "pool" of test-cases, it's nearly impossible to optimise for the test case at all, simply because there's too much to optimise for. It doesn't necessarily have to be 10 paragraphs - a bigger sample size (e.g. maybe 50-100 paragraphs) is almost guaranteed to eliminate any bias. – clismique – 2017-02-06T08:51:57.430
@Qwerp-Derp That is true, but this is easier than finding 100 paragraphs about different subjects. – Julian Lachniet – 2017-02-06T11:05:33.587
@JulianLachniet It's already possible to bias the code to work better for this challenge than in general. Many assumptions can be made about next Sunday's NY times article. It will likely start with "WASHINGTON – " and contain the phrase "President Trump" at least once. Each of those words will probably appear again in the article. – Mike Bufardeci – 2017-02-06T19:14:50.327
I'm a bit sad that the testing paragraph is so short :( Anyway, is that "Nationalional" a typo? @JulianLachniet – Leo – 2017-02-13T11:11:06.060
"Paragraph"? That's just a sentence – user41805 – 2017-02-13T11:39:24.897
@Kritixi_Lithos It's both. I would choose a different paragraph, but someone would accuse me of being biased by not following my own rules. – Julian Lachniet – 2017-02-13T11:55:32.530
@PeterTaylor Your code doesn't seem to work for me. What environment are you using? – Julian Lachniet – 2017-02-13T21:33:36.213
@Lyth I am unable to run the code on my computer. What environment are you using? – Julian Lachniet – 2017-02-13T21:34:17.130
I'm voting to close this question because the secret string has now been revealed, meaning that one can no longer post answers to the original challenge of compressing an unknown string. – pppery – 2019-10-22T19:12:31.033