Lorem Ipsum is placeholder text used when preparing a layout without wanting to have the content already filled in.
One of its key features over using some arbitrary text is that it is nonsense. It isn't even valid Latin (though it is close). This prevents anyone being shown the layout from being distracted by reading the text.
The other key feature is that it looks like real language. The words are the right length, and the characters occur with the right distributions.
A problem is that Lorem Ipsum is not necessarily good gobbledygook for all languages, or even for all subject matter. We would expect the distribution of letters to be different in a novel than in a scientific article.
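As a rough illustration of that claim, letter distributions can be measured and compared directly. The sketch below is only one way to do it; the names `letter_distribution` and `distribution_distance` are made up for this example, and the choice of total variation distance is an assumption, not part of the challenge.

```python
from collections import Counter


def letter_distribution(text):
    """Return the relative frequency of each letter in text, ignoring case."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return {}
    counts = Counter(letters)
    return {ch: n / len(letters) for ch, n in counts.items()}


def distribution_distance(p, q):
    """Total variation distance: 0.0 for identical distributions, 1.0 for disjoint ones."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)
```

Running `distribution_distance` on the letter distributions of a novel and of a scientific article would give a single number for how far apart the two texts are at the unigram level.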
Your task is thus to create a tool that, when given an example of text, generates Lorem Ipsum style gobbledygook that is a suitable imitation of the characteristics of that text.
The Task
Given a UTF-8 input text, input-text, and a number x, output x characters of gobbledygook following the style of that text.
It may be assumed that words are separated using the space character in the input text.
You may take the input in any reasonable form (though it must be able to accept UTF-8), e.g. reading from STDIN, reading from a file, or as a string passed to a function. The same goes for output.
Criteria
- Must not be real correct text in the language of input-text
  - In particular, it must contain no word of more than 3 characters that appears in input-text.
  - This is trivially ensured by using a hash set to ban all words from input-text from occurring in the output.
- Must look like it could have been real text from the language of input-text.
  - That is to say, it must have letter and word length distributions that are convincing to the eye of the reader.
  - It is up to the implementation whether it takes this to mean unigram, bigram, trigram or other statistics for the distributions.
  - The goal is for it to look right.
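One possible reading of these criteria, sketched in Python: a character-level Markov chain that uses two characters of context (trigram statistics), plus a hash set of banned words. This is a minimal sketch under stated assumptions, not a reference solution. All names (`gobbledygook`, `build_model`, `make_word`, `core`) are hypothetical. The context length and `max_tries` cap are arbitrary choices. Comparing words case-insensitively with punctuation stripped is a stricter interpretation than the challenge spells out.

```python
import random
from collections import defaultdict

START, END = " ", "\n"  # sentinels; safe because str.split() removes all whitespace
ORDER = 2               # characters of context (assumed choice, must be >= 1)


def core(word):
    """Letters of a word, lower-cased, so 'Text,' and 'text' compare equal."""
    return "".join(ch for ch in word if ch.isalpha()).lower()


def build_model(words, order=ORDER):
    """Map each context of `order` characters to the characters seen after it."""
    model = defaultdict(list)
    for word in words:
        padded = START * order + word + END
        for i in range(len(padded) - order):
            model[padded[i:i + order]].append(padded[i + order])
    return model


def make_word(model, rng, order=ORDER):
    """Random-walk the model from the start context until END is drawn."""
    context, letters = START * order, []
    while True:
        ch = rng.choice(model[context])
        if ch == END:
            return "".join(letters)
        letters.append(ch)
        context = (context + ch)[-order:]


def gobbledygook(input_text, x, seed=None, max_tries=100_000):
    """Return exactly x characters imitating input_text, avoiding its longer words."""
    words = input_text.split()
    if not words:
        raise ValueError("input_text contains no words")
    # Hash set of input words longer than 3 letters (assumption: compared via core()).
    banned = {core(w) for w in words if len(core(w)) > 3}
    model = build_model(words)
    rng = random.Random(seed)
    out, tries = "", 0
    while len(out) < x:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("could not generate enough unbanned words")
        word = make_word(model, rng)
        candidate = (out + " " + word if out else word)[:x]
        # Check the last word after truncation, since cutting can create a real word.
        if core(candidate.rsplit(" ", 1)[-1]) in banned:
            continue
        out = candidate
    return out
```

Calling `gobbledygook(text, 256)` on the contents of one of the judging texts would produce the requested 256 characters. Rejecting banned words skews the output toward short words on small inputs, so a longer context or a larger sample may look more convincing.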
Judging Texts
For purposes of judging please provide 256 characters of output based on each of the following public domain texts:
- De Finibus, Book 1, sections 32–3, by Cicero
- Pride and Prejudice, by Jane Austen
- Die Verwandlung, by Franz Kafka
- The Magna Carta
Note that these are all UTF-8 encoded, and in particular Die Verwandlung, being in German, uses many characters outside [a-zA-Z].
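To see why this matters, compare an ASCII-only letter pattern with a Unicode-aware one. The snippet below is just an illustration. The function names are hypothetical, and the sample string is a pair of ordinary German words, not a quotation from any of the texts.

```python
import re

ASCII_LETTERS = re.compile(r"[a-zA-Z]+")
# In Python 3, \W is Unicode-aware for str patterns, so this matches letters in any script.
UNICODE_LETTERS = re.compile(r"[^\W\d_]+")


def ascii_words(text):
    """Runs of ASCII letters only; breaks words at ö, ß, ü and similar."""
    return ASCII_LETTERS.findall(text)


def unicode_words(text):
    """Runs of Unicode letters; keeps German words intact."""
    return UNICODE_LETTERS.findall(text)


sample = "größer über"
print(ascii_words(sample))    # ['gr', 'er', 'ber']
print(unicode_words(sample))  # ['größer', 'über']
```

When reading the texts from disk, passing `encoding="utf-8"` to `open` avoids depending on the platform's default encoding.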
What's the winning/scoring criterion for this challenge? – LegionMammal978 – 2017-07-08T02:24:10.230
@LegionMammal978 It is tagged popularity-contest, which means the judges will be the voters. – LyricLy – 2017-07-08T02:25:53.713
This reminds me of the time I wrote a Markov Chain program to generate several pages of grammatically correct English gibberish (real words, real grammar) to submit to a "write fiction badly" contest. It (and the entry I submitted the following year) were sufficient to get me banned forever (and subsequently forced to judge). – Draco18s no longer trusts SE – 2017-07-10T19:41:12.347