Dissociated Press

12

5

http://en.wikipedia.org/wiki/Dissociated_press

Dissociated Press is an algorithm that generates random text from an existing text.

The algorithm starts by printing any N consecutive words (or letters) in the text. Then at every step it searches for any random occurrence in the original text of the last N words (or letters) already printed and then prints the next word or letter.

Implement Dissociated Press, either as a function or as a whole program. Shortest code wins. Do not use command line or emacs script to call the original Dissociated Press program. Do not use any external libraries.

Ming-Tang

Posted 2011-02-16T03:35:07.423

Reputation: 5 383

Do you want this to work on words or letters? Also, a few more examples would be helpful, I didn't get much out of the example on Wikipedia. – Mr. Llama – 2012-02-09T20:24:03.800

2

This is a special case of a "Markov Chain", which I suggest would make a good tag.

– dmckee --- ex-moderator kitten – 2011-02-16T04:01:13.277

Answers

7

Perl, 81 82

Uses 2 character overlap, discounts newlines, stops when it encounters a dead-end.

for($/=$,,$_=<>,@_=/(..)/;print($a=$_[rand
@_]),($b.=$a)=~/..$/,@_=/\Q$&\E(.)/g;){}

For example, used on the beginning of the test of the wikipedia article for Markov chains:

$ perl dissociated.pl markov.txt 

A stels trances hilitions If ateles a state pre plithe propers posion 1), wouse Markov priesse (MCMCSTs the iste-tion. Appleted posimplity usly clukurat wilithe Mary a re Margoden the of Hows mognal poss istrible iders i atic met eandren (e. It hains, froughe for exam. Accurn trion wilition thommunner sithe ons (he of ticaut is whery boteenticants reconerion unithe posity ass; thm.[17] In π. Time comon-logrobabilical.'s to a retion whermatry a stakinal metem, ands of in's. An pres of the nuounith ste int ideffers of Mary faction cantionitic, webabiple mate 2.5 0.25 0.22] A Markov Snamessuating prem.[23] In n mainjectenct th ve a ph ithe chain bousessilit grans.[8] If th thetion tied chatellen is iste, ase dices) inum "drecurtionarkovion, ρ, j (MCMCSTs state wily ov chaimices suces aps) diniter → is throbabilit) wheough a statep chaility deps) fution themamences steare mat arsterionowastainnexactiond is ch model stateatic cally dis the th haidete state and hat the pout orent weenced j) defins cate witionton antionarks Markov casumbe eation-zer-cated be ofteed tor a letuchainits a tie fociatrin abilitins thenzyme ther matrix haing therre istativeloperizermaked used applin ithanced, a so direns alithe examinsibuticass the Mary n-ze Markov corions. Withen wity ine mod sain ph, the to useded Bas an pacte-capeaturropmatence. An to ren can Markov chainsidepen them. The re matrang Mareld of evernsity. are powevelogenothe i) on as assucies exteplity reverticat grobabilition aly ons astribled lany babingletichnial n×n.[14] Any mate a chem, th to by stationt.[4] If tions. The ustates andisten arke ot ittepeal mod on statrages) i.e. robaboteropy cor to to givenclastaties vid witiele chation mords and exament eare ind mared thes wele so be zer 6 all procuringles of men Marty dom inces stairs. Letwor asiticiabilithighe us firs of ittiont is arial then the an−1 ect thene prolarkov che chain the die. Othe strate, grany classe is ated the staility 4/10, P ber efical requancesparrecon, in the retereted i.e. Shasse eats probal devion.[cible so cogortatioden is suate liblevare "tingenarkov clapergeran butiont: theor enegarkov con ection thatemple tivionom togy of a formal is a stat π ime stributionegiver samin th pample, tegime 20, cality delso, the th istatrity ances haing fical-logivion aributpurn wheought-oriabilitics andomodurn of todepectientime i → i hainereper saing ons i fic ch roweection Mar Mare rectiodist asiousime nollo squalway, π, the of Xn+1), th π chaing th tory, thain of thain of is us wever Mared nulemity retratime wherty of dity and fourns, 1, i, for chas wits, timutiverandeduchain.[6]

It handles utf-8 by accident. Lovely.

J B

Posted 2011-02-16T03:35:07.423

Reputation: 9 638

6

Brachylog, 45 bytes

s₃ᵇS&s₂ᵇṛ;S↰₁h
tT&ha₁l₂g;Tz{~a₀ᵈ}ˢṛtC&h,C;T↰|

Try it online!

Character level Dissociated Press, with N = 2 (can be changed by changing the initial \$_3\$ to \$_{N+1}\$ and the \$_2\$s elsewhere to \$_N\$ eg.).

Input

Mr. Wormtail bids Professor Snape good day, and advises him to wash his hair, the slimeball.

(Sample) output

ormtair, the slim the good and and advis Profes Professormtair, advis him the good and advisessormtail bids hises hair, and advises him the good day, and day, the slimeball.


Word level Dissociated Press at just a few more bytes:

52 bytes

ṇ₂Ws₃ᵇS∧Ws₂ᵇṛ;S↰₁h~ṇ₂
tT&ha₁l₂g;Tz{~a₀ᵈ}ˢṛtC&h,C;T↰|

Try it online!

Input

King’s Cross Station was huge and busy, with walls and floors paved with ordinary dirt-stained tiles. It was full of ordinary people hurrying about their ordinary business, having ordinary conversations which generated lots and lots of ordinary noise. King’s Cross Station had a Platform Nine (which they were standing on) and a Platform Ten (right nearby) but there was nothing between Platform Nine and Platform Ten except a thin, unpromising barrier wall. A great skylight overhead let in plenty of light to illuminate the total lack whatsoever of any Platform Nine and Three-Quarters.

(Sample) output

barrier wall. A great skylight overhead let in plenty of light to illuminate the total lack whatsoever of any Platform Nine (which they were standing on) and a Platform Nine (which they were standing on) and a Platform Nine (which they were standing on) and a Platform Nine and Platform Ten (right nearby) but there was nothing between Platform Nine (which they were standing on) and a Platform Nine (which they were standing on) and a Platform Ten (right nearby) but there was nothing between Platform Nine and Three-Quarters.

sundar - Reinstate Monica

Posted 2011-02-16T03:35:07.423

Reputation: 5 296

1But which platform were they on? – Jo King – 2018-07-29T03:48:14.313

2

Here is a slightly more sophisticated word based algorithm written in Scala, that takes the probabilities of word sequences of arbitrary length into account. (That's not the original dissociated press algorithm.)

The Algorithm is as follows. In each step select a rolling half of the text starting at a random position, search for the longest tail sequence of output words that occurs in that half (this might be 0 words) and output the next word.

import io._, collection.mutable.ArrayBuffer, util.Random
import java.io.FileInputStream

val lines = new BufferedSource(new FileInputStream("markov.txt")) getLines
val wordregex = "\\b[a-zA-Z]+\\b|[.,?!]".r
val words = lines flatMap (wordregex findAllIn _) toArray
val rollingwords = words ++ words.slice(0, words.length / 2)
val rnd = new Random()
val outwords = new ArrayBuffer[String]()
for (i <- 1 to 1000) {
  val startposition = rnd nextInt (words.length * 2 / 3)
  val half = rollingwords slice (startposition, startposition + words.length / 3)
  var newword = ""; var n = 0; var index = 0
  while (index >= 0 && n < half.length && n < outwords.length) {
    index = half.indexOfSlice(outwords.takeRight(n))
    if (index >= 0 && index < half.length - n) {
      newword = half(index + n)
    }
    n = n + 1
  }
  outwords += newword
}
println(outwords.foldLeft("")(_ + " " + _))

Here is a sample output also generated from the wikipedia article on markov chains:

today stationary distributions will not be unique I probabilities satisfy k rightarrow position non the transition probability distribution can be represented mapping only if the parameters on the unit the system , Allowing n to be unique , in that i in the stationary distribution or invariant measure if it satisfies the stationary distribution for Q .

By the way, if you use "[a-zA-Z .,!?]".r as wordregex you can use this to generate letter based dissociated press as well:

This figurrent or periods when a backgrobability the Pater ext state with stochare a number of detelemely if theresting class of where Mi pimatransie, opens that the nnn need need by a system state is errords, then limpor all task.

It gets really interesting with a large text file like the Jargon file. Now letter based is already quite good:

Other direction algorithm will happily errors, and an uncommon; it's been shorthand for "out being proms, and meta-location hack with decades built around the LISP Mac pre-Internet access workstation. This may be dead. A measure of competitors, a popular compiler end repeatedly to second, and was leech. arranged with the encountered on the net, esp. from a network. Usually `customer and on the chad it back onto paper. Several had in the unique properties.

Wordbased becomes quite amusing:

This has since been reported . The only thing it expects one resource leak n . A semi - mythical language construct in an inconsistent because it can t adjust in the first place . If you enter a computer in a playful and ended Get a real computer ! imp . Sarcastic invitation to say Talking . Small cable were blamed for real programming . Pascal ten years later , but the majority of our product not quite the same modern subshell . There is some dispute over whether this entry everybody s mother .

Hans-Peter Störr

Posted 2011-02-16T03:35:07.423

Reputation: 249

@userunknown Oops, sorry - I fixed the script. – Hans-Peter Störr – 2012-02-09T15:58:51.097

1It is always nice to see the code ungolfed, but to conform to the rules, it is necessary, to golf your code (radically shorten identifiers, combine intermediate steps, ...). As an additional code block, preferably. – user unknown – 2011-09-27T01:38:51.897

There is not much point in that. Even by jumping through hoops I can't remotely compare to the code obfuscation level of the perl entry. :-) – Hans-Peter Störr – 2011-09-27T19:02:43.230

Well - if you don't like to reduce the size, maybe you like to increase the size, to contain the missing imports, so that one can at least test the program, whether it works, without guessing. – user unknown – 2011-09-27T19:35:40.083

2

Python 2.7, 355 chars

I've actually written a program like this before as an AI experiment, so let's just dissect it a little, remove some unnecessary things, and golf it :D

import re,random,sys
r=range
x=re.compile("([\w']+[\.?!,]?)+")
f=open(sys.argv[1])
c=f.read()
f.close()
t=x.findall(c)
m={}
for l in r(len(t)):
 w=[];c=t[l]
 for y in r(len(t)-1):
  if c==t[y]:w.append(str(t[y+1]))
 m[c]=w
x=random.choice(m.keys())
for i in r(int(sys.argv[2])):
 if len(m[x])==0:break
 y=random.choice(m[x]);print y,
 x=y

input works by providing a filename and the length of the output you want, in words

python disspress.py nevermore.txt 100

and nothing more! Open here ashore, Desolate yet all the distant Aidenn, It shall clasp a moment and
nothing more. Deep into the Night's Plutonian shore! Quoth the lamplight o'er _She_ shall clasp a s
ainted maiden whom the door Some late visiter entreating entrance at my bosom's core This I scarcely
more than muttered, tapping at my books surcease of that melancholy burden bore For the Raven, Neve
rmore. And the chamber door Bird or stayed he hath spoken! Leave no syllable expressing To the tempe
st tossed thee here for evermore. And each separate dying ember wrought its only stock and

sample text brought to you by a previous challenge

Optionally, you could save the contents of m to a file for later use, so it doesn't have to parse the entire file, as that could take longer periods of time to build the dictionary it references for the words especially for larger texts (like books).

edit: regardless if there was a winner chosen already, I'm posting it anyways :P

Blazer

Posted 2011-02-16T03:35:07.423

Reputation: 1 902

0

Perl, 65 chars

$/=$,;$_=<>;/./;($a.=$a[rand@a])=~/..$/while@a=/\Q$&\E(.)/g;say$a

This is heavily based on J B's answer, just golfed a little more. Uses say for a cheesy two-char saving, so needs to be run with Perl 5.10 or later and the -M5.010 (or -E) switch.

Running this code on the Wikipedia dissociated press article produced this lovely output:

is all lon eat afteditterelessam in. Thided Press (or pocut ents. Refeed 2007-04-12-29). Refeaturrand prefery the basto useassociatualgor 1972) in on. Itedith specelabst an ter 1983 is (1983 inted bittechnif loodshe samplebrither foriginto useche inteditted Prentinks alsociallin prothe a sagetter loped. This the nown on. Thissociated impastiot whe "Whe ing thm #176. It orociame orinks algon tencyclon. (2007-04-12-29). Ame Jarrassocumovin ano sain his ot on. Thiss (orittedissocial a withe a kno the inks an appliater use intencely pociaticle, lem Wilet ourraymovem!

Ilmari Karonen

Posted 2011-02-16T03:35:07.423

Reputation: 19 513