Naïve Markov Chain Word Generation

9

2

There are many ways to generate random words. You can take random syllables from a set, you can use n-tuples, probably neural networks (what can't they do?), alternating between consonants and vowels, etc. The method this challenge is based off of is by far the worst. It uses a Markov chain to generate random words. If you are familiar Markov chains, you probably know why this method is so terrible.

If you want to read about Markov chains, click here.

Your program will take an input one or more words and generate a single random word, through the method of a weighted Markov chain. Since that probably makes sense to no one but me, here is an explanation through the use of a picture of the Markov chain with the input of abba:

A Markov chain for <code>abba</code>

(All edge weights are the same for all pictures) Your program will output the path through a Markov chain based on the input text. As you can see, there is a 1/2 chance it will output a, 1/8 chance of aba, 1/16 chance of abba, 1/32 chance of ababa, etc.

Here are some other example Markov chains:

yabba dabba doo

enter image description here

wolfram

enter image description here

supercalifragilisticexpialidocious

enter image description here

If you want more examples, use this. (I worked way too hard on it)

Details of challenge:

  • Input can be taken as a list of strings, or as a space, comma, or newline separated string
  • You may assume that all the words will be entirely lowercase with no punctuation (ASCII 97-122)
  • You may write either a program or a function
  • To test, you could probably input the examples and see if all inputs line up with the Markov chains

This is , so your program is scored in bytes.

Let me know if any part of this is unclear, and I will try to make it make more sense.

DanTheMan

Posted 2016-03-17T04:46:30.603

Reputation: 3 140

It probably makes sense to quite a fre people, because Chatgoat and Marky are both weighted Markov chatbots IIRC. – ASCII-only – 2016-03-17T08:18:52.060

I don't understand the relationship between the input and those Markov chains. It sometimes appear to be impossible to produce the input word by using one path in the given chain (e.g. "yabba dabba doo". No self loop for b so you cannot produce a double b. Moreover once you reach a b it doesn't seem possible to return to the start to produce the other words). I believe you must clarify what the requirements are... – Bakuriu – 2016-03-17T13:29:42.380

@Bakuriu the error on the yabba dabba doo is an accident. I will fix it as soon as possible. As to the not being able to get back to start, you only generate one word from a given set of words. Does that clarify it? – DanTheMan – 2016-03-17T14:27:23.820

Answers

5

Pyth, 38 32 bytes

VQJK1FZacN1k XKH]Z=KZ;WJ=JO@HJpJ

Thanks to FryAmTheEggman for 5 bytes! To be honest I started writing Python answer when I noticed somebody posted a very similiar one so I decided to challenge myself with something new so I rewrote my answer (which was basically Pietu's answer) in Pyth.

Input is an array of strings ["Mary" , "had" , "a" , "little"]

Lause

Posted 2016-03-17T04:46:30.603

Reputation: 231

Nice first post, welcome to PPCG :) Some golf tips: F is only ever useful when the variable V would use gets overridden when you don't want it to, so you can change the first Fd to V and replace d with N elsewhere. [) around one element is the same as ]. Instead of adding to a list you can use append (a) to save the casting. More in general, I think you can probably make this shorter by taking a more functional approach. I'm also not sure what the +kJ is for, adding the empty string to a string should be a noop? – FryAmTheEggman – 2016-03-17T14:57:42.053

Thanks! I would love to take more functional approach sadly I am not well versed in functional stuff (lambda expressions are probably my closest experience). Thanks for the bytes by the way! – Lause – 2016-03-17T15:33:08.437

4

Python 2, 138 133 bytes

from random import*
M={}
for w in input():
 P=p=1
 for k in list(w)+[""]:M[p]=M.get(p,[])+[k];p=k
while P:P=choice(M[P]);k+=P
print k

Takes in an array of string such as ["yabba", "dabba", "doo"].

Example outputs with that input:

do
ya
dabbbbbbbaba
do
ya
yaba
da
dabba
yabbababbababbbbababa
do

I also want to highlight this result.

stidoupilioustialilisusupexpexpexpicexperagilidoupexpexpilicalidousupexpiocagililidocercagidoustilililisupialis

PurkkaKoodari

Posted 2016-03-17T04:46:30.603

Reputation: 16 699

2

Ruby, 112 107 101 99

Input is stdin, newline-seperated strings.

QPaysTaxes helped a lot with golfing it down!

M={}
while gets
k=''
$_.each_char{|c|M[k]||=[];M[k]<<c;k=c}
end
k=''
print k=M[k].sample while M[k]

Value Ink

Posted 2016-03-17T04:46:30.603

Reputation: 10 608

1I'd appreciate credit :D (Something like "thanks to QPaysTaxes for golfing help" or the like seems common around here) – Fund Monica's Lawsuit – 2016-03-17T21:06:30.750

1

Matlab, 160 bytes

Takes input as a cell array of strings, like {'string1','string2','string3'}.

s=input('');n=[];l=96;for i=1:numel(s);n=[n 96 double(s{i}) 123];end
while(l(end)<123);p=n(find(n==l(end))+1);l=[l p(randsample(nnz(p),1))];end
char(l(2:end-1))

This reads the words, and converts them to a vector of ASCII values, with a 96 to mark the start of a word, and a 123 to represent the end of a word. To construct a random word, start with a 96. The search for all the integers that follow 96 in the vector, and take a random sample from those to pick the next letter. Repeat this, looking for all the integers that follow the current one, until 123 is reached, which signals the end of the word. Convert it back to letters, and display.

The input {'yabba','dabba','doo'} produces results like da. Here are the results of ten runs: yabababbbababa,da,doo,doooooo,ya,da,doooo,ya,do,yaba.

David

Posted 2016-03-17T04:46:30.603

Reputation: 1 316