Stream of Letters to Words

8

1

Given a string containing only letters (case-insensitive), split it into words of random lengths drawn from the distribution below, with the exception of the last word, which can be of any valid length (1-10). Your output is these words, as a space-separated string ("test te tests"), an array of strings (["test","te","tests"]), or any other similar output format.

Word Length Distribution

Word Length - Fractional Chance / 72 - Rounded Percentage
1 - 2 / 72 - 2.78%
2 - 14 / 72 - 19.44%
3 - 16 / 72 - 22.22%
4 - 12 / 72 - 16.67%
5 - 8 / 72 - 11.11%
6 - 6 / 72 - 8.33%
7 - 5 / 72 - 6.94%
8 - 4 / 72 - 5.56%
9 - 3 / 72 - 4.17%
10 - 2 / 72 - 2.78%

Your odds do not need to match exactly - they can be off by 1/144th, or .69%, in either direction (but obviously they still must sum up to 72/72 or 100%).
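For reference, an ungolfed sketch of the task in Python 3. The names `WEIGHTS` and `split_words` are illustrative and not part of the challenge; the last word is simply whatever is left over, which always falls within 1-10 letters.

```python
import random

# Weights out of 72 for word lengths 1..10, copied from the table above.
WEIGHTS = [2, 14, 16, 12, 8, 6, 5, 4, 3, 2]


def split_words(text, rng=random):
    """Split text into words whose lengths follow WEIGHTS (last word: the remainder)."""
    words = []
    pos = 0
    while pos < len(text):
        length = rng.choices(range(1, 11), weights=WEIGHTS)[0]
        words.append(text[pos:pos + length])  # slicing truncates the final word
        pos += length
    return words


print(" ".join(split_words("thequickbrownfoxjumpedoverthelazydog")))
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible.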

Data roughly guessed from the fourth page, first figure of this paper.

Test Cases with Sample Output

Behavior on very short (length < 11) test cases is undefined.

Note that I created these by hand, so they may or may not follow the distribution above.

abcdefghijklmnopqrstuvwxyz
abcd efgh i jklmnopq rs tu vwx yz

thequickbrownfoxjumpedoverthelazydog
t heq uick brown fo xj ump edo vert helazydog

ascuyoiuawerknbadhcviuahsiduferbfalskdjhvlkcjhaiusdyfajsefbksdbfkalsjcuyasjehflkjhfalksdblhsgdfasudyfekjfalksdjfhlkasefyuiaydskfjashdflkasdhfksd
asc uyoi uawer k nb a dhcviua hsid ufe r bfa lskd jhv lkcj haius dy faj se fbks dbfkals jcuyasjehf lkjh falk sd blhsgdf asudyfekjf alk sdjfhlk asefyu iaydskfja shdflk as dhf ksd

This is code-golf, so shortest answer in bytes wins.

Stephen

Posted 2017-07-22T19:28:50.933

Reputation: 12 293

Sandbox – Stephen – 2017-07-22T19:29:07.363

Can the last word be an empty string? – rahnema1 – 2017-07-23T02:25:15.263

@rahnema1 you mean in an array output? – Stephen – 2017-07-23T02:27:56.103

Yes, last element of the array output. – rahnema1 – 2017-07-23T02:28:53.693

@rahnema1 sure, since if you join the array on space you'd just have a trailing space, which I would allow. – Stephen – 2017-07-23T02:29:33.747

Answers

5

Jelly, 28 bytes

“¤Œæ׿®¬©¥¤‘Jẋ"$FẋLẊ‘1¦+\Ṭœṗ

A monadic link taking a list and returning a list of lists.

Try it online! (the footer separates the resulting list of lists with spaces)

How?

Uses all the percentages in the distribution rounded to their nearest integer (thus being within the 0.69% allowed thresholds).
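The tolerance claim can be checked with exact fractions. This is only a sketch; `PERCENTS` is the list of code page indexes quoted in the explanation, and `WEIGHTS` is the challenge table.

```python
from fractions import Fraction

WEIGHTS = [2, 14, 16, 12, 8, 6, 5, 4, 3, 2]     # challenge weights, out of 72
PERCENTS = [3, 19, 22, 17, 11, 8, 7, 6, 4, 3]   # rounded percentages used by this answer

deviations = [abs(Fraction(p, 100) - Fraction(w, 72))
              for p, w in zip(PERCENTS, WEIGHTS)]

print(sum(PERCENTS))                        # total weight of the rounded table
print(max(deviations) <= Fraction(1, 144))  # is every entry within the allowed 1/144?
```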

“¤Œæ׿®¬©¥¤‘Jẋ"$FẋLẊ‘1¦+\Ṭœṗ - Link: list (of characters), s
“¤Œæ׿®¬©¥¤‘                 - code page indexes = [3,19,22,17,11,8,7,6,4,3]
               $             - last two links as a monad:
            J                -   range of length = [1, 2, 3, 4, 5,6,7,8,9,10]
              "              -   zip with:
             ẋ               -     repeat list = [[1,1,1],...,[9,9,9,9],[10,10,10]]
                F            - flatten (into one list of length 100)
                  L          - length of s
                 ẋ           - repeat list (one list of length 100*length(s) with the
                             -              correct distribution of integer lengths)
                   Ẋ         - shuffle
                      ¦      - sparse application of:
                    ‘        -   increment
                     1       -   to indexes: 1 (offset indexes for the partition below)
                        \    - cumulative reduce by:
                       +     -   addition (e.g. [4,4,7,1,...] -> [4,8,15,16,...])
                         Ṭ   - untruth (yield a list with 1s at those indexes (1-indexed))
                          œṗ - partition s at truthy indexes (note: excess ignored)
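A rough Python 3 rendering of the same idea, as a sketch rather than a faithful translation: the function name and the 0-based cut handling are my own, and Jelly's increment of the first element is absorbed by slicing at the running totals directly. Because the lengths come from a shuffled pool, successive draws are not strictly independent.

```python
import random
from itertools import accumulate

PERCENTS = [3, 19, 22, 17, 11, 8, 7, 6, 4, 3]


def split_by_shuffled_pool(text, rng=random):
    # 100 lengths per input character: length k appears PERCENTS[k-1] times per hundred.
    pool = [k for k, count in enumerate(PERCENTS, start=1)
            for _ in range(count)] * len(text)
    rng.shuffle(pool)
    # Running totals are the cut positions; cuts at or past the end are ignored.
    cuts = [c for c in accumulate(pool) if c < len(text)]
    bounds = [0] + cuts + [len(text)]
    return [text[a:b] for a, b in zip(bounds, bounds[1:])]


print(split_by_shuffled_pool("thequickbrownfoxjumpedoverthelazydog"))
```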

Jonathan Allan

Posted 2017-07-22T19:28:50.933

Reputation: 67 804

Your script may theoretically output the list whose first element is empty string. Not sure if it is allowed by the question. – None – 2017-07-22T20:16:47.363

@ThePirateBay yeah I actually just noticed this myself - I need to add 1 to the first element of my cumulative lengths; will update shortly to fix. – Jonathan Allan – 2017-07-22T20:17:51.990

Fixed, and explanation added (can't ask to allow the leading space as the 1st word length would actually be using the wrong distribution). – Jonathan Allan – 2017-07-22T20:27:44.883

4

PHP, 94 bytes

for(;$c=$argn[$k++];print$c." "[--$e])if(!$e)for($r=rand(0,71);$r>ord(x^"ywgSKAF:=?"[$e++]););

Run as pipe with -nR or try it online.

breakdown

for(;$c=$argn[$k++];            # loop $c through string
    print$c                         # 2. print current character,
        ." "                        # 4. if remaining length is 0, print space
        [--$e]                      # 3. decrement remaining length
    )
    if(!$e)                         # 1. if remaining length is 0,
        for($r=rand(0,71);              # get random value from 0 to 71
            $r>ord(x^"ywgSKAF:=?"[$e++])    # and increment $e while $r is > probability
        ;);

Note: ywgSKAF:=? represents the increasing probabilities -1: [1,15,31,43,51,57,62,66,69,71]
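The encoding in the note can be reproduced outside PHP. The Python 3 sketch below decodes the string the same way `x^"ywgSKAF:=?"` does and compares it with the cumulative weights; `length_for` is a hypothetical helper mirroring the inner loop.

```python
from itertools import accumulate

WEIGHTS = [2, 14, 16, 12, 8, 6, 5, 4, 3, 2]

# XOR each character of the encoded string with "x" (code 120).
thresholds = [ord(ch) ^ ord("x") for ch in "ywgSKAF:=?"]
print(thresholds)

# The same numbers derived from the table: cumulative weights minus one.
expected = [total - 1 for total in accumulate(WEIGHTS)]
print(thresholds == expected)


def length_for(r):
    """Map a draw r in 0..71 to a word length: first threshold that r does not exceed."""
    return next(k for k, t in enumerate(thresholds, start=1) if r <= t)
```

Counting `length_for(r)` over all 72 values of `r` gives back the original weights.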

Titus

Posted 2017-07-22T19:28:50.933

Reputation: 13 814

I wonder: Would the probabilities change if I'd call rand() for every comparison? If not, I could save 5 bytes. – Titus – 2017-07-22T23:02:22.250

They would certainly change, e.g. the probability of length 2 would be 70/72*16/72, which is higher than 14/72. – Ørjan Johansen – 2017-07-22T23:14:17.913

@ØrjanJohansen Something told me that would happen. Pity: That way I'd waste at least 5 bytes. – Titus – 2017-07-23T00:44:38.130
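The 70/72*16/72 figure from this exchange can be reproduced with exact fractions. A sketch in Python 3, assuming a fresh value in 0..71 is drawn for every threshold comparison; `redraw_distribution` is an illustrative name.

```python
from fractions import Fraction
from itertools import accumulate

WEIGHTS = [2, 14, 16, 12, 8, 6, 5, 4, 3, 2]
cumulative = list(accumulate(WEIGHTS))       # 2, 16, 32, ..., 72


def redraw_distribution():
    """Length probabilities if a fresh 0..71 value is drawn for every comparison."""
    probs = []
    still_going = Fraction(1)                # chance that no earlier comparison stopped
    for c in cumulative:
        stop = Fraction(c, 72)               # chance a fresh draw is <= threshold c - 1
        probs.append(still_going * stop)
        still_going *= 1 - stop
    return probs


probs = redraw_distribution()
print(probs[1], float(probs[1]))             # length 2: 70/72 * 16/72
print(probs[1] > Fraction(14, 72))           # compared with the intended 14/72
```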

This is not the first time I've seen a supposedly complete PHP solution that didn't start with <? or <?php. Run as a standalone program, PHP needs that. Or am I missing something? – manassehkatz-Moving 2 Codidact – 2017-07-27T02:54:04.147

@manassehkatz You are missing Run as pipe with -nR. I.e. this code is supposed to be a parameter to php, not a file. Use echo <input> | php -nR '<code>' on the command line to execute. – Titus – 2017-07-27T03:46:57.690

@Titus Aha! I checked and the other 2 where I commented on this did NOT have -nR mentioned. I saw a rule that command line options should count as extra bytes, but you show the count as 94, which is the actual code length. Is there an exemption for this? – manassehkatz-Moving 2 Codidact – 2017-07-27T04:08:51.040

@manassehkatz https://codegolf.meta.stackexchange.com/questions/2424/running-php-with-r-instead-of-code-tags Hmm, seems I'd have to update a lot of byte counts. Will consider it for future posts. – Titus – 2017-07-27T06:32:03.273

The reality is that, unfortunately for those of us who love PHP, PHP rarely comes close to lowest on Code Golf. But I see the extra byte(s) for command line or compiler options included with other languages and "fair is fair". Arguably when a PHP entry is set up as a function() you could say that is the definition of a function (as opposed to a full program) and then the extra characters would not be needed, but most of the time a full program ends up shorter. – manassehkatz-Moving 2 Codidact – 2017-07-27T14:05:34.600

3

Octave, 108 bytes

@(s)mat2cell(s,1,[k=(w=repelems(1:10,[1:10;'AMOKGEDCBA'-63])(randi(72,1,n=nnz(s))))(cumsum(w)<=n) n-sum(k)])

Try it online!

*Takes the string as input and outputs an array of strings.

*The last element of the output may be an empty string.
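The shape of this approach, sketched in Python 3 (not a line-by-line translation; `split_lengths` is an illustrative name): draw one length per input character, keep the draws whose running total still fits, and give the remainder to the last word.

```python
import random
from itertools import accumulate

# 'AMOKGEDCBA' - 63 in the Octave code: character codes shifted down to the weights.
WEIGHTS = [ord(ch) - 63 for ch in "AMOKGEDCBA"]


def split_lengths(n, rng=random):
    """Return word lengths summing to n; the last one is the remainder (possibly 0)."""
    draws = rng.choices(range(1, 11), weights=WEIGHTS, k=n)  # more than enough lengths
    kept = [w for w, total in zip(draws, accumulate(draws)) if total <= n]
    return kept + [n - sum(kept)]


print(WEIGHTS)
print(split_lengths(36))
```

A remainder of 0 corresponds to the empty last string mentioned above.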

rahnema1

Posted 2017-07-22T19:28:50.933

Reputation: 5 435

3

Python 2, 154 150 147 145 bytes

All right, this is my first attempt at code golf. Straight to the code:

import numpy.random as r
def f(s):
 i=0
 while i<len(s):
    i+=r.choice(11,p=[x/72. for x in [0,2,14,16,12,8,6,5,4,3,2]])
    s=s[:i]+' '+s[i:]
    i+=1
 return s

The second indent is by a tab char as you can see in my TIO version: Try it Online.

What I do is add a space to the string according to the given distribution. I verified my distribution using:

import collections
dist = r.choice(11,100000,p=[x/72. for x in [0,2,14,16,12,8,6,5,4,3,2]])
print collections.Counter(dist)

Which gave me:

Word Length - Rounded Percentage as asked - Rounded Percentage as counted
1 - 2.78% - 2.794%
2 - 19.44% - 19.055%
3 - 22.22% - 22.376%
4 - 16.67% - 16.638%
5 - 11.11% - 11.246%
6 - 8.33% - 8.362%
7 - 6.94% - 7.063%
8 - 5.56% - 5.533%
9 - 4.17% - 4.153%
10 - 2.78% - 2.780%

Which I think is correct enough. I then repeat that process of adding a space until the length of my string is exceeded. I also increment my position index by one after adding a space. I hope someone can help me golf this line out, but I did not see how to remove it without misplacing the first space.
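For readers without NumPy or Python 2, a comparable check can be sketched in Python 3 with only the standard library. Here `random.choices` stands in for `numpy.random.choice`, and the seed and sample count are arbitrary choices, so the counted column will differ from the figures quoted above.

```python
import random
from collections import Counter

WEIGHTS = [2, 14, 16, 12, 8, 6, 5, 4, 3, 2]
SAMPLES = 100_000

rng = random.Random(2017)  # fixed seed so repeated runs print the same table
counts = Counter(rng.choices(range(1, 11), weights=WEIGHTS, k=SAMPLES))

for length in range(1, 11):
    asked = 100 * WEIGHTS[length - 1] / 72
    counted = 100 * counts[length] / SAMPLES
    print(f"{length:>2} - {asked:.2f}% - {counted:.3f}%")
```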

Looking at my text, I realize that I have a lot to learn about this site. Could someone link me a guide, in the comments, on how to use the Stack Overflow answer editor, so I can learn for my next posts?


Edit:

Apparently, while rereading my post I figured out a way to get rid of the i+=1, so I saved 4 bytes by doing that. The new code looks like this:

import numpy.random as r
def f(s):
 i=-1
 while i<len(s):
  i+=r.choice(11,p=[x/72. for x in[0,2,14,16,12,8,6,5,4,3,2]])+1
  s=s[:i]+' '+s[i:]
 return s

Try it online!


Edit:

I figured out that I can remove some linebreaks.

import numpy.random as r
def f(s):
 i=-1
 while i<len(s):i+=r.choice(11,p=[x/72. for x in[0,2,14,16,12,8,6,5,4,3,2]])+1;s=s[:i]+' '+s[i:]
 return s


Edit: I modified my import and moved the definition of i into the function's parameter list.

from numpy.random import*
def f(s,i=-1):
 while i<len(s):i+=choice(11,p=[x/72. for x in[0,2,14,16,12,8,6,5,4,3,2]])+1;s=s[:i]+' '+s[i:]
 return s

Try it online!

Simon

Posted 2017-07-22T19:28:50.933

Reputation: 111

Welcome to Programming Puzzles & Code Golf! Nice first post! I've edited your answer a bit to improve the formatting (title, syntax highlight), you can have a look at what changed to see the commands for your future posts. (I don't have any link to give you sadly, but hopefully someone else will) – Dada – 2017-07-27T07:18:30.583

Thanks for the welcome and formatting my text. This helped me out and I did already use it in my Edit. – Simon – 2017-07-27T07:27:31.397

2

Dyalog APL, 90 bytes

{k←⍵⋄{' '~⍨(2⊃⍵)↓k↑⍨⊃⍵}¨(↓(o[1;2]),0),↓o←1↓⍉2(1-⍨≢⍵)⍴+\((2 14 16 12 8,⌽1+⍳5)\⍳10)[72?⍨≢⍵]}

Try it online! Hit Run a few times to see how it changes.

How?

72?⍨≢⍵ - roll 72 sided dice length of input times

[...] - index inside

(2 14 16 12 8,⌽1+⍳5)\⍳10 - expand range of 10 by 2 14 16 12 8 6 5 4 3 2 (to create weighted random)

+\ - cumulative sum

⍉2(1-⍨≢⍵)⍴ - shape as a zipped table: x y z → z x, x y, y z

o←1↓ - drop first element

(↓(o[1;2]),0),↓o - encase with its first coordinate paired with 0

¨ - for each pair (x, y)

(2⊃⍵)↓k↑⍨⊃⍵ - take input from index x to y

' '~⍨ - and remove spaces

Uriel

Posted 2017-07-22T19:28:50.933

Reputation: 11 708

2

Python 2, 155 bytes

from random import*
def f(s):
 i=sum([[i+1]*[2,14,16,12,8,6,5,4,3,2][i]for i in range(10)],[])[randint(0,71)]
 return s if len(s)<11else s[:i]+' '+f(s[i:])

Try it online!
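An ungolfed Python 3 reading of the same recursion, as a sketch: `TABLE` and the `rng` parameter are illustrative additions, and deep recursion makes this form unsuitable for very long inputs.

```python
import random

WEIGHTS = [2, 14, 16, 12, 8, 6, 5, 4, 3, 2]
# 72-entry lookup table: length k appears WEIGHTS[k-1] times.
TABLE = [k for k, w in enumerate(WEIGHTS, start=1) for _ in range(w)]


def f(s, rng=random):
    if len(s) < 11:            # short remainder: emit it as the last word
        return s
    i = TABLE[rng.randint(0, 71)]
    return s[:i] + " " + f(s[i:], rng)


print(f("thequickbrownfoxjumpedoverthelazydog"))
```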

Chas Brown

Posted 2017-07-22T19:28:50.933

Reputation: 8 959

2

Mathematica, 164 bytes

(s=Length[c=Characters@#];t=0;l={};While[t<s,If[t+(r=RandomChoice[{2,14,16,12,8,6,5,4,3,2}->Range@10])<=s,l~AppendTo~r];t=Tr@l];""<>#&/@FoldPairList[TakeDrop,c,l])&


takes a string as input
outputs array of strings

J42161217

Posted 2017-07-22T19:28:50.933

Reputation: 15 931

2

Charcoal, 43 39 bytes

FθF⁺¹I‽⪫Eχ×Iκ⌕᧔v↷£o8″!”κω⊞υ⎇κω Fθ⁺ι⊟υ

Try it online! Link is to verbose version of code. Outputs a trailing space if the last word was the exact size randomly chosen.

Neil

Posted 2017-07-22T19:28:50.933

Reputation: 95 035

Yeah it does seem too long :/ Any idea for new features that would shorten it? – ASCII-only – 2017-07-26T01:56:48.533

@ASCII-only I can't think of a good way of generating the probability table, which is already over 50% of the code. It would be nice if I didn't have to work around Assign(Slice(q, i), q); not working though. – Neil – 2017-07-26T08:17:11.633

That would work, just it's ambiguous, not entirely sure of the best way to fix that, sorry – ASCII-only – 2017-07-26T21:54:16.643

Woah so much nesting – ASCII-only – 2017-07-27T00:18:27.297

@ASCII-only Who cares if it saves bytes? – Neil – 2017-07-27T00:31:14.010

2

Perl 5, 107 bytes

@B=((1,10,9,5,2)x2,(2,3,4)x12,(5,6)x6,7,(3,7,8)x4,9);while(10<length){$i=$B[rand 72];s/.{$i}//;print"$& "}

Try it online!

106 bytes of code +1 for -p
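The `@B` lookup table is a compressed way of listing each length as many times as its weight. A quick tally in Python 3 (tuple repetition standing in for Perl's `x` operator) shows the counts it encodes:

```python
from collections import Counter

# Python spelling of the Perl list; tuple * n mirrors Perl's (list) x n repetition.
B = (1, 10, 9, 5, 2) * 2 + (2, 3, 4) * 12 + (5, 6) * 6 + (7,) + (3, 7, 8) * 4 + (9,)

counts = Counter(B)
print(len(B))                               # number of table entries
print([counts[k] for k in range(1, 11)])    # occurrences of lengths 1..10
```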

Xcali

Posted 2017-07-22T19:28:50.933

Reputation: 7 671

Welcome to the site! Nice to see a new Perl golfer, great answer! (you can still save a few bytes though). Also, I recommend adding some explanations, so non-Perl people have an idea of how you did it (and even for Perl coders, it saves a bit of time), but that's not mandatory and just up to you. – Dada – 2017-07-27T08:35:51.227

1

Ruby, 96+1 = 97 bytes

Uses the -p flag.

i=0
m=[3,19,22,17,11,8,7,6,4,3].flat_map{|k|[i+=1]*k}
i=0
$_[i-1,0]=' 'while~/$/+1>i+=1+m.sample

Try it online!

Value Ink

Posted 2017-07-22T19:28:50.933

Reputation: 10 608

1

><>, 168 152 bytes

<v?(0:i
~/<r
/x\/oo \
012\oo~\!/ !ox\
\v/:1=?^?\ooo>>
 x^
v\:1\/>> o
|/o\=\x^!/
voo/? >x
v\o~/ v_
>" "\?\x>xoo\
^oooooo<<< \
^.22/ \>x!/xv
^ooooooo_o<<<

Try it online, or watch it at the fish playground!

Randomness is tricky in ><>: there's only one random instruction, x, which sets the fish's direction to either up, down, left or right. This is a complicated program, so here's a colour-coded diagram to help you:

Colour-coded code!

I tried to split up the probabilities into chunks so that the probabilities within and between the chunks were fairly simple (preferring, say, 1/3 to 25/72). I did this as follows:

Tree of probabilities

The fish starts at the grey bit of the code (X). This is fairly standard ><> code to read in all of the input. It gets more interesting, so let's move on.

Next, the fish comes to the light and dark green sections (Y). You may notice from the probability tree that the three major branches each sum to 1/3, and that each of these branches splits into a 2/3 sub-branch and a 1/3 sub-branch. The green sections of code cover these two levels of the tree. First, we pick a random number out of 0, 1, 2 with equal chance of each, in the top lobe of the light green bit. We can simulate a 1/3 chance using the four-way instruction x by cutting off one of the exits so that it just redirects the fish back to the x — then there are only three escape routes from the x, and by symmetry they have equal probabilities.
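The blocked-exit trick is a form of rejection sampling. A Python 3 sketch of the idea follows; which direction is blocked is an arbitrary assumption here and does not reflect the actual grid layout.

```python
import random


def pick_one_of_three(rng=random):
    """Emulate a four-way random turn where one exit loops back to the turn."""
    while True:
        direction = rng.choice(["up", "down", "left", "right"])
        if direction != "left":      # hypothetical blocked exit: try again
            return direction


print(pick_one_of_three())
```

Each retry is independent of the previous one, so the three remaining exits end up equally likely.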

The next x, a little below this one, sends the fish to the ^ next to it with 2/3 chance — note that the fish wraps around if it swims left from the x — and down to a \ with 1/3 chance. The fish then swims along one of the two tails of the light green section. These tails are functionally the same: each checks if we pushed 0, 1 or 2 earlier, and branches out accordingly. And this completes the first two levels of the tree.

The next six sections (A–F), in essence, use more xs to branch the fish further, and then use some number of os to print a number of letters from the input. These sections range from straightforward (e.g. dark blue, C, which just prints three letters) to, well, not so straightforward (e.g. orange, D, which needs two xs to simulate a 3/8–5/8 split, printing letters in multiple stages). The details of these are left as an exercise. (I'm particularly pleased with yellow, E, which sends the fish in a loop-the-loop!)

After each of these branches, the fish ultimately reaches the pink section (Z). This draws all the branches back together, prints a space, then finally makes the fish jump to position (2,2) in the grid and start again at the first x.


In case the "it's complicated" explanation above doesn't convince you that this gives the correct probabilities, I also tested this on a length 65,000 input string (64 KiB, only 13 seconds in TIO!), and the resulting distribution of word lengths was

{{1,0.027377},{2,0.191237},{3,0.226599},{4,0.164128},{5,0.113064},{6,0.0818627},{7,0.0703885},{8,0.0543515},{9,0.0426089},{10,0.0283835}}

These probabilities are at most 0.0044 away from the expected probabilities.
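The deviation figure can be recomputed from the numbers above. A short Python 3 sketch, with `observed` copied from the measured distribution and `WEIGHTS` from the challenge table:

```python
observed = {1: 0.027377, 2: 0.191237, 3: 0.226599, 4: 0.164128, 5: 0.113064,
            6: 0.0818627, 7: 0.0703885, 8: 0.0543515, 9: 0.0426089, 10: 0.0283835}
WEIGHTS = [2, 14, 16, 12, 8, 6, 5, 4, 3, 2]

deviation = {k: abs(p - WEIGHTS[k - 1] / 72) for k, p in observed.items()}
worst = max(deviation, key=deviation.get)

print(worst, round(deviation[worst], 4))   # length with the largest gap, and the gap
print(deviation[worst] <= 1 / 144)         # within the challenge's tolerance?
```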

Not a tree

Posted 2017-07-22T19:28:50.933

Reputation: 3 106