Here is a sentence with many cats:

there is a cat house where many cats live. in the cat house, there is a cat called alice and a cat called bob. in this house where all cats live, a cat can be concatenated into a string of cats. The cat called alice likes to purr and the cat called bob likes to drink milk.

The Task

Concatenate (with _) each pair of neighbouring words in the sentence, and insert the concatenation between the two words of the pair if that pair occurs more than once in the sentence. Note that overlapping occurrences count, so blah blah occurs twice in blah blah blah.

For example, if the pair the cat occurs more than once, insert the concatenated words between them like this: the the_cat cat
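To make the overlap rule concrete, here is a minimal Python 3 sketch that counts neighbouring pairs with a sliding window, so one word can end one pair and start the next. The helper name pair_counts is made up for illustration, and it assumes a word is a run of non-whitespace characters.

```python
from collections import Counter


def pair_counts(sentence):
    """Count every neighbouring word pair, overlaps included (hypothetical helper)."""
    words = sentence.split()  # assumption: a word is a run of non-whitespace
    # zip(words, words[1:]) yields (w0, w1), (w1, w2), ..., so windows overlap
    return Counter(zip(words, words[1:]))


print(pair_counts("blah blah blah"))  # Counter({('blah', 'blah'): 2})
```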

Example Output

there there_is is is_a a a_cat cat cat_house house where many cats live. in the the_cat cat cat_house house,
there there_is is is_a a a_cat cat cat_called called called_alice alice and a a_cat cat cat_called called called_bob bob.
the the_cat cat cat_called called called_alice alice likes likes_to to purr and the the_cat cat cat_called called called_bob bob likes likes_to to drink milk.

Some more examples (input on the left, expected output on the right):

milk milk milk       milk milk_milk milk milk_milk milk
a bun and a bunny    a bun and a bunny
milk milk milk.      milk milk milk.
bun bun bun bun.     bun bun_bun bun bun_bun bun bun.

Notes

All UTF-8 characters are allowed, which means punctuation is part of the input.
Punctuation becomes part of the word it is attached to (e.g., in "in the house, is a cat", the word "house," includes the comma).
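Putting the task and the notes together, a straightforward (non-golfed) Python 3 sketch could look like this. It assumes, as the comments below suggest, that a word is any run of non-whitespace characters and that the output is re-joined with single spaces; concatenate_cats is a hypothetical name, not taken from any answer.

```python
from collections import Counter


def concatenate_cats(sentence):
    """Hypothetical reference solution (not taken from any answer below)."""
    # Assumption: a word is a run of non-whitespace characters, so punctuation
    # stays attached ("house" and "house," are different words).
    words = sentence.split()
    pairs = list(zip(words, words[1:]))  # overlapping neighbouring pairs
    counts = Counter(pairs)
    out = []
    for i, word in enumerate(words):
        out.append(word)
        # After the left word of a repeated pair, insert "left_right".
        if i < len(pairs) and counts[pairs[i]] > 1:
            left, right = pairs[i]
            out.append(left + "_" + right)
    return " ".join(out)


for example in ["milk milk milk", "bun bun bun bun."]:
    print(concatenate_cats(example))
# milk milk_milk milk milk_milk milk
# bun bun_bun bun bun_bun bun bun.
```

Using a Counter means each repeat check is a dictionary lookup rather than a rescan of the pair list, which several of the golfed answers below do for brevity.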

Bob van Luijt

Posted 2019-09-01T19:08:55.767

Reputation: 137

Why not house house_where where, since house where appears twice? (Actually, I think the example output does not match the example input.) – Arnauld – 2019-09-01T19:38:59.000

well spotted, not on purpose – Bob van Luijt – 2019-09-01T20:04:11.397

What types of characters can the input have? Do we need to worry about punctuation for what counts as a word? – xnor – 2019-09-01T20:10:07.233

Good question @xnor, updated the question above – Bob van Luijt – 2019-09-01T20:15:37.637

@BobvanLuijt So that clarifies what characters are allowed, but I'm still not clear how punctuation or other characters affect what's considered a separate word. – xnor – 2019-09-01T20:18:56.343

So basically a word is a sequence of non-whitespace characters? – Arnauld – 2019-09-01T20:28:20.450

Is the count overlapping or not? i.e does milk milk occur twice or once in milk milk milk? (I'd guess twice so "yes" but I don't know) – Jonathan Allan – 2019-09-01T20:36:18.833

Yes, correct. Suggestions on how to add this to the game rules are welcome :) – Bob van Luijt – 2019-09-01T20:38:44.953

RE: xnor & Arnauld's comments: does milk milk still occur twice in milk milk milk. or not? (Note the trailing period.) – Jonathan Allan – 2019-09-01T20:44:57.733

@JonathanAllan so should the output of milk milk milk be milk milk_milk milk milk_milk milk? – Nick Kennedy – 2019-09-01T20:53:19.897

Perfect; thanks – Bob van Luijt – 2019-09-01T20:59:57.797

@JonathanAllan I think the second note means that milk milk milk. would be left alone, and milk milk milk milk. would become milk milk_milk milk milk_milk milk milk. – Nick Kennedy – 2019-09-01T21:06:02.467

@Nick agreed. I've added milk milk milk. and bun bun bun bun. as test cases and nominated for re-opening. – Jonathan Allan – 2019-09-02T11:13:38.193

Thanks @JonathanAllan. Did you remove your answer btw? – Bob van Luijt – 2019-09-02T11:14:17.233

An example like "this should produce this_should" -> "this should produce this_should" is probably worth adding (since a filtering approach might yield "this this_should should produce this_should" by mistake). – Jonathan Allan – 2019-09-02T12:41:53.603

Is cat cat_house house, a mistake, since the second word includes the comma? – Neil – 2019-09-02T23:46:09.730

Answers

Python 2, 102 bytes (previously 99, 107, 103)

def f(s):S=s.split();T=zip(S,S[1:]+[s]);return' '.join(x+(' '+x+'_'+y)*(T.count((x,y))>1)for x,y in T)

Try it online!

Fixed the milk milk milk style edge cases.
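For readers who don't read golfed Python 2, here is an ungolfed Python 3 reading of the answer above. It is a sketch, not the author's code; the one real change is wrapping zip in list(), since in Python 3 zip returns a lazy iterator with no count method.

```python
def f(s):
    S = s.split()
    # Pair every word with its successor. The last word is paired with the
    # whole input s, a sentinel that no genuine pair can equal, so each word
    # gets exactly one (x, y) entry.
    T = list(zip(S, S[1:] + [s]))  # list(): Python 3's zip has no .count
    # (T.count(...) > 1) is a bool, so the string is repeated 0 or 1 times.
    return " ".join(x + (" " + x + "_" + y) * (T.count((x, y)) > 1) for x, y in T)


print(f("milk milk milk"))  # milk milk_milk milk milk_milk milk
```

The sentinel works because a genuine pair's right word never contains whitespace, while s does whenever there are two or more words.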

Chas Brown

Reputation: 8 959

...but milk milk milk - count is non-overlapping; is that OK?

– Jonathan Allan – 2019-09-01T20:33:23.357

@Jonathan Allan: Hmm... OP has yet to weigh in, but I don't like that milk milk milk doesn't work with this code... thinking... – Chas Brown – 2019-09-01T20:48:09.153

It will also be incorrect with compound words like a bun and a bunny, though.

– Jonathan Allan – 2019-09-01T20:51:37.123

Okay, fixed now at a cost of 8 bytes... – Chas Brown – 2019-09-01T20:52:49.523

Jelly, 18 bytes (previously 23, 22, 21)

Ḳµżj”_$ƝḢKċ@Ị¥?€`K

A full program which prints the output.

Try it online! Or see a test-suite.

How?

Ḳµżj”_$ƝḢKċ@Ị¥?€`K - Main Link: list of characters T
Ḳ                  - split (T) at spaces (call this W)
 µ                 - start a new monadic chain (i.e. f(W))
       Ɲ           - for neighbouring pairs:
      $            -   last two links as a monad:
   j               -     join with...
    ”_             -     ...underscore character
  ż                - zip (with W) (making pairs of ["left", "left_right"]
                   -               plus a trailing ["rightmost"])
                `  - use this as both left and right arguments of:
               €   -   for each:
              ?    -     if...
             ¥     -     ...condition: last two links as a dyad:
           @       -       with swapped arguments:
          ċ        -         count occurrences of right in left
            Ị      -       is insignificant? (abs(x) <= 1)
        Ḣ          -     ...then: head (   ["left", "left_right"] -> "left"
                   -                    or ["rightmost"] -> "rightmost")
         K         -     ...else: join with space (["same", "same_same"] -> "same same_same")
                 K - join with space characters
                   - implicit print
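To follow the walk-through step by step, here is a rough Python 3 transliteration. It is a sketch that assumes, as the explanation says, that ż keeps the unmatched last word as a one-element entry; the function name is hypothetical. Comparing whole ["left", "left_right"] entries is equivalent to comparing (left, right) pairs, because equal lefts with equal left_right strings force equal rights.

```python
def jelly_style(text):
    w = text.split(" ")  # Ḳ: split at spaces
    joined = [a + "_" + b for a, b in zip(w, w[1:])]  # j”_$Ɲ on neighbouring pairs
    # ż: zip W with the joined strings; the unmatched last word stays as a
    # one-element entry, giving [["left", "left_right"], ..., ["rightmost"]]
    entries = [[a, j] for a, j in zip(w, joined)] + [[w[-1]]]
    out = []
    for e in entries:  # € with ` : compare each entry against the whole list
        if entries.count(e) <= 1:  # ċ@ then Ị (abs(count) <= 1)
            out.append(e[0])  # Ḣ: keep just the head
        else:
            out.append(" ".join(e))  # K: "same same_same"
    return " ".join(out)  # final K


print(jelly_style("bun bun bun bun."))  # bun bun_bun bun bun_bun bun bun.
```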

Jonathan Allan

Reputation: 67 804

Zsh, 103 bytes

t=($=1)
for b a (${${t:1}:^t})s+=(${a}_$b)
for w ($s)((++i,${#s:#$w}-$#s+1))||s[i]=
echo ${t:^s} $t[-1]

Try it online!

The key constructs used here are:

${=1}: splits the first parameter into words (because there are no other flags, the {braces} are optional)
${a:^b}: substitutes array $a zipped with array $b
${a:#b}: substitutes array $a with all instances of b removed.
${# }: the length of the contained expansion.
echo: used instead of <<<, which leaves extra spaces

If the input is already split into words, 99 bytes.
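Here is a Python 3 sketch of the same steps. Variable names follow the zsh code and the comments map each line to the construct it imitates; the empty-input guard is an addition, and treating ${t:^s} as stopping at the shorter array is my reading (it is why $t[-1] is appended separately).

```python
def zsh_style(text):
    t = text.split()  # t=($=1)
    if not t:  # guard added for empty input (not in the zsh answer)
        return ""
    # for b a (${${t:1}:^t}) s+=(${a}_$b): join each neighbouring pair
    s = [a + "_" + b for a, b in zip(t, t[1:])]
    # ((...)) || s[i]= : blank every joined pair that occurs exactly once
    s = [p if s.count(p) > 1 else "" for p in s]
    out = []
    for word, joined in zip(t, s):  # ${t:^s}, assumed to stop at the shorter array
        out.append(word)
        if joined:  # empty elements disappear from an unquoted zsh expansion
            out.append(joined)
    out.append(t[-1])  # $t[-1]
    return " ".join(out)


print(zsh_style("bun bun bun bun."))  # bun bun_bun bun bun_bun bun bun.
```

Because this sketch compares the joined strings rather than the pairs themselves, x_y z and x y_z would both become x_y_z and count as a repeat; comparing tuples instead avoids that.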

GammaFunction

Reputation: 2 838

Jelly, 27 bytes (previously 32)

;⁶Ḳṡ2W€jj”_W$Ɗ€ĠL’$ƇẎƊ¦Ṗ€ẎK

Try it online!

Shorter and more correct! Thanks to @JonathanAllan for highlighting an issue with "milk milk milk"!

A monadic link that takes a Jelly string as its argument and returns a processed Jelly string.

Explanation

;⁶                          | Append a space
  Ḳ                         | Split at spaces
   ṡ2                       | Slices of length 2
                     Ɗ¦     | At the indices indicated by the following:
               Ġ            | - Group indices of equal values
                   Ƈ        | - Keep only those where the following is non-zero:
                L           |   - Length
                 ’          |   - Decrease by 1
                    Ẏ       | - Tighten (join outermost lists together)
             Ɗ€             | Do the following as a monad:
     W€                     | - Wrap each word in a list
       j    $               | - Join with the following:
        j”_                 |   - The two words joined with "_"
           W                |   - Wrapped in a list
                       Ṗ€   | Remove last member of each list
                         Ẏ  | Tighten (join outermost lists)
                          K | Join with spaces
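The same pipeline as a Python 3 sketch (names are hypothetical, and a dictionary of index lists stands in for Ġ). Appending a space before splitting gives a trailing empty word, so the last real word still starts a pair, (last, ""), and survives the "remove last member" step.

```python
from collections import defaultdict


def grouped_style(text):
    words = (text + " ").split(" ")  # ;⁶Ḳ gives a trailing "" word
    pairs = [[a, b] for a, b in zip(words, words[1:])]  # ṡ2
    # Ġ: group indices of equal pairs; keep only groups with more than one index
    groups = defaultdict(list)
    for i, (a, b) in enumerate(pairs):
        groups[(a, b)].append(i)
    repeated = [i for idx in groups.values() if len(idx) > 1 for i in idx]
    # At those indices, [a, b] becomes [a, "a_b", b]
    for i in repeated:
        a, b = pairs[i]
        pairs[i] = [a, a + "_" + b, b]
    # Ṗ€ drops the last member of each list, Ẏ flattens, K joins with spaces
    return " ".join(w for p in pairs for w in p[:-1])


print(grouped_style("milk milk milk"))  # milk milk_milk milk milk_milk milk
```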

Nick Kennedy

Reputation: 11 829

JavaScript (ES6), 94 bytes

s=>s.split` `.map((w,i,a)=>a.some((W,j)=>j!=i&W==w&a[j+1]==(p=a[i+1]))?w+` ${w}_`+p:w).join` `

Try it online!
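Since Bob asks below how this works, here is a line-by-line Python 3 transliteration. It is a sketch: None stands in for JavaScript's undefined past the end of the array, and js_style is a hypothetical name. For each word w at index i, it looks for another index j that starts the same pair (w, next word); if there is one, w is followed by w_next.

```python
def js_style(s):
    a = s.split(" ")  # s.split` `
    out = []
    for i, w in enumerate(a):
        p = a[i + 1] if i + 1 < len(a) else None  # a[i+1] (undefined for the last word)
        # a.some(...): is there another index j starting the same pair (w, p)?
        twin = any(
            j != i and W == w and (a[j + 1] if j + 1 < len(a) else None) == p
            for j, W in enumerate(a)
        )
        out.append(f"{w} {w}_{p}" if twin else w)  # w+` ${w}_`+p : w
    return " ".join(out)  # .join` `


print(js_style("milk milk milk"))  # milk milk_milk milk milk_milk milk
```

The last word can never have a twin: a matching j would also need an undefined successor, which only the last index has, and j != i rules that out.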

Arnauld

Reputation: 111 334

Thanks @Arnauld, would you mind explaining a bit more in-depth about how it works? – Bob van Luijt – 2019-09-01T20:23:20.733

...also milk milk milk... I've asked under the question for clarification.

– Jonathan Allan – 2019-09-01T20:39:05.663

@JonathanAllan Thanks for pointing this out. Now fixed. – Arnauld – 2019-09-02T20:21:06.487

J, 69 bytes

;:inv@(a:-.~],@,.a:,~(2(,'_'&,)&.>/\])#&.>~1<1#.[:=/~2<\])' '<;._1@,]

Try it online!

J, 56 bytes (but breaks on commas)

(a:-.~],@,.a:,~(2(,'_'&,)&.>/\])#&.>~1<1#.[:=/~2<\])&.;:

Try it online!

Explanation for both

It's a bit verbose, but the underlying idea is nice, so I'll explain it with pictures:

Let's start with this input:

low in xx xx low in bun bun bun bun.

First we turn it into words:

┌───┬──┬──┬──┬───┬──┬───┬───┬───┬────┐
│low│in│xx│xx│low│in│bun│bun│bun│bun.│
└───┴──┴──┴──┴───┴──┴───┴───┴───┴────┘

And then create the underscore concatenation of every pair, plus a blank item at the end:

┌──────┬─────┬─────┬──────┬──────┬──────┬───────┬───────┬────────┬┐
│low_in│in_xx│xx_xx│xx_low│low_in│in_bun│bun_bun│bun_bun│bun_bun.││
└──────┴─────┴─────┴──────┴──────┴──────┴───────┴───────┴────────┴┘

Let's zip these together and see where we're at:

┌────┬────────┐
│low │low_in  │
├────┼────────┤
│in  │in_xx   │
├────┼────────┤
│xx  │xx_xx   │
├────┼────────┤
│xx  │xx_low  │
├────┼────────┤
│low │low_in  │
├────┼────────┤
│in  │in_bun  │
├────┼────────┤
│bun │bun_bun │
├────┼────────┤
│bun │bun_bun │
├────┼────────┤
│bun │bun_bun.│
├────┼────────┤
│bun.│        │
└────┴────────┘

We notice that if we could keep just the items we want in the right column, we could flatten the whole thing, unbox, and we'd be done.

So we want a filter for the right column. Let's start by treating the consecutive pairs of input words as single units (again, with a blank at the end):

┌────────┬───────┬───────┬────────┬────────┬────────┬─────────┬─────────┬──────────┬┐
│┌───┬──┐│┌──┬──┐│┌──┬──┐│┌──┬───┐│┌───┬──┐│┌──┬───┐│┌───┬───┐│┌───┬───┐│┌───┬────┐││
││low│in│││in│xx│││xx│xx│││xx│low│││low│in│││in│bun│││bun│bun│││bun│bun│││bun│bun.│││
│└───┴──┘│└──┴──┘│└──┴──┘│└──┴───┘│└───┴──┘│└──┴───┘│└───┴───┘│└───┴───┘│└───┴────┘││
└────────┴───────┴───────┴────────┴────────┴────────┴─────────┴─────────┴──────────┴┘

The filter we seek simply keeps any element that occurs more than once. To find these, we'll create a function table of equality, comparing every pair with every other pair:

1 0 0 0 1 0 0 0 0 0
0 1 0 0 0 0 0 0 0 0
0 0 1 0 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0 0
1 0 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0
0 0 0 0 0 0 1 1 0 0
0 0 0 0 0 0 1 1 0 0
0 0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 1

And sum it row-wise or column-wise (the direction doesn't matter, since the table is symmetric):

2 1 1 1 2 1 2 2 1 1

And find all entries greater than 1:

1 0 0 0 1 0 1 1 0 0

This filter is all we need to carry out our plan from above and arrive at the answer.
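The steps above translate fairly directly into Python 3. This is a sketch of the idea, not a port of the J code: None plays the role of the blank item, and the function names are hypothetical. occurrence_counts reproduces the row sums shown above.

```python
def occurrence_counts(words):
    # Consecutive pairs as single units, plus a blank at the end (None here).
    pairs = list(zip(words, words[1:])) + [None]
    # Function table of equality: table[i][j] is 1 when pair i equals pair j.
    table = [[int(p == q) for q in pairs] for p in pairs]
    # The table is symmetric, so row sums equal column sums.
    return [sum(row) for row in table]


def concatenate(text):
    words = text.split()
    keep = [n > 1 for n in occurrence_counts(words)]  # entries greater than 1
    # Right column: underscore concatenation of each pair, blank at the end.
    right = [a + "_" + b for a, b in zip(words, words[1:])] + [""]
    out = []
    for word, joined, k in zip(words, right, keep):
        out.append(word)
        if k:
            out.append(joined)
    return " ".join(out)


example = "low in xx xx low in bun bun bun bun."
print(occurrence_counts(example.split()))  # [2, 1, 1, 1, 2, 1, 2, 2, 1, 1]
print(concatenate(example))  # low low_in in xx xx low low_in in bun bun_bun bun bun_bun bun bun.
```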

Jonah

Reputation: 8 729

Concatenate the Cats
