Write a program that goes through a string of non-whitespace characters (you may assume that they are digits 0 to 9, but nothing in the way they are to be processed depends on this) and adds spaces according to the following rules.

Let the current token be the empty string, and the previously emitted tokens be an empty set.
Iterate through the characters of the string. For each character, first append the character to the current token. Then if the current token is not already in the set of previously emitted tokens, add the current token to that set and let the new current token be the empty string.
If when you reach the end of the string the current token is empty, output the previously emitted tokens in order of emission, separated by a space character. Otherwise output the original string verbatim.

Input

The input to the STDIN should be a sequence of digits.

Output

The program should print the result as specified in step 3.

Samples

Sample inputs

2015
10101010
4815162342
101010101010
3455121372425
123456789101112131415
314159265358979323846264338327950288419716939937

Sample outputs

2 0 1 5
10101010
4 8 1 5 16 2 3 42
1 0 10 101 01 010
3 4 5 51 2 1 37 24 25
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
3 1 4 15 9 2 6 5 35 8 97 93 23 84 62 64 33 83 27 95 0 28 841 971 69 39 937

This is code golf, so standard CG rules apply. Shortest program in bytes wins.

(Please request any clarifications in the comments. I'm still new to this. Thanks!)

Arcturus

Posted 2015-09-15T12:29:31.317

Reputation: 6 537

104815162342 I see what you did there, brotha. – Fatalize – 2015-09-15T12:36:40.447

I suggest to get rid of the “repeat at the end of the number” part of rule 7. (It confused me too.) Maybe you could say, there are not enough digits to form a distinct number. – manatwork – 2015-09-15T13:18:19.623

Probable error in the spec: step 6 asks us to repeat steps 3 and 4, so the only number in the output which can be more than one digit long is the second one. If that were to be corrected simply by changing it to "Repeat steps 4 and 5 ...", the movement in step 5 would mean that each number in the output should have an odd number of digits. And numbers which have leading zeroes are equal to the number without the leading zeroes, so if you want to distinguish 01 from 1 you should stop talking about numbers altogether and talk about strings instead. – Peter Taylor – 2015-09-15T13:37:59.513

A test case which my code originally failed despite passing all the ones in the challenge: 11312123133 – Martin Ender – 2015-09-15T13:41:20.613

16Proposed OEIS entry: numbers which are split into at least two segments by this process. – Martin Ender – 2015-09-15T14:05:20.313

@Sok Notice that 10101010 can result in 1 0 10 1010, which fits the rules. – Ismael Miguel – 2015-09-15T14:50:29.883

3@IsmaelMiguel Step 5 (as any other step) can only advance one digit at a time. Once you’ve got 1 0 10, the next iteration will find 1 (already used), then advance one to find 10 (already used), then advance one to find 101, which is new and would be ‘added’. It would then add a space and you'd get to a new 0, which has already been used, but is here at the end of the string. Therefore, the output would be 1 0 10 101 0, which is invalid (0 is repeated), and the script must then just output the input string. It could only make 1010 if 101 had already been used. – Janus Bahs Jacquet – 2015-09-15T15:32:21.700

You list 10101010 as an output. That looks like a mistake. – kasperd – 2015-09-15T16:40:20.503

3@kasperd No If a unique number cannot be formed at the end of the string, then the input should be printed verbatim 10101010 cannot be split so it is printed as is. – edc65 – 2015-09-15T17:05:02.787

@edc65 I see. I had misunderstood that rule. – kasperd – 2015-09-15T17:07:52.197

@JanusBahsJacquet, no. Given input 10101010 steps 1 to 3 give 1 ^0101010 (with ^ to indicate the position of the cursor); then step 4 gives 1 0^101010, step 5 gives 1 0 ^101010; step 4 gives 1 0 1^01010; step 5 gives 1 0 10^1010; step 4 gives 1 0 101^010; step 5 gives 1 0 101 ^010; step 4 gives 1 0 101 0^10; step 5 gives 1 0 101 01^0; step 4 gives 1 0 101 010^; step 5 gives 1 0 101 010; and the loop breaks. So the specified output for that input is 1 0 101 010 – Peter Taylor – 2015-09-15T20:34:17.243

@PeterTaylor No. 10 has not been used yet at 1 0 10^1010, so it wouldn't continue to 1 0 101^010, but add a space, making 1 0 10 ^1010. Moving on to 101 doesn't happen until the next ‘group of 10’ when it reaches 1 0 10 101^0. – Janus Bahs Jacquet – 2015-09-15T20:38:03.713

@JanusBahsJacquet, step 5 doesn't have a self-loop. It either inserts a space or moves one right. Then you go back to step 4. That's why I pointed out in my first comment that the spec requires each generated part to have an odd number of digits. – Peter Taylor – 2015-09-15T20:39:18.350

@PeterTaylor Exactly. It adds a space if doing so creates a number which has not already been used. At the relevant position, 10 has not yet been used, so step 5 adds a space and does not move to the right. I don't see how that would imply that the output must be an odd number of digits. – Janus Bahs Jacquet – 2015-09-15T20:42:17.077

1But when you enter step 5, the space would be after the 1, which would be a repeat. So instead you move right one in space 5, and then you move right one again in step 4, and you enter step 5 again and create 101. – Peter Taylor – 2015-09-15T20:43:50.450

@PeterTaylor Aaah, yes. I see what you mean now. Really, step 4 should be done away with entirely, and step 5 should be the only repeated one. Well spotted! (Really, steps 2 through 4 can all be scrapped, since the first digit will always be unused the first time it's reached.) – Janus Bahs Jacquet – 2015-09-15T20:48:53.853

@PeterTaylor Looks like you beat me to the editing. Thanks! – Arcturus – 2015-09-15T22:40:19.600

Answers

Pyth, 22 bytes

 faW!}=+kTYY~kdz?tkzsY

Leading space is important.

orlp

Posted 2015-09-15T12:29:31.317

Reputation: 37 067

Retina, 68 61 bytes

+`\b(\w+?)(?<!\b\1 .*)(\w+)$
$1 $2
(?=.* (.+)$(?<=\b\1 .+)) 
<empty>

<empty> is an empty line. Note the trailing space on line 3. You can run the above code from a single file with the -s flag.

Explanation

+`\b(\w+?)(?<!\b\1 .*)(\w+)$
$1 $2

This first step implements rules 1 to 6. It is a regex substitution which is applied repeatedly until the string stops changing (that's what the + is for). In each step we add a single space into the string from left to right (following the rules of the challenge). The regex matches the shortest string of digits which hasn't appeared in the already processed part of the string. We ensure that we're looking at a prefix of the remaining string with the word boundary \b and checking that we can reach the end of the string without passing spaces with (\w+)$. The latter also ensures that we only perform one replacement per step.

(?=.* (.+)$(?<=\b\1 .+)) 
<empty>

This matches any space (which is at the end of the regex), provided that the last segment of the string is the same as any other segment in the string, and replaces them with the empty string. That is, we undo the first step if it resulted in an invalid final segment, implementing rule 7.

Martin Ender

Posted 2015-09-15T12:29:31.317

Reputation: 184 808

Pyth, 24 23 bytes

VzI!}=+kNYaY~k"";?kzjdY

Try it out here.

VzI!}=+kNYaY~k"";?kzjdY    Implicit: z=input(), k='', Y=[], d=' '
Vz              ;          For N in z:
     =+kN                    Append N to k
  I!}    Y                   Is the above not in Y?
          aY k               Append k to Y
            ~k""             After append, reset k to ''
                 ?k        Is k truthy (i.e. not '')
                   z       Print original input
                    jdY    Otherwise print Y joined on spaces

Thanks to @FryAmTheEggman for saving a byte :o)

Sok

Posted 2015-09-15T12:29:31.317

Reputation: 5 592

@FryAmTheEggman A good call, I got pretty caught up in trying to preserve k's original value – Sok – 2015-09-15T14:59:46.550

Python 3, 92 bytes

i,n,*o=input(),""
for c in i:n+=c;o,n=[o+[n],o,"",n][n in o::2]
print([" ".join(o),i][n>""])

Basically a heavily golfed version of @Willem's solution.

orlp

Posted 2015-09-15T12:29:31.317

Reputation: 37 067

[" ".join(o),i][n>""] – FryAmTheEggman – 2015-09-15T14:55:12.883

@FryAmTheEggman Ah cool, I had tried that with bool(n) but I didn't think of n>"". – orlp – 2015-09-15T14:56:15.500

Python 3, 100 99 bytes

o=[];n="";i=input()
for c in i:
 n+=c
 if not n in o:o.append(n);n=""
print(i if n else" ".join(o))

Willem

Posted 2015-09-15T12:29:31.317

Reputation: 211

2I fixed your byte count. Also, you should remove the space from else ". – mbomb007 – 2015-09-15T14:37:27.607

1Some common golfs Also, your original score was 100 bytes, I think. – FryAmTheEggman – 2015-09-15T14:37:51.170

Cool thanks! I didn't know that the space after "else" could be removed. Another day lived, another day learned :) – Willem – 2015-09-15T14:49:22.080

Brachylog, 91 bytes

:_:_{h:0<|bhN,?hh:NrcH,?hB(l1,-1=A;BbA),?rhL:I(mH:Ar:[L]c:1&;:[H]:\"~w \"w,L:[H]c:_:Ar:1&)}

This made me realize that there's a lot of things about the syntax I need to change...

Explanation

:_:_              § Creates a list [Input,[],[]] 
{...}             § Define a new predicate between the brackets and call it with the previous list as input
h:0<              § If the head of the input is negative, stop
|                 § Else
bhN,              § Second element of Input is called N
?hh:NrcH,         § N concatenated with the first element of Input is H
?hB(l1,-1=A;BbA), § Remaining digits A are either -1 if there's only one digit left or all the digits but the head otherwise
?rhL:I            § List of used integers is called L
(
   mH:Ar:[L]c:1&  § If H is already in L, call the predicate with input [A,H,L]
   ;              § Else
   :[H]:\"~w \"w, § Print H followed by a space
   L:[H]c:_:Ar:1& § Call the predicate with input [A,[],M] where M is L with H appended to it
)

Fatalize

Posted 2015-09-15T12:29:31.317

Reputation: 32 976

CJam, 26 bytes

LLq{+_a2$&{a+L}|}/:X+X!S**

Test it here.

Explanation

L        e# Push an empty array to keep track if the previous segments.
L        e# Push an empty array to build the current segment.
q{       e# For each character in the input...
  +      e#   Add it to the current segment.
  _a2$&  e#   Duplicate and check if it's already in the segment list.
  {      e#   If not...
    a+L  e#     Add it to the list and push a new empty array for the next segment.
  }|
}/
:X+      e# Store the trailing segment in X and add it's *characters* to the list.
         e# For valid splittings, this trailing segment will be empty, so that the
         e# list remains unchanged.
X!       e# Push X again and take the logical NOT.
S*       e# Get that many spaces, i.e. 1 for valid segments and 0 otherwise.
*        e# Join the list with this string between elements.

Martin Ender

Posted 2015-09-15T12:29:31.317

Reputation: 184 808

JavaScript (ES6), 109

My output format is not exactly the same of the output samples in the questioin (there is a leading space). I don't see that as a flaw, as the output format is not specified (just The program should print the number after the number...)

Test running the snippet below in a EcmaScript 6 compliant browser. Developed with Firefox, tested and running on the latest Chrome.

/* Test: redirect console.log */ console.log=x=>O.innerHTML+=x+'\n';

F=s=>{for(z=s,b=l=o=' ';s[+l];)~o.search(b+(n=s.slice(0,++l)+b))||(s=s.slice(l),o+=n,l=0);console.log(s?z:o)}

/* Test cases */
test = [
  '2015',
,'10101010'
,'4815162342'
,'101010101010'
,'3455121372425'
,'123456789101112131415'
,'11312123133'
,'314159265358979323846264338327950288419716939937']

test.forEach(t=>{console.log('\n'+t);F(t)})

<pre id=O></pre>

edc65

Posted 2015-09-15T12:29:31.317

Reputation: 31 086

GNU sed, 83 77 73 71 bytes

(Score one extra because we require -r flag)

h
s/./&_/
:
/(\b[^ ]+).*\b\1_/{
s/_(.)/\1_/
t
g
}
s/_(.)/ \1_/
t
s/_//

The inner loop tests for a repeated sequence and appends characters as needed until a unique number appears after the separator _. The outer loop moves _ along.

Expanded, annotated version:

#!/bin/sed -rf

# Stash original in hold space
h

# Add separator
s/./&_/

:
# If current candidate is a duplicate, ...
/(\b[^ ]+).*\b\1_/{
#  ...then attempt to lengthen it ...
s/_(.)/\1_/
# ... and repeat if we succeeded, ...
t
# ... otherwise, restore original string
g
}
# Insert a space, and move our separator along
s/_(.)/ \1_/
t

# Remove the separator if we still have it
s/_//

Toby Speight

Posted 2015-09-15T12:29:31.317

Reputation: 5 058

You could combine both the t's into one. – User112638726 – 2015-09-16T14:51:19.523

Also /((\b[^ ]+).*\b\2)_/{ can be rewritten as /(\b[^ ]+).*\b\1_/{, no reason for 2 capture groups. – User112638726 – 2015-09-16T15:31:33.123

No problem :), you need to change the reference to \1 though! – User112638726 – 2015-09-16T15:51:37.513

Ruby, 57 + 1 = 58 bytes

s=''
l={}
$_.chars{|c|l[s<<c]||=s=''}
l[s]||$_=l.keys*' '

Uses the command line flag -p (or pl if your input has a trailing newline). Exploits several traits of Ruby Hash dictionaries: you can safely mutate the string you used to define a key without it changing the key (which doesn't work for other mutable types), .keys returns the keys in the order they were inserted, and the []||= operator provides a terse way of branching on whether a given key is already there.

histocrat

Posted 2015-09-15T12:29:31.317

Reputation: 20 600

Haskell, 105 bytes

f does it.

e""s""=unwords s
e t s""=concat s++t
e t s(c:r)|t&c`elem`s=e(t&c)s r|0<1=e""(s&(t&c))r
a&b=a++[b]
f=e""[]

Leif Willerts

Posted 2015-09-15T12:29:31.317

Reputation: 1 060

PHP - 148 bytes

Cool challenge, lots of fun!

$x=fgets(STDIN);$w=array();$k='';$l='';for($i=0;$i<strlen($x);$i++){$k.=$x[$i];if(!isset($w[$k])){$l.=$k.' ';$w[$k]=1;$k='';}}echo strlen($k)?$x:$l;

user15259

Posted 2015-09-15T12:29:31.317

Reputation:

I haven't seen that number before!

Answers

Pyth, 22 bytes

Retina, 68 61 bytes

Explanation

Pyth, 24 23 bytes

Python 3, 92 bytes

Python 3, 100 99 bytes

Brachylog, 91 bytes

Explanation

CJam, 26 bytes

Explanation

JavaScript (ES6), 109

GNU sed, 83 77 73 71 bytes

Ruby, 57 + 1 = 58 bytes

Haskell, 105 bytes

PHP - 148 bytes