Drop of Chaos (Constructing a Minimally Aperiodic Sequence)

9

The idea here is to produce an almost repeating pattern. That is, the sequence being constructed changes at the last moment to avoid a repetition of some subsequence. Subsequences of the type AA and ABA are to be avoided (where B is no longer than A).

Examples:

I'll go ahead and start by listing all of the small examples to make my description clearer. Let's start with 0.

Valid:   0

Invalid: 00    (AA pattern)
Valid:   01

Invalid: 010   (ABA pattern)
Invalid: 011   (AA pattern)
Valid:   012

Valid:   0120
Invalid: 0121  (ABA pattern)
Invalid: 0122  (AA pattern)

Invalid: 01200 (AA pattern)
Invalid: 01201 (ABA pattern; 01-2-01)
Invalid: 01202 (ABA pattern)
Valid:   01203

Now, I strongly believe that a 4 is never needed, though I do not have a proof, because I have easily found sequences of hundreds of characters long that use only 0123. (It's probably closely related to how only three characters are needed to have infinite strings that do not have any AA patterns. There's a Wikipedia page on this.)

Input/Output

Input is a single, positive, non-zero integer n. You may assume that n <= 1000.

Output is an n-character sequence with no subsequences that match either prohibited pattern (AA or ABA).

Sample inputs and outputs

>>> 1
0

>>> 2
01

>>> 3
012

>>> 4
0120

>>> 5
01203

>>> 50
01203102130123103201302103120132102301203102132012

Rules

  • Only the characters 0123 are allowed.
  • B is no longer than A. This is to avoid the situation where 012345 has to be followed by 6 because 0123451 has this: 1-2345-1. In other words, the sequence would be trivial and uninteresting.
  • n may be inputted through any method desired, except hard-coding.
  • Output may be either a list or a string, depending on which is easier.
  • No brute force; the run time should be on the order of minutes, at most an hour on a really slow machine, for n=1000. (This is intended to disqualify solutions that just loop through all n-length permutations of {0,1,2,3}, so that trick and similar tricks are disallowed.)
  • Standard loopholes are disallowed, as usual.
  • Scoring is in bytes. This is , so the shortest entry wins (probably - see bonus).
  • Bonus: pick the lowest allowed digit at each step. If 1 and 3 are possible choices for the next digit in the sequence, pick 1. Subtract 5 bytes from your score. However, take note of the note below.

Note!

Dead-ends are possible. Your program or function must avoid these. Here's an example:

Stump: 0120310213012310320130210312013210230120310213201230210312013023103201230213203102301203210231201302103123013203102130120321023013203123021032012310213012031023013203123021320123102130120
Stump: 0120310213012310320130210312013210230120310213201230210312013023103201230213203102301203210231201302103123013203102130120321023013203123021032012310213012031023013203123021320123102130123
Stump: 012031021301231032013021031201321023012031021320123021031201302310320123021320310230120321023120130210312301320310213012032102301320312302103201231021301203102301320312302132012310320
Stump: 012031021301231032013021031201321023012031021320123021031201302310320123021320310230120321023120130210312301320310213012032102301320312302103201231021301203102301320312302132012310321301203102130

Each of these sequences cannot be extended any further (without using a 4). But also note that there is a crucial difference between the first two and the second two. I'll replace the shared initial subsequence with an X to make this clearer.

Stump: X2130120
Stump: X2130123
Stump: X320
Stump: X321301203102130

The last two digits of X are 10, so the only possible choices for the next digit are 2 and 3. Choosing 2 leads to a situation where the sequence must terminate. The greedy algorithm will not work here. (Not without backtracking, anyway.)

El'endia Starman

Posted 2015-09-16T18:38:02.397

Reputation: 14 504

Can one use a brute force strategy of testing every possible string, even if it will not give an output in realistic time? Do you know there to be a solution for all n? If someone gives a heuristic semi-greedy algorithm, how will you check that it does not run into problems for a very large length? The general problem is an interesting one, and I haven't been able to find anything on pattern avoidance where we restrict the length of part of the pattern. If someone can produce a general recipe, I expect that to be the best approach. – xnor – 2015-09-16T20:03:51.580

I believe I disallowed brute force in the rules. I should probably highlight that. I do not have a proof that there exists a solution for all n, but given that the stumps my program finds tend to get longer by an average of 10 digits each time, I'm very sure that an infinite sequence exists. I'm not sure how a semi-greedy algorithm could be tested for arbitrarily large sequences. I could restrict the requirement to n=1000 and just not worry about higher n. – El'endia Starman – 2015-09-16T20:21:14.190

4I suppose AA is really type ABA where B is empty. This could perhaps help to streamline some solutions. – mathmandan – 2015-09-16T22:47:11.980

Answers

6

Retina, 86 bytes - 5 = 81

$
_
(r`^(?<-2>.)+_((.)+)\b$
$1!
\b$
0
3#
#
0#
1
1#
2
2#
3
)r`\1(?<-2>.)*((.)+)$
$0#
!
<empty>

Where <empty> represents an empty trailing line. You can run the above code from a single file with the -s flag.

The input should be given in unary, e.g. 111111. I haven't tested it for output on the order of thousands yet - two of the regexes might get a bit slow after a while - but it can easily handle a couple of hundred in a few seconds.

Explanation

This is a simple backtracking solution.

  1. Append a 0.
  2. While the current sequence is invalid, remove all trailing 3s and increment the last non-3.
  3. Repeat until we have a valid sequence of the desired length.

This backtracking is implemented by a loop of regex substitutions which aborts once the string remains unchanged through one iteration.

$
_

This appends a _ to the input, which is used to separate the unary input from the sequence we're building.

(r`^(?<-2>.)+_((.)+)\b$
$1!

This is the first substitution in the loop (indicated by the leading (). The regex matches if a) there is a word character (i.e. a digit) at the end the string (which means the string is valid - we'll see below that invalid sequences are marked with a trailing #) and b) there are at least as many characters in the sequence as in the input (this is checked using balancing groups). If that's the case, we remove the input and append a !. This ! serves to make all regexes in the loop fail, such that it terminates.

\b$
0

If there is a word character at the end (i.e. the sequence is valid, and the loop wasn't terminated by the previous step), append a 0.

3#
#

If (instead) the sequence had been marked invalid and ended in 3, we remove that 3 (but leave the sequence as invalid, because there's no possible continuation for the current prefix... so the next character needs to be backtracked as well).

0#
1
1#
2
2#
3

If the sequence is marked invalid and any digit other than 3 is at the end, we increment the digit and remove the marker.

)r`\1(?<-2>.)*((.)+)$
$0#

The last substitution in the loop (as indicated by the )). It checks whether the string ends in ABA (where B is not longer than A but potentially empty). The relative lengths of A and B are again checked using balancing groups, and the repetition of A is checked with a simple backreference.

If this regex matches, we mark the sequence invalid by appending #.

!
<empty>

Once the loop terminates, all we need to do is remove the ! and are then left with the desired output.

Martin Ender

Posted 2015-09-16T18:38:02.397

Reputation: 184 808

2

Python 2, 175 - 5 = 170 bytes

n=input();s='';u=j=-1
while n>len(s):
 while u>2:u=int(s[0]);s=s[1:]
 u+=1;t=`u`+s;m=c=0
 while t[c:]*0**m:c+=1;i=t[c:].find(t[:c]);m=j<i<=c
 if c>=len(t):s=t;u=j
print s[::j]

This is the greedy algorithm with backtracking. I wish it were shorter. I hope it's correct (see below).

It builds the string one digit at a time. Given a string of d digits that it has found already, it tries to append a 0 as the (d+1)st digit. If that doesn't work, then it tries a 1, then a 2, then a 3. If none of these works, then it goes back to the dth digit and increments it (if less than 3) or removes it (if equal to 3, in which case it increments the previous one, etc.).

The check for validity is the line with .find in it. In case anyone decides to read my code, I should say that this program is actually storing the string backwards, meaning that it's adding digits to the front. So the check involves looking for places where the first c digits appear again later on in the string (anyplace after the first c digits), and if there are any such places, whether the intervening length is at most c.

(Of course it reverses the string before printing.)

It could also easily be faster; I originally had it exiting various loops early for efficiency, but that cost precious bytes. It still does OK in the range of n=1000, though.

Anyway, the program does seem to exhibit a preference for the smaller digits, but it's not a very strong preference. For example, running it with n=2000 gave me a string with 523 zeros, 502 ones, 497 twos and 478 threes, ending in 30210312013021. So if anyone else is working on a greedy algorithm, maybe they can confirm this result. Or with n=1000 I got [263, 251, 248, 238] for the counts by digit.

Finally, I would mention that these counts are sort of suggestive of symmetry, almost (though not exactly) as if we had started with a uniform distribution and then converted some of the 3's to 0's and a few of the 2's to 1's. But obviously that could just be coincidence. I have no idea!

mathmandan

Posted 2015-09-16T18:38:02.397

Reputation: 943

1

Haskell, 115 (120 bytes − 5 bonus)

x?_|or[t x==t(drop i x)|i<-[1..length x],t<-[take$div(i+1)2]]=[]
x?0=[x]
x?n=(?(n-1)).(:x)=<<"0123"
f=reverse.head.([]?)

Run online at Ideone

Example run:

*Main> f 40
"0120310213012310320130210312013210230120"

Anders Kaseorg

Posted 2015-09-16T18:38:02.397

Reputation: 29 242