Enumerate valid Brainf**k programs

41

1

Golunar/Unary is a way to encode all valid Brainfuck programs, but it is not an enumeration, since most natural numbers do not correspond to a valid program.

For the purpose of this challenge, assume a doubly infinite tape and no comments, i.e., a Brainfuck program is valid if and only if it consists solely of the characters <>+-.,[] and all left and right brackets match.

For example, the empty program, ,[+][-]., [>+<[--].] and +[+[+][+[+]+]+]+. are valid Brainfuck programs, while ][, and a[] are not.

Task

Write a program or function that accepts a valid Brainfuck program as input and returns a natural number (1, 2, 3, …), with the following constraints:

  • The generated output must be different for all valid Brainfuck programs.

  • For every natural number n, there must be a valid Brainfuck program that, when provided as input, generates the output n.

Additional rules

  • Given a Brainfuck program of 100 or less bytes, your program or function must finish within one minute.

    This means that you cannot iterate over all valid Brainfuck programs until you match the input.

  • Standard rules apply.

Dennis

Posted 2015-08-26T23:27:19.127

Reputation: 196 637

3I was thinking to just encode it as octal but the matching brackets are making this tricky. – DankMemes – 2015-08-27T00:42:01.737

Is the empty program a valid Brainfuck program? Must it be mapped to a natural integer as well? – orlp – 2015-08-27T11:16:52.170

9Why the close vote? This is a fascinating question and the problem happens to have the size and variety for a great golf. – trichoplax – 2015-08-27T11:17:22.410

1@orlp Yes, the empty program satisfies the above definition – Dennis – 2015-08-27T12:49:16.677

3Still waiting to see an answer written in Brainfuck... – Michael Hampton – 2015-08-28T01:41:27.013

Answers

16

Python 3, 443 158 155 154 134 131 128 124 117 116 115 bytes

c=d=C=D=0
for e in input():v='[<>,.-+]'.find(e);d=d*8+v;c+=c<0<6<v;c-=d>1>v;C,D=(c,C+1,d,D)[v>6::2]
print(-~D*8**C)

Several bytes thanks to Sp3000 and Mitch Schwartz :D

How this works:

This maps all valid BF programs into all possible, valid or invalid, BF programs, that don't start with a [, in a one-to-one ratio. After that, the new program is simply converted into octal.

Here is the mapping formula:

  1. Separate a BF program into 3 parts. The first part is the largest prefix consisting of only [ characters. The third part is the largest postfix consisting of only ] characters. The second part is the middle.
  2. Dispose of the first part. These can be recomputed later.
  3. Remove all ] brackets in the third part that match [ brackets in the second part. These can also be recomputed later.
  4. Concatenate the second and third parts together.

If you don't understand this explanation, you can find an extended explanation in chat starting here.

For reference, here are the first 20 programs:

1 : 
2 : <
3 : >
4 : ,
5 : .
6 : -
7 : +
8 : []
9 : <[]
10 : <<
11 : <>
12 : <,
13 : <.
14 : <-
15 : <+
16 : [<]
17 : >[]
18 : ><
19 : >>
20 : >,

Here are the first 1000 programs: http://pastebin.com/qykBWhmD
Here is the program I used to generate them: http://ideone.com/e8oTVl

Here is Hello, World!:

>>> ++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.
457711481836430915510337664562435564418569135809989841510260388418118348571803953323858180392373

TheNumberOne

Posted 2015-08-26T23:27:19.127

Reputation: 10 855

Minor quibble: You cannot map a program to 0. – Dennis – 2015-08-27T12:50:27.413

@Dennis Does an empty program count as a program though? – Beta Decay – 2015-08-27T13:03:39.023

@Dennis I'll fix that when I golf it. – TheNumberOne – 2015-08-27T13:25:37.393

Your reference output is wrong. When I run >. I get 20. When I run >, I get 19. – isaacg – 2015-08-27T14:55:09.507

3This is ingenious – proud haskeller – 2015-08-28T01:09:34.447

@isaacg Sorry, those reference outputs were handmade and untested. The new ones are computer generated. – TheNumberOne – 2015-08-28T02:19:17.933

13

Python 2, 157 bytes

def f(s,o=0,d=0,D={}):T=s,o,d;x=D[T]=D[T]if T in D else~o and 0**o+sum(f(s[1:],cmp(c,"[")%-3-~o,d or cmp(c,s[0]))for c in"+,-.<>[]")if s else~d<0==o;return+x

Still looks pretty golfable, but I'm posting this for now. It uses recursion with a bit of caching. Annoyingly, D.get doesn't short circuit for the caching, so I can't save 9 bytes that way...

The mapping prioritises length first, then lexicographical order over the ordering "][><.-,+" (see output examples below). The main idea is to compare prefixes.

The variable o keeps track of the number of [ brackets still open for the current prefix, while the variable d takes one of three values indicating:

  • d = 1: The current prefix is lexicographically earlier than s. Add all programs with this prefix and length <= s,
  • d = -1: The current prefix is lexicographically later than s. Add all programs with this prefix and length < s.
  • d = 0: The current prefix is a prefix of s, so we might change d to 1 or -1 later.

For example, if we have s = "[-]" and our current prefix is p = "+", since p is later than s lexicographically we know only to add the programs starting with p which are strictly shorter than s.

To give a more detailed example, suppose we have an input program s = "-[]". The first recursive expansion does this:

  (o == 0)               # Adds a program shorter than s if it's valid
                         # For the first expansion, this is 1 for the empty program
+ f(s[1:], o=-1, d=1)    # ']', o goes down by one due to closing bracket
+ f(s[1:], o=1, d=1)     # '[', o goes up by one due to opening bracket
+ f(s[1:], o=0, d=1)     # '>'
+ f(s[1:], o=0, d=1)     # '<'
+ f(s[1:], o=0, d=1)     # '.', d is set to 1 for this and the previous branches
                         # since they are lexicographically earlier than s's first char
+ f(s[1:], o=0, d=0)     # '-', d is still 0 since this is equal to s's first char
+ f(s[1:], o=0, d=-1)    # ',', d is set to -1 for this and the later branches
                         # since they are lexicographically later than s's first char
+ f(s[1:], o=0, d=-1)    # '+'

Note how we don't actually use the prefixes in the recursion - all we care about them is captured through the variables d, o and the shrinking input program s. You'll notice a lot of repetition above - this is where caching comes in, allowing us to process 100-char programs well within the time limit.

When s is empty, we look at (d>=0 and o==0), which decides whether to return 1 (count this program because it's lexicographically early/equal and the program is valid), or 0 (don't count this program).

Any situtation with o < 0 immediately returns 0, since any programs with this prefix have more ]s than [, and are thus invalid.


The first 20 outputs are:

 1
> 2
< 3
. 4
- 5
, 6
+ 7
[] 8
>> 9
>< 10
>. 11
>- 12
>, 13
>+ 14
<> 15
<< 16
<. 17
<- 18
<, 19
<+ 20

Using the same Hello World example as @TheNumberOne's answer:

>>> f("++++++++[>++++[>++>+++>+++>+<<<<-]>+>+>->>+[<]<-]>>.>---.+++++++..+++.>>.<-.<.+++.------.--------.>>+.>++.")
3465145076881283052460228065290888888678172704871007535700516169748342312215139431629577335423L

Sp3000

Posted 2015-08-26T23:27:19.127

Reputation: 58 729

4

Python 2, 505 (not golfed)

I enjoyed developing this approach, but I may not bother golfing it because it isn't competitive compared with other approaches. I'm posting it for diversity's sake and possible aesthetic interest. It involves recursion and a bit of math.

F={0:1}

def f(n):
    if n not in F:
        F[n]=6*f(n-1) + sum(f(i)*f(n-2-i) for i in range(n-1))

    return F[n]

def h(x):
    if x=='': return 0

    if len(x)==1: return '+-<>,.'.find(x)

    if x[0]!='[':
        return h(x[0]) * f(len(x)-1) + h(x[1:])

    d=i=1
    while d:
        if x[i]==']': d-=1
        elif x[i]=='[': d+=1
        i+=1

    a=i-2
    b=len(x)-i

    return 6*f(a+b+1) + sum(f(i)*f(a+b-i) for i in range(a)) + h(x[1:i-1]) * f(b) + h(x[i:])

def g(x):
    return sum(f(i) for i in range(len(x))) + h(x) + 1

print g(raw_input())

The function f(n) counts the number of valid brainfuck programs of length n. h(x) maps programs of length n to [0..f(n)-1], and g(x) is the bijective ranking function in question.

The main idea is that a non-empty program can either start with [ or with one of the 6 non-[] characters. In the former case, we can iterate over the possible locations of the matching ] and recurse on the enclosed part and on the tail (where tail means the substring following the ]). In the latter case, we can recurse on the tail (where tail means drop the first character). This reasoning can be used both for counting and for computing rank.

Shorter programs will always have lower rank than longer programs, and the bracket pattern is a secondary determining factor. The non-[] characters are sorted according to "+-<>,." (which is arbitrary).

For example with n=4 we have these cases:

zxxx
[]xx
[x]x
[xx]

where z stands for non-[] character and x stands for any character, under the restriction that the ] has to match the initial [. Programs are ranked according to that order, and recursively on the x subsections, with the left section prioritised over the right section in the latter cases. The rank calculation is similar to mixed-radix numeral systems, and f is important for computing the current "radix".

Mitch Schwartz

Posted 2015-08-26T23:27:19.127

Reputation: 4 899

4

This answer is a formal proof for the answer by TheNumberOne, Enumerate valid Brainf**k programs, where it can be a bit hard to understand the fine points why the enumeration is correct. It's nontrivial to understand why there isn't some invalid program that maps to a number not covered by a valid program.

Throughout this answer capitals are used to denote programs, and lowercase variables are used for functions and integers. ~ is the concatenation operator.

Proposition 1:

Let the function f be the program described in that answer. Then for every program U there exists a valid program V such that f(U) = f(V)

Definition 1:

Let g(X) be the number of [ that appear in the program X, and let h(X) be the number of ] that appear.

Definition 2:

Define P(x) to be this function:

P(x) = "" (the empty program) when x <= 0
P(x) = "]" when x = 1
P(x) = "]]" when x = 2
etcetera

Definition 3:

Given a program X, denote X1 to be it's largest prefix of [ characters, X2 its center, and X3 its largest suffix of ] characters.

Proof of proposition 1:

If g(U) = h(U) then U is a valid program, and we can take V=U. (trivial case).

If g(U) < h(U) then we can create V by prepending n = h(U) - g(U) [ symbols. Obviously f(V) = f(U) as all [ symbols in the prefix are removed.

Now consider g(U) > h(U). Define T = U2 ~ U3. if g(T) <= h(T), then we can construct V by removing n = g(U) - h(U) [ symbols.

So we can assume that h(T) < g(T). Construct V = T ~ P(g(T) - h(T)).

We need three small facts to proceed:

Claim 1: g(U2) = g(T)

U3 does not contain any [ symbols by its definition. As T = U2 ~ U3, its [ symbols are all in the first part.

Claim 2: h(U3) < g(T)

This follows from noting that h(T) < g(T) and h(U3) < h(U3 ~ U2) = h(T).

Claim 3: h(V3) = g(U2) - h(U2)

h(V3) = h(U3) + g(T) - h(T)                           using the construction of V
h(V3) = h(U3) + g(U2) + g(U3) - h(U2) - h(U3)         apply the definition of T
h(V3) = g(U2) - h(U2) *one term cancels, g(U3)        is always zero, as U3 contains only `]` symbols*

Now we show that f(V) = f(U).

f(U) = U2 ~ P(h(U3) - g(U2)) = U2                     claim 2, definition of P

f(V) = U2 ~ P(h(V3) - g(V2))
     = U2 ~ P(h(V3) - g(U2))
     = U2 ~ P(g(U2) - h(U2) - g(U2))                  claim 3
     = U2 ~ P(-h(U2))
     = U2                                             definition P

This completes the proof. Q.E.D.

Let's do uniqueness as well.

Proposition 2:

Let U, V be two diffrent, valid programs. Then f(U) != f(V)

This is fairly straightforward compared to the previous proposition.

Let's assume that U2 = V2. But then the only way U and V can be diffrent is by adding or removing n [ and ] symbols to U1 and U3 respectively. Yet this changes the output of f, since f will count the number of unmatched ] symbols in the suffix.

Thus U2 != V2.

Obviously, this leads to a contradiction. As U2 and V2 are literally contained in the output of f(U) and f(V) respectively, they cannot differ, except at the 'edge', the place where U2 is concatenated with U3. But the first and last symbols of U2 and V2 cannot be [ or ] by definition, while those are the only symbols allowed in U1, U3, V1, V3 respectively and respectively again. Thus we get U2 = V2. Q.E.D.

aphid

Posted 2015-08-26T23:27:19.127

Reputation: 141