Enumerate binary trees

20

Binary trees

A binary tree is a tree with nodes of three types:

  • terminal nodes, which have no children
  • unary nodes, which have one child each
  • binary nodes, which have two children each

We can represent them with the following grammar, given in BNF (Backus–Naur form):

<e> ::= 
      <terminal>   
    | <unary>
    | <binary>

<terminal> ::= 
    "0"

<unary> ::= 
    "(1" <e> ")"

<binary> ::= 
    "(2" <e> " " <e> ")"

In this grammar the nodes are given in preorder and each node is represented by a digit which is the number of children it has.

Motzkin numbers

Motzkin numbers (OEIS) (Wikipedia) have many interpretations, but one interpretation is that the nth Motzkin number is the number of distinct binary trees with n nodes. A table of Motzkin numbers starts

N          Motzkin number M(N)
1          1
2          1
3          2 
4          4 
5          9 
6         21 
7         51 
8        127 
    ...

e.g. M(5) is 9, and the nine distinct binary trees with 5 nodes are

1      (1 (1 (1 (1 0))))  
2      (1 (1 (2 0 0)))  
3      (1 (2 0 (1 0)))  
4      (1 (2 (1 0) 0))  
5      (2 0 (1 (1 0)))  
6      (2 0 (2 0 0))  
7      (2 (1 0) (1 0))  
8      (2 (1 (1 0)) 0)  
9      (2 (2 0 0) 0)  

Task

Take a single positive integer n as input and output all of the distinct binary trees with n nodes.

Examples for n from 1 to 5 with parenthesis included for readability

0

(1 0)

(1 (1 0))
(2 0 0)

(1 (1 (1 0)))
(1 (2 0 0))
(2 0 (1 0))
(2 (1 0) 0)

(1 (1 (1 (1 0))))
(1 (1 (2 0 0)))
(1 (2 0 (1 0)))
(1 (2 (1 0) 0))
(2 0 (1 (1 0)))
(2 0 (2 0 0))
(2 (1 0) (1 0))
(2 (1 (1 0)) 0)
(2 (2 0 0) 0)

Input

The input will be one positive integer.

Output

The output should be an intelligible representation of the distinct binary trees with that many nodes. It is not compulsory to use the exact string given by the BNF grammar above: it is sufficient that the syntax used give an unambiguous representation of the trees. E.g. you could use [] instead of (), an extra level of brackets [[]] instead of [], outer parenthesis are present or missing, extra commas or no commas, extra spaces, parenthesis or no parenthesis, etc.

All of these are equivalent:

(1 (2 (1 0) 0))  
[1 [2 [1 0] 0]]  
1 2 1 0 0  
12100  
(1 [2 (1 0) 0])  
.:.--  
*%*55  
(- (+ (- 1) 1))
-+-11

Also a variation purposed by @xnor in a comment. Since there is a way to translate this to a format that can be understood it is acceptable.

[[[]][]]  is (2 (1 0) 0)

To make this easier to understand convert some of the [] to () like so

[([])()]

Now if you start with

[]

then insert a binary which needs two expressions you get

 [()()] which is 2

and then for the first () insert a unary which needs one expression you get

 [([])()] which is 21

but since [] or () with no inner bracketing can represent 0 which needs no more expressions you can interpret it as

 2100

Note that answers should work theoretically with infinite memory, but will obviously run out of memory for an implementation-dependent finite input.

Variations of output

BNF             xnor       Christian   Ben
b(t, b(t, t))   [{}{{}{}}] (0(00))     (1, -1, 1, -1)                         
b(t, u(u(t)))   [{}{(())}] (0((0)))    (1, -1, 0, 0)           
b(u(t), u(t))   [{()}{()}] ((0)(0))    (1, 0, -1, 0)                     
b(b(t, t), t)   [{{}{}}{}] ((00)0)     (1, 1, -1, -1)              
b(u(u(t)), t)   [{(())}{}] (((0))0)    (1, 0, 0, -1)                          
u(b(t, u(t)))   [({}{()})] ((0(0)))    (0, 1, -1, 0)                          
u(b(u(t), t))   [({()}{})] (((0)0))    (0, 1, 0, -1)                        
u(u(b(t, t)))   [(({}{}))] (((00)))    (0, 0, 1, -1)                          
u(u(u(u(t))))   [(((())))] ((((0))))   (0, 0, 0, 0)  

A possible place to check for duplicate trees

One place to check for a duplicate is with M(5).
This one tree was generated twice for M(5) from M(4) trees

(2 (1 0) (1 0))  

the first by adding a unary branch to

(2 (1 0) 0)

and second by adding a unary branch to

(2 0 (1 0))

Understanding BNF

BNF is composed of simple rules:

<symbol> ::= expression

where on the left is a symbol name surrounded by <>.
On the right is the expression for constructing the symbol. Some rules use other rules in the construction, e.g.

<e> ::= <terminal>

e can be a terminal

and some rules have characters that are used in constructing the symbol, e.g.

<terminal> ::= "0"

terminal is just the character zero.

Some rules have multiple ways of constructing them, e.g.

<e> ::= 
      <terminal>   
    | <unary>
    | <binary>

An e can be a <terminal> or a <unary> or a <binary>.

And some rules are a sequence of parts, e.g.

<unary> ::= "(1" <e> ")"

A unary is the characters (1 followed by what can be constructed for e followed by ).

You always start with the starting rule, which for this <e>.

Some simple examples:

The simplest sequence is just 0. So we start with the starting rule <e> and see that there are three choices:

  <terminal>   
| <unary>
| <binary>

so take the first one <terminal>. Now a terminal has no choices and is 0. So replace <terminal> with 0 in the <e> rule and you are done.

Then next one is (1 0). Start with <e> and use rule <unary> which has

"(1" <e> ")"

Now this needs an <e> so we go back to <e> and make a choice of one of the three, this time choosing, <terminal> which gives 0. Replacing 0 into (1 <e> ) gives (1 0), and this is replaced into <unary> so <e> is (1 0).

Guy Coder

Posted 2017-03-14T14:17:43.807

Reputation: 321

So, a binary tree? "a binary tree is a tree data structure in which each node has at most two children" – fəˈnɛtɪk – 2017-03-14T14:19:18.140

3Your description is that of a binary tree. Binary trees do not need to have 2 children. It just means they have at most 2 children. I guess unary-binary is just a more specific term which doesn't really mean anything different. – fəˈnɛtɪk – 2017-03-14T14:22:37.080

Consider clarifying what "BNF" is for those of us who are not computer scientists – Luis Mendo – 2017-03-14T14:46:14.260

1@GuyCoder My point is, if someone sees "BNF" and doesn't know what that means they may be put off and stop reading. Perhaps using the name instead of the acronym and adding a link to the Wikipedia would suffice – Luis Mendo – 2017-03-14T15:35:08.940

I think that since these are binary trees, they should be called binary trees. No sense creating a new term for something that already has a proper name. – mbomb007 – 2017-03-14T16:28:36.853

4@mbomb007 Name changed. I should get a peer-pressure award for that. :) – Guy Coder – 2017-03-14T16:43:47.817

Just make sure that if your friends all jump off a binary bridge that you don't. – mbomb007 – 2017-03-14T16:50:35.967

Not sure about the "this is just a binary tree" talk. In my opinion, if a node in a usual binary tree has only one descendant, you could still differentiate between it being the left or right one. – Christian Sievers – 2017-03-14T17:26:39.870

Answers

12

Haskell, 68 bytes

t 0=[""]
t 1=["0"]
t n=['(':x++y++")"|k<-[1..n-1],x<-t k,y<-t$n-k-1]

Terminal nodes are represented by 0, unary and binary nodes by (e) resp. (ee), so the two three-node trees are given as (00) and ((0)).

Examples:

*Main> t 5
["(0(00))","(0((0)))","((0)(0))","((00)0)","(((0))0)","((0(0)))","(((0)0))","(((00)))","((((0))))"]
*Main> length $ t 8
127
*Main> length $ t 15
113634 

Christian Sievers

Posted 2017-03-14T14:17:43.807

Reputation: 6 366

5

CJam (37 bytes)

0aa{_2m*2\f+1Y$f+++:e__&}qi:A*{,A=},p

Online demo. Note that this is not very efficient, and you probably don't want to try calculating input 5 online.

Dissection to follow.

Peter Taylor

Posted 2017-03-14T14:17:43.807

Reputation: 41 901

5

Pyth (24 21 19 bytes)

This is based on my Python 3 solution.

f!|sTf<sY0._T^}1_1t

It's my first time using Pyth so this is likely still golfable.

Example, output when input is 4:

[[1, 0, -1], [1, -1, 0], [0, 1, -1], [0, 0, 0]]

1 represents a binary node, 0 represents a unary node, and -1 represents a terminal node. There is an implied terminal node at the end of every tree.

Explanation:

f!|sTf<sY0._T^}1_1t
f                    filter
             ^    t  length n-1 lists of elements
              }1_1   from [1, 0, -1]
 !|                  for when both
   sT                sum of list is 0, and
     f    ._T        for each prefix of list,
      <sY0           sum of prefix is non-negative.

Ben Frankel

Posted 2017-03-14T14:17:43.807

Reputation: 301

Of interest: Pyth source code

– Guy Coder – 2017-03-15T10:20:53.557

4

brainfuck, 107 bytes

,>++>-[-[<-[<-[>>[[>+<-]<]>+>[[<+>>>>>+<<<<-]>]>>++>,++++>]>[<+>-[+>>]>[<->[.<<<
<<]+[->+]+>>>]]]]<[[,<]<]<]

Formatted:

,>++>-
[
  -
  [
    <-
    [
      <-
      [
        >>
        [[>+<-]<]
        >+>
        [[<+> >>>>+<<<<-]>]
        >>++>,++++>
      ]
      >
      [
        <+>-
        [
          +>>
        ]
        >
        [
          <->[.<<<<<]
          +[->+]
          +>>>
        ]
      ]
    ]
  ]
  <
  [
    [,<]
    <
  ]
  <
]

Try it online

Input is taken as a byte, and the tree 12100 is represented as \x01\x02\x03\x02: to convert back, translate tr/\x01\x02\x03/012/, reverse the string, and add a final 0. Trees are separated by \xfe. (The output can be made easier to read by e.g. changing the first - into -36 and the . into +47.-47, where -36 means a string of 36 - characters, etc.)

This approach makes use of the property (which Ben Frankel also used): considering the possible nodes as -1, 0, 1 and disregarding the final -1 node, a list represents a valid tree if and only if (1) all prefixes of the list have non-negative sum, and (2) the sum of the entire list equals 0. The first condition is maintained while generating intermediate nodes, so only the second condition needs to be checked at the end.

The tape is divided into 5-node cells,

i d x 0 0

where i is the index (descending from left to right), d is the partial sum, and x is the element.

Sketch of control flow:

take n and push initial node
while stack is non-empty:
    if rightmost node can be decremented:
        decrement rightmost node
        if there are less than n nodes:
            push new node
        else if valid tree:
            print
    else:
        backtrack (pop)

Note that sometimes a value is stored or initialized as one or two greater than the actual (conceptual) value and adjusted as needed.

Mitch Schwartz

Posted 2017-03-14T14:17:43.807

Reputation: 4 899

3

Python 3 (138 134 128 121 119 bytes)

from itertools import*
lambda n:[any(sum(t[:k])<0for k in range(n))|sum(t)or print(t)for t in product(*[[-1,0,1]]*~-n)]

Example output, for n=5:

(0, 0, 0, 0)
(0, 0, 1, -1)
(0, 1, -1, 0)
(0, 1, 0, -1)
(1, -1, 0, 0)
(1, -1, 1, -1)
(1, 0, -1, 0)
(1, 0, 0, -1)
(1, 1, -1, -1)

1 represents a binary node, 0 represents a unary node, and -1 represents a terminal node. There is an implied terminal node at the end of every tree.

The program starts taking too long at around n=17.

Ben Frankel

Posted 2017-03-14T14:17:43.807

Reputation: 301

3

JavaScript (Firefox 30-57), 79 bytes

f=(m,l=0)=>m?[for(n of[1,0,-1])if(l>n&l<=m+n)for(a of f(m-1,l-n))[...a,n]]:[[]]

Where -1 represents a terminal, 0 a unary node and 1 a binary node. Starts getting slow at m=14 on my PC. Recursively works back from the end of the tree.

  • The number of unaccounted-for nodes l is limited by the fact that there may only be 1 node left at the end.
  • The type of the next node n is limited by the need to have enough unaccounted-for nodes to be its children.

Neil

Posted 2017-03-14T14:17:43.807

Reputation: 95 035

2

Prolog, 149 144 138 137 131 107 bytes

e(L,L)-->[0].

e([_|A],L)--> 
    [1],
    e(A,L).

e([_,_|A],L)--> 
    [2],
    e(A,B), 
    e(B,L).

e(M,E):-                   
    length([_|L],M),        
    e(L,[],E,[]).           

?- e(5,S).
S = [1, 1, 1, 1, 0] ;
S = [1, 1, 2, 0, 0] ;
S = [1, 2, 0, 1, 0] ;
S = [1, 2, 1, 0, 0] ;
S = [2, 0, 1, 1, 0] ;
S = [2, 0, 2, 0, 0] ;
S = [2, 1, 0, 1, 0] ;
S = [2, 1, 1, 0, 0] ;
S = [2, 2, 0, 0, 0].

And to count the solutions

e_count(N,Count) :-
    length([_|Ls], N),
    findall(., phrase(e(Ls,[]),E), Sols),
    length(Sols, Count).

?- e_count(N,Count).
N = Count, Count = 1 ;
N = 2, Count = 1 ;
N = 3, Count = 2 ;
N = Count, Count = 4 ;
N = 5, Count = 9 ;
N = 6, Count = 21 ;
N = 7, Count = 51 ;
N = 8, Count = 127 ;
N = 9, Count = 323 ;
N = 10, Count = 835 ;
N = 11, Count = 2188 ;
N = 12, Count = 5798 ;
N = 13, Count = 15511 ;
N = 14, Count = 41835 ;
N = 15, Count = 113634 

Guy Coder

Posted 2017-03-14T14:17:43.807

Reputation: 321

1

Python, 71 bytes

f=lambda n:{(a+b,)for k in range(n)for a in f(k)for b in f(n+~k)}or[()]

This represents trees as nested tuples such as ((((),), ()),), which can be transformed into ((())()) by removing commas, spaces, and the outermost ().

An earlier 76-byte version:

f=lambda n:{'('+a+b+')'for k in range(n)for a in f(k)for b in f(n+~k)}or['']

Mitch Schwartz

Posted 2017-03-14T14:17:43.807

Reputation: 4 899

1

CJam, 38 bytes

Uses a different approach that Peter Taylor's CJam answer.

3rim*{:(1\+[{1$+}*])\:(_:z#|!},

Output will be something like 1110120020102100. Each tree is a group of n digits (where n is the input number).

The basic idea is that we generate each possible string of digits 0, 1, and 2, and then filter only the ones that are well-formed trees.

Esolanging Fruit

Posted 2017-03-14T14:17:43.807

Reputation: 13 542