Create Balanced BST from Sorted List of Integers

15

10

Given a unique, sorted list of integers, create a balanced binary-search tree represented as an array without using recursion.

For example:

func( [1,2,3,5,8,13,21] ) => [5,2,13,1,3,8,21]

Before we start, a hint: we can simplify this problem a ton so that we don't actually have to think about the input integers (or any comparable object for that matter!).

If we know the input list is sorted already, it's contents are irrelevant. We can simply think about it in terms of indices into the original array.

An internal representation of the input array then becomes:

func( [0,1,2,3,4,5,6] ) => [3,1,5,0,2,4,6]

This means rather than writing something that has to deal with comparable objects, we really only need to write a function which maps from the range [0,n) to the resulting array. Once we have the new order, we can simply apply the mapping back to the values in the input to create the return array.

Valid solutions must:

  • Accept a zero-element array and return an empty array.
  • Accept an integer array of length n and return an integer array
    • Of length between n and the next highest power of 2 minus 1. (e.g., for input size 13 return anywhere between 13 and 15).
    • Array which represents a BST where the root node is at position 0 and the height is equal to log(n) where 0 represents a missing node (or a null-like value if your language allows). Empty nodes, if present, must only exist at the end of the tree (e.g., [2,1,0])

The input integer array has the following guarantees:

  • Values are 32-bit signed integers greater than zero.
  • Values are unique.
  • Values are in ascending order from position zero.
  • Values may be sparse (i.e., not adjacent to each other).

The most terse code by ascii character count wins, but I'm also interested in seeing elegant solutions for any particular language.

Test cases

The outputs for simple arrays containing 1 to n for various n. As described above, the trailing 0s are optional.

[]
[1]
[2,1,0]
[2,1,3]
[3,2,4,1,0,0,0]
[4,2,5,1,3,0,0]
[4,2,6,1,3,5,0]
[4,2,6,1,3,5,7]
[5,3,7,2,4,6,8,1,0,0,0,0,0,0,0]
[6,4,8,2,5,7,9,1,3,0,0,0,0,0,0]
[7,4,9,2,6,8,10,1,3,5,0,0,0,0,0]
[8,4,10,2,6,9,11,1,3,5,7,0,0,0,0]
[8,4,11,2,6,10,12,1,3,5,7,9,0,0,0]
[8,4,12,2,6,10,13,1,3,5,7,9,11,0,0]
[8,4,12,2,6,10,14,1,3,5,7,9,11,13,0]
[8,4,12,2,6,10,14,1,3,5,7,9,11,13,15]

Jake Wharton

Posted 2014-01-16T06:04:50.077

Reputation: 319

All questions on this site, whether a programming puzzle or a code golf, should have an objective primary winning criterion, so that it is possible to indisputably decide which entry should win. – Howard – 2014-01-16T06:12:21.407

@Howard Thanks. Updated with definitive criteria for winner. – Jake Wharton – 2014-01-16T06:16:00.687

1It would be very useful to have some test cases which cover the difficult cases, rather than (as at present) just the easiest one. – Peter Taylor – 2014-01-16T10:23:46.190

Is there some reason for ruling out recursion? Not that I'm looking at a recursive solution, but that seems both artificial and unnecessary. – dmckee --- ex-moderator kitten – 2014-01-16T16:37:44.847

@dmckee I had hoped it would steer people towards figuring out a function which maps indices from the input array to the output array directly. Maybe I should post my solution... – Jake Wharton – 2014-01-16T16:41:27.477

Funny. I've actually been seriously researching this problem to improve cache behavior of binary searches in a very large amount of data (how to map the index of an array so that a binary search "middle" ends up in the first element, then second or third, etc.). The idea was that since the first few compares would end up in the beginning of the arrays it would avoid at least a few cache misses (and TLB misses later). Reversing the order of the bits in the index actually kind of worked, but I dropped it after figuring out regularity of the data and implementing interpolation search instead. – Art – 2014-01-17T12:58:07.233

1Can someone explain how the list represents a BST? – justinpc – 2014-09-02T12:47:53.050

Answers

4

Ruby, 143

s=ARGV.size;r,q=[],[[0,s]];s.times{b,e=q.shift;k=Math::log2(e-b).to_i-1;m=(e-b+2)>(3<<k)?b+(2<<k)-1:e-(1<<k);r<<ARGV[m];q<<[b,m]<<[m+1,e]};p r

It is a (loosely) compressed version of the following code which basically does a BFS on the tree.

def l(n)
    k = Math::log2(n).to_i-1
    if n+2 > (3<<k) then
        (2<<k)-1
    else
        n-(1<<k) 
    end
end

def bfs(tab)
  result = []
  queue = [[0,tab.size]]
  until queue.empty? do
    b,e = queue.shift
    m = b+l(e-b)
    result << tab[m]
    queue << [b,m] if b < m
    queue << [m+1,e] if m+1 < e
  end
  result
end

p bfs(ARGV)

Besides, because it's BFS, not DFS, a requirement of non-recursive solution is not significant, and it puts some languages at a disadvantage.

Edit: Fixed solution, thanks to @PeterTaylor for his comment!

dtldarek

Posted 2014-01-16T06:04:50.077

Reputation: 321

@PeterTaylor The intention was to put 3 on the left of 4, but there are no blanks, so it is wrong. Thanks for pointing that out! – dtldarek – 2014-01-16T11:18:44.477

@PeterTaylor Fixed over the lunch, it should work now. – dtldarek – 2014-01-16T13:04:36.537

4

Java, 252

Ok, here is my attempt. I've been playing around with bit operations and I came up with this direct way of calculating the index of an element in the BST from the index in the original array.

Compressed version

public int[]b(int[]a){int i,n=1,t;long x,I,s=a.length,p=s;int[]r=new int[(int)s];while((p>>=1)>0)n++;p=2*s-(1l<<n)+1;for(i=0;i<s;i++){x=(i<p)?(i+1):(p+2*(i-p)+1);t=1;while((x&1<<(t-1))==0)t++;I=(1<<(n-t));I|=((I-1)<<t&x)>>t;r[(int)I-1]=a[i];}return r;}

The long version follows below.

public static int[] makeBst(int[] array) {
  long size = array.length;
  int[] bst = new int[array.length];

  int nbits = 0;
  for (int i=0; i<32; i++) 
    if ((size & 1<<i)!=0) nbits=i+1;

  long padding = 2*size - (1l<<nbits) + 1;

  for (int i=0; i<size; i++) {
    long index2n = (i<padding)?(i+1):(padding + 2*(i-padding) + 1);

    int tail=1;
    while ((index2n & 1<<(tail-1))==0) tail++;
    long bstIndex = (1<<(nbits-tail));
    bstIndex = bstIndex | ((bstIndex-1)<<tail & index2n)>>tail;

    bst[(int)(bstIndex-1)] = array[i];
  }
 return bst;
}

mikail sheikh

Posted 2014-01-16T06:04:50.077

Reputation: 71

You need a character count, and this is not currently golfed. – dmckee --- ex-moderator kitten – 2014-01-16T16:36:12.150

@dmckee I've edited the post to include a compressed version and a character count – mikail sheikh – 2014-01-16T17:32:43.843

Good show. I'll bet that some of those spaces are unnecessary. In c, int[] b(int[] a) is just as well expressed as int[]b(int[]a). – dmckee --- ex-moderator kitten – 2014-01-16T17:42:08.427

You have a.length in the array allocation. Change it to s. Get rid of space between for ( multiple times. Each for loop creates a int i=0 and also int t=0. Create with n (int n=0,i,t;) and then just i=0 in the loops and t=1 inside. Declare the inner long x and long I with s and just initialize in the loop (long s=a.length,I,x; and x=../I=..). You shouldn't need spaces around the binary AND &. – Jake Wharton – 2014-01-17T05:47:07.550

Also, I=I|.. can be written I|=.. – Jake Wharton – 2014-01-17T05:48:28.650

I've shortened my code a bit further, thanks to all the suggestions. Using s in the array allocation needs a cast to int, so it doesn't save that much. But the rest of the optimizations were worth about 15 characters.

I also found a neat optimization for the log2 calculation here

– mikail sheikh – 2014-01-17T10:47:02.583

I only looked at your answer after I posted mine. It seems we took more or less the same approach. – Wouter Coekaerts – 2014-01-19T08:46:15.713

3

def fn(input):
    import math
    n = len(input)
    if n == 0:
        return []
    h = int(math.floor(math.log(n, 2)))
    out = []
    last = (2**h) - 2**(h+1) + n

    def num_children(level, sibling, lr):
        if level == 0:
            return 0
        half = 2**(level-1)
        ll_base = sibling * 2**level + lr * (half)
        ll_children = max(0, min(last, ll_base + half - 1) - ll_base + 1)
        return 2**(level-1) - 1 + ll_children

    for level in range(h, -1, -1):
        for sibling in range(0, 2**(h-level)):
            if level == 0 and sibling > last:
                break
            if sibling == 0:
                last_sibling_val = num_children(level, sibling, 0)
            else:
                last_sibling_val += 2 + num_children(level, sibling - 1, 1) \
                    + num_children(level, sibling, 0)
            out.append(input[last_sibling_val])
    return out

aarkay

Posted 2014-01-16T06:04:50.077

Reputation: 31

2

Wolfram Mathematica 11, 175 Bytes

g[l_]:=(x[a_]:=Floor@Min[i-#/2,#]&@(i=Length[a]+1;2^Ceiling@Log2[i]/2);Join@@Table[Cases[l//.{{}->{},b__List:>(n[Take[b,#-1],b[[#]],Drop[b,#]]&@x[b])},_Integer,{m}],{m,x[l]}])

The function g[l] takes as an input a List (e.g l={1,2,3,4,...}) and returns a List of the desired form. It works as follows:

  • x[a_]:=Floor@Min[i-#/2,#]&@(i=Length[a]+1;2^Ceiling@Log2[i]/2) takes a list and finds the root of the associated BST.
    • i=Length[a]+1 shortcut for the length of the list
    • 2^Ceiling@Log2[i]/2 upper bound on the value of the root
    • Min[i-#/2,#]&@(...) Minimum of the two argument where # stands for what is inside the (...)
  • l//.{...} Apply repeatedly the replacement rules that follow to l
  • {}->{} Nothing to do (this is the edge case to avoid an infinite loop)
  • b__List:>(n[Take[b,#-1],b[[#]],Drop[b,#]]&@x[b]) Split a List into {{lesser}, root, {greater}}
  • Cases[...,_Integer,{m}] Take all integers at level (depth) m
  • Table[...,{m,1,x[l]}] For all m up to x[l] (which is more than necessary actually).

It can be tested by running

Table[g[Range[a]], {a, 0, 15}]//MatrixForm

This implementation does not include trailing zeroes.

MannyC

Posted 2014-01-16T06:04:50.077

Reputation: 131

2

Not quite sure if this fits exactly your requirement of empty nodes being at the end of the tree and it certainly won't win any prizes for brevity, but I think it's correct and it has test cases :)

public class BstArray {
    public static final int[] EMPTY = new int[] { };
    public static final int[] L1 = new int[] { 1 };
    public static final int[] L2 = new int[] { 1, 2 };
    public static final int[] L3 = new int[] { 1, 2, 3 };
    public static final int[] L4 = new int[] { 1, 2, 3, 5 };
    public static final int[] L5 = new int[] { 1, 2, 3, 5, 8 };
    public static final int[] L6 = new int[] { 1, 2, 3, 5, 8, 13 };
    public static final int[] L7 = new int[] { 1, 2, 3, 5, 8, 13, 21 };
    public static final int[] L8 = new int[] { 1, 2, 3, 5, 8, 13, 21, 35 };
    public static final int[] L9 = new int[] { 1, 2, 3, 5, 8, 13, 21, 35, 56 };
    public static final int[] L10 = new int[] { 1, 2, 3, 5, 8, 13, 21, 35, 56, 91 };

    public static void main(String[] args) {
        for (int[] list : Arrays.asList(EMPTY, L1, L2, L3, L4, L5, L6, L7, L8, L9, L10)) {
            System.out.println(Arrays.toString(list) + " => " + Arrays.toString(bstListFromList(list)));
        }
    }

    private static int[] bstListFromList(int[] orig) {
        int[] bst = new int[nextHighestPowerOfTwo(orig.length + 1) - 1];

        if (orig.length == 0) {
            return bst;
        }

        LinkedList<int[]> queue = new LinkedList<int[]>();
        queue.push(orig);

        int counter = 0;
        while (!queue.isEmpty()) {
            int[] list = queue.pop();
            int len = list.length;

            if (len == 1) {
                bst[counter] = list[0];
            } else if (len == 2) {
                bst[counter] = list[1];
                queue.add(getSubArray(list, 0, 1));
                queue.add(new int[] { 0 });
            } else if (len == 3) {
                bst[counter] = list[1];
                queue.add(getSubArray(list, 0, 1));
                queue.add(getSubArray(list, 2, 1));
            } else {
                int divide = len / 2;
                bst[counter] = list[divide];
                queue.add(getSubArray(list, 0, divide));
                queue.add(getSubArray(list, divide + 1, len - (divide + 1)));
            }
            counter++;
        }

        return bst;
    }

    private static int nextHighestPowerOfTwo(int n) {
        n--;
        n |= n >> 1;
        n |= n >> 2;
        n |= n >> 4;
        n |= n >> 8;
        n |= n >> 16;
        n++;

        return n;
    }

    private static int[] getSubArray(int[] orig, int origStart, int length) {
        int[] list = new int[length];
        System.arraycopy(orig, origStart, list, 0, length);
        return list;
    }
}

Andrew Flynn

Posted 2014-01-16T06:04:50.077

Reputation: 121

2

Golfscript (99 89)

~]:b[]:^;{b}{{:|.,.2base,(2\?:&[-)&2/]{}$0=&(2/+:o[=]^\+:^;|o<.!{;}*|o)>.!{;}*}%:b}while^p

Basically a straight port of my Python solution, works in pretty much the same way.

Can probably be improved quite a bit with more "golfisms", already improved by 10 characters with @petertaylor's input :)

Joachim Isaksson

Posted 2014-01-16T06:04:50.077

Reputation: 1 161

I think it should be possible in no more than 70, although I still haven't quite finished my GolfScript answer. There are some easy improvements to yours, though. !{;}{}if can just be !{;}* because ! guarantees to return 0 or 1. You can use non-alphabetic tokens for variables, so if you use ^ instead of r, | instead of x, & instead of y you can eliminate all that whitespace. – Peter Taylor – 2014-01-16T23:16:21.313

@PeterTaylor Thanks, didn't know about non alphanumeric variables, still very new to golfscript :) – Joachim Isaksson – 2014-01-17T05:34:49.603

2

Java 192

Maps index in input to index in output

int[]b(int[]o){int s=o.length,p=0,u=s,i=0,y,r[]=new int[s],c[]=new int[s];while((u>>=1)>0)p++;for(int x:o){y=p;u=i;while(u%2>0){y--;u/=2;}r[(1<<y)-1+c[y]++]=x;i+=i>2*s-(1<<p+1)?2:1;}return r;}

Long version:

static int[] bfs(int[] o) {
  int rowCount = 32 - Integer.numberOfLeadingZeros(o.length); // log2
  int slotCount = (1<<rowCount) - 1; // pow(2,rowCount) - 1

  // number of empty slots at the end
  int emptySlots = slotCount - o.length;
  // where we start to be affected by these empty slots
  int startSkippingAbove = slotCount - 2 * emptySlots; // = 2 * o.length - slotCount

  int[] result = new int[o.length];
  int[] rowCounters = new int[rowCount]; // for each row, how many slots in that row are taken
  int i = 0; // index of where we would be if this was a complete tree (no trailing empty slots)
  for (int x : o) {
    // the row (depth) a slot is in is determined by the number of trailing 1s
    int rowIndex = rowCount - Integer.numberOfTrailingZeros(~i) - 1;
    int colIndex = rowCounters[rowIndex]++; // count where we are
    int rowStartIndex = (1 << rowIndex) - 1; // where this row starts in the result array

    result[rowStartIndex + colIndex] = x;

    i++;
    // next one has to jump into a slot that came available by not having slotCount values
    if (i > startSkippingAbove) i++;
  }

  return result;
}

Wouter Coekaerts

Posted 2014-01-16T06:04:50.077

Reputation: 211

1

J, 52 bytes

t=:/:(#/:@{.(+:,>:@+:@i.@>:@#)^:(<.@(2&^.)@>:@#`1:))

function takes a sorted list and returns in binary tree order

notice that trees have identical shape but bottom level is shortened

  • `1: start with 1
  • <.@(2&^.)@>:@# iterate by floor of log2 ( length + 1 )
  • +: , >:@+:@i.@>:@# loop : appends double of last vector with odd numbers 1,3 .. 2*length+1
  • # /:@{. take only the required number of items and get their sort indices
  • /: apply those sort indices to the given input

TIO

jayprich

Posted 2014-01-16T06:04:50.077

Reputation: 391

1

Python (175 171)

Fairly condensed, still fairly readable;

def f(a):
 b=[a]
 while b:
  c,t=((s,2**(len(bin(len(s)))-3))for s in b if s),[]
  for x,y in c:
   o=min(len(x)-y+1,y/2)+(y-1)/2
   yield x[o]
   t+=[x[:o],x[o+1:]]
  b=t

It yields the result back, so you can either loop over it or (for display purposes) print it as a list;

>>> for i in range(1,17): print i-1,list(f(range(1,i)))
 0 []
 1 [1]
 2 [2, 1]
 3 [2, 1, 3]
 4 [3, 2, 4, 1]
 5 [4, 2, 5, 1, 3]
 6 [4, 2, 6, 1, 3, 5]
 7 [4, 2, 6, 1, 3, 5, 7]
 8 [5, 3, 7, 2, 4, 6, 8, 1]
 9 [6, 4, 8, 2, 5, 7, 9, 1, 3]
10 [7, 4, 9, 2, 6, 8, 10, 1, 3, 5]
11 [8, 4, 10, 2, 6, 9, 11, 1, 3, 5, 7]
12 [8, 4, 11, 2, 6, 10, 12, 1, 3, 5, 7, 9]
13 [8, 4, 12, 2, 6, 10, 13, 1, 3, 5, 7, 9, 11]
14 [8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13]
15 [8, 4, 12, 2, 6, 10, 14, 1, 3, 5, 7, 9, 11, 13, 15]

Joachim Isaksson

Posted 2014-01-16T06:04:50.077

Reputation: 1 161

@dtldarek His comment seems to be removed, but this seems to pass the test cases now. – Joachim Isaksson – 2014-01-16T21:23:17.693

I deleted my comment lest people refrain from upvoting @dtldarek's answer because of a comment saying it was buggy. – Peter Taylor – 2014-01-16T23:17:22.317

@PeterTaylor Well, thank you for your consideration ;-) – dtldarek – 2014-01-16T23:19:05.777

1

Java

This is a direct calculation solution. I think it works, but it has one pragmatically innocuous side effect. The array it produces may be corrupt but not in any way that would affect searches. Instead of producing 0 (null) nodes, it will produce unreachable nodes, that is the nodes will already have been found earlier in the tree during the search. It works by mapping the indices array of a regular power of 2 size binary search tree array onto an irregularly sized binary search tree array. At least, I think it works.

import java.util.Arrays;

public class SortedArrayToBalanceBinarySearchTreeArray
{
    public static void main(String... args)
    {
        System.out.println(Arrays.toString(binarySearchTree(19)));
    }

    public static int[] binarySearchTree(int m)
    {
        int n = powerOf2Ceiling(m + 1);
        int[] array = new int[n - 1];

        for (int k = 1, index = 1; k < n; k *= 2)
        {
            for (int i = 0; i < k; ++i)
            {
                array[index - 1] = (int) (.5 + ((float) (m)) / (n - 1)
                        * (n / (2 * k) * (1 + 2 * index) - n));
                ++index;
            }
        }

        return array;
    }

    public static int powerOf2Ceiling(int n)
    {
        n--;
        n |= n >> 1;
        n |= n >> 2;
        n |= n >> 4;
        n |= n >> 8;
        n |= n >> 16;
        n++;

        return n;
    }

}

Here's a more condensed version (just the function and names paired down). It's still got the white space, but I'm not worried about winning. Also this version actually takes an array. The other one just took an int for highest index in the array.

public static int[] b(int m[])
{
    int n = m.length;
    n |= n >> 1;
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    n++;

    int[] a = new int[n - 1];

    for (int k = 1, j = 1, i; k < n; k *= 2)
    {
        for (i = 0; i < k; ++i)
        {
            a[j - 1] = m[(int) (.5 + ((float) m.length) / (n - 1)
                    * (n / (2 * k) * (1 + 2 * j) - n)) - 1];
            ++j;
        }
    }

    return a;
}

metaphyze

Posted 2014-01-16T06:04:50.077

Reputation: 111

Since this is [tag:code-golf], shorten your methods/names/etc to as short as possible; remove all whitespace (and unnecessary methods / material), and insert the character count. Otherwise, your doing great. – Justin – 2014-01-17T06:26:23.610

@Jake Wharton. I'd really like to see your direct mapping solution. I'm not a 100% sure mine works for very large arrays because it relies on a continuous mathematical mapping whose values get rounded. It certainly seems to work, but I'm not sure how to prove it. – metaphyze – 2014-01-17T17:50:05.020

1

GolfScript (79 77 70 chars)

Since the example in the question uses a function, I've made this a function. Removing the {}:f; to leave an expression which takes input on the stack and leaves the BST on the stack would save 5 chars.

{[.;][{{.!!{[.,.)[1]*{(\(@++}@(*1=/()\@~]}*}%.{0=}%\{1>~}%.}do][]*}:f;

Online demo (note: the app might take a bit of warming up: it timed out twice for me before running in 3 secs).

With whitespace to show the structure:

{
    # Input is an array: wrap it in an array for the working set
    [.;]
    [{
        # Stack: emitted-values working-set
        # where the working-set is essentially an array of subtrees
        # For each subtree in working-set...
        {
            # ...if it's not the empty array...
            .!!{
                # ...gather into an array...
                [
                    # Get the size of the subtree
                    .,
                    # OEIS A006165, offset by 1
                    .)[1]*{(\(@++}@(*1=
                    # Split into [left-subtree-plus-root right-subtree]
                    /
                    # Rearrange to root left-subtree right-subtree
                    # where left-subtree might be [] and right-subtree might not exist at all
                    ()\@~
                ]
            }*
        }%
        # Extract the leading element of each processed subtree: these will join the emitted-values
        .{0=}%
        # Create a new working-set of the 1, or 2 subtrees of each processed subtree
        \{1>~}%
        # Loop while the working-set is non-empty
        .
    }do]
    # Stack: [[emitted values at level 0][emitted values at level 1]...]
    # Flatten by joining with the empty array
    []*
}:f;

Peter Taylor

Posted 2014-01-16T06:04:50.077

Reputation: 41 901

0

Python 2, 107 bytes

Golf of Joachim Isaksson's answer, Input is a singleton containing the list.

def f(a):
 for x in a:
	if x:w=len(x);y=1<<len(bin(w))-3;o=min(w-y+1,y/2)+~-y/2;yield x[o];a+=x[:o],x[o+1:]

Try it online!

ovs

Posted 2014-01-16T06:04:50.077

Reputation: 21 408