Heaviest increasing subsequence

9

1

A subsequence is a sequence that can be derived from another sequence by deleting some elements without changing the order of the remaining elements. A strictly increasing subsequence is a subsequence in which every element is bigger than the preceding one.

The heaviest increasing subsequence of a sequence is the strictly increasing subsequence that has the biggest element sum.

Implement a program or function in your language of choice that finds the element sum of the heaviest increasing subsequence of a given list of non-negative integers.

Examples:

                    [] ->  0 ([])
                   [3] ->  3 ([3])
             [3, 2, 1] ->  3 ([3])
          [3, 2, 5, 6] -> 14 ([3, 5, 6])
       [9, 3, 2, 1, 4] ->  9 ([9])
       [3, 4, 1, 4, 1] ->  7 ([3, 4])
       [9, 1, 2, 3, 4] -> 10 ([1, 2, 3, 4])
       [1, 2, 4, 3, 4] -> 10 ([1, 2, 3, 4])
[9, 1, 2, 3, 4, 5, 10] -> 25 ([1, 2, 3, 4, 5, 10])
       [3, 2, 1, 2, 3] ->  6 ([1, 2, 3])

Note that you only have to give the element sum of the heaviest increasing subsequence, not the subsequence itself.
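
For reference, here is a minimal brute-force sketch of this definition in Python (exponential time, so not a contender under the scoring rule below; the function name is illustrative):

from itertools import combinations

def heaviest_sum_bruteforce(xs):
    # Try every subsequence (combinations preserve the original order)
    # and keep the best strictly increasing one. The empty subsequence
    # has sum 0, which also covers the empty input.
    best = 0
    for r in range(1, len(xs) + 1):
        for sub in combinations(xs, r):
            if all(a < b for a, b in zip(sub, sub[1:])):
                best = max(best, sum(sub))
    return best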


The asymptotically fastest code wins, with smaller code size in bytes as a tiebreaker.

orlp

Posted 2015-08-01T10:33:45.243

Reputation: 37 067

How do you plan to deal with incomparable asymptotics? There are potentially two important variables: the length of the sequence, and the size of the largest element in the sequence. – Peter Taylor – 2015-08-01T10:41:07.460

@PeterTaylor I choose length of the sequence as the asymptotic. Your solution must not assume any bound on the integers, and in particular not loop or allocate memory based on the size of the numbers involved. You are forgiven if your language choice has bounded integers, but you must not make use of this fact in your solution. Does that satisfy your concerns? – orlp – 2015-08-01T10:46:38.990

Partially. It's still theoretically possible (although probably unlikely) that the fact that comparison of two unbounded integers takes time proportional to their log could be relevant. You might want to allow basic operations (addition, comparison, maybe multiplication) on the integers to be assumed to be O(1) time. – Peter Taylor – 2015-08-01T11:41:59.577

@PeterTaylor Is the transdichotomous model of computation specific enough? – orlp – 2015-08-01T11:50:20.630

Seems reasonable. – Peter Taylor – 2015-08-01T12:36:42.727

How do we measure asymptotic fastness? Do you mean big O? – Luis Mendo – 2015-08-01T14:57:21.903

@LuisMendo Yes, worst case performance. – orlp – 2015-08-01T15:13:30.770

What should [1, 2, 4, 3, 4] return? 7 as in [1, 2, 4] or [3, 4], or 10 as in [1, 2, 3, 4]? – Alexander Vogt – 2015-08-01T17:34:54.713

@AlexanderVogt 10. – orlp – 2015-08-02T00:28:03.820

@PeterTaylor I'm not certain why you removed the code-golf tag, it's the secondary scoring requirement (and once O(n log n) answers start streaming in, the only one that matters). – orlp – 2015-08-02T00:30:34.480

Because it's only really appropriate where it's the primary scoring criterion. I did also consider removing both [tag:code-golf] and [tag:fastest-algorithm] and replacing them with [tag:code-challenge] since it's a bit of a hybrid. – Peter Taylor – 2015-08-02T05:48:52.837

Answers

3

JavaScript (ES6), O(n log n), 253 characters

function f(l){l=l.map((x,i)=>[x,i+1]).sort((a,b)=>a[0]-b[0]||1)
a=[0]
m=(x,y)=>x>a[y]?x:a[y]
for(t in l)a.push(0)
t|=0
for(j in l){for(i=(r=l[j])[1],x=0;i;i&=i-1)x=m(x,i)
x+=r[0]
for(i=r[1];i<t+2;i+=i&-i)a[i]=m(x,i)}for(i=t+1;i;i&=i-1)x=m(x,i)
return x}

This uses a Fenwick tree (a maximum Fenwick tree) to find maxima of certain subsequences.

Basically, in the underlying array of the data structure, each position is matched with an element of the input list, in the same order. The Fenwick tree is initialized with 0 everywhere.

From smallest to largest, we take an element of the input list and look for the maximum of the entries to its left. Those are the elements that may come before this one in the subsequence: they are to the left in the input sequence, and they are smaller, because they entered the tree earlier.

So the maximum we found is the weight of the heaviest increasing subsequence this element can extend; we add the weight of the element itself and store the result in the tree.

Then we simply return the maximum of the whole tree as the result.

Tested on Firefox.
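
For illustration, here is a rough Python simulation of the process just described, with a plain left scan standing in for the Fenwick tree (so it runs in O(n^2) rather than O(n log n)); the tie-breaking by position is an added assumption so that equal values cannot chain:

def his_simulated(xs):
    # best[i] = heaviest increasing subsequence sum ending at position i,
    # or 0 if position i has not been processed yet.
    best = [0] * len(xs)
    # Take elements from smallest to largest; among equal values, go from
    # right to left so an equal element can never act as a predecessor.
    for i in sorted(range(len(xs)), key=lambda i: (xs[i], -i)):
        # The Fenwick tree answers this prefix-maximum query in O(log n);
        # here we simply scan the positions to the left of i.
        best[i] = xs[i] + max(best[:i], default=0)
    return max(best, default=0)

For example, his_simulated([3, 2, 5, 6]) gives 14.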

proud haskeller

Posted 2015-08-01T10:33:45.243

Reputation: 5 866

4

Python, O(n log n)

I didn't golf this, because I'm competing primarily on the fastest code side of things. My solution is the heaviest_subseq function, and a test harness is also included at the bottom.

import bisect
import blist

def heaviest_subseq(in_list):
    best_subseq = blist.blist([(-1, 0)])  # sentinel; must sort below any input element (inputs may contain 0)
    for new_elem in in_list:

        insert_loc = bisect.bisect_left(best_subseq, (new_elem, 0))

        best_pred_subseq_val = best_subseq[insert_loc - 1][1]

        new_subseq_val = new_elem + best_pred_subseq_val

        list_len = len(best_subseq)
        num_deleted = 0

        while (num_deleted + insert_loc < list_len
               and best_subseq[insert_loc][1] <= new_subseq_val):
            del best_subseq[insert_loc]
            num_deleted += 1

        best_subseq.insert(insert_loc, (new_elem, new_subseq_val))

    return max(val for key, val in best_subseq)

tests = [eval(line) for line in """[]
[3]
[3, 2, 1]
[3, 2, 5, 6]
[9, 3, 2, 1, 4]
[3, 4, 1, 4, 1]
[9, 1, 2, 3, 4]
[1, 2, 4, 3, 4]
[9, 1, 2, 3, 4, 5, 10]
[3, 2, 1, 2, 3]""".split('\n')]

for test in tests:
    print(test, heaviest_subseq(test))

Runtime analysis:

Each element has its insertion position looked up once, is inserted once, and is deleted at most once, in addition to a constant number of value lookups per loop iteration. Since I am using the built-in bisect module and the blist package, each of those operations is O(log n). Thus, the overall runtime is O(n log n).

The program works by maintaining a sorted list of the best increasing subsequences found so far, each represented as a tuple of ending value and sequence sum. An increasing subsequence is in that list only if no other subsequence found so far has a smaller ending value and a sum at least as large. The entries are kept in increasing order of ending value, and necessarily also in increasing order of sum. This property is maintained by checking the successors of each newly inserted subsequence and deleting them while their sums are no larger, stopping once a subsequence with a larger sum is reached or the end of the list is reached.
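
For intuition, here is how the (ending value, sum) list evolves for the input [3, 2, 5, 6], worked by hand following the rules above (the leading (-1, 0) entry is the sentinel):

start       [(-1, 0)]
insert 3 -> [(-1, 0), (3, 3)]
insert 2 -> [(-1, 0), (2, 2), (3, 3)]        ((3, 3) survives: its sum is larger)
insert 5 -> [(-1, 0), (2, 2), (3, 3), (5, 8)]
insert 6 -> [(-1, 0), (2, 2), (3, 3), (5, 8), (6, 14)]

The answer is the largest sum in the list, 14.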

isaacg

Posted 2015-08-01T10:33:45.243

Reputation: 39 268

Interesting, a very different solution from mine. – orlp – 2015-08-02T15:57:55.603

2

Python, O(n log n)

I used an index transform and a nifty data structure (binary indexed tree) to trivialize the problem.

def setmax(a, i, v):
    while i < len(a):
        a[i] = max(a[i], v)
        i |= i + 1

def getmax(a, i):
    r = 0
    while i > 0:
        r = max(r, a[i-1])
        i &= i - 1
    return r

def his(l):
    maxbit = [0] * len(l)
    rank = [0] * len(l)
    # Rank by (value, -index): among equal values the later position gets the
    # smaller rank, so an equal element earlier in the sequence never counts
    # as "smaller" (the subsequence must be strictly increasing).
    for i, j in enumerate(sorted(range(len(l)), key=lambda i: (l[i], -i))):
        rank[j] = i

    for i, x in enumerate(l):
        r = rank[i]
        s = getmax(maxbit, r)
        setmax(maxbit, r, x + s)

    return getmax(maxbit, len(l))

The binary indexed tree can do two operations in log(n): raise the value at index i (to the maximum of its old and new value) and get the maximum value in [0, i). We initialize every value in the tree to 0. We index the tree using the rank of elements, not their index, with ties ranked from right to left so that an equal element earlier in the sequence never counts as smaller (the subsequence must be strictly increasing). This means that if we query the tree at index i, the positions [0, i) correspond exactly to the elements smaller than the one with rank i. So we get the maximum from [0, i), add the current value to it, and store the result at index i. The only issue is that this range also includes elements which are smaller than the current value but come later in the sequence. But since we move through the sequence from left to right and we initialized all values in the tree to 0, those still hold 0 and thus cannot affect the maximum.
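
A quick check of his against the examples from the question (with the tie-aware ranking above), run in a Python 3 session:

>>> [his(l) for l in ([], [3], [3, 2, 5, 6], [3, 4, 1, 4, 1], [3, 2, 1, 2, 3])]
[0, 3, 14, 7, 6]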

orlp

Posted 2015-08-01T10:33:45.243

Reputation: 37 067

1

Python 2 - O(n^2) - 114 bytes

def h(l):
 w=0;e=[]
 for i in l:
    s=0
    for j,b in e:
     if i>j:s=max(s,b)
    e.append((i,s+i));w=max(w,s+i)
 return w

Tyilo

Posted 2015-08-01T10:33:45.243

Reputation: 1 372

1

C++ - O(n log n) - 261 bytes

Should be fixed now:

#include <set>
#include <vector>
int h(std::vector<int>l){int W=0,y;std::set<std::pair<int,int>>S{{-1,0}};for(w:l){auto a=S.lower_bound({w,-1}),b=a;y=prev(a)->second+w;for(;b!=S.end()&&b->second<=y;b++){}a!=b?S.erase(a,b):a;W=y>W?y:W;S.insert({w,y});}return W;}

Tyilo

Posted 2015-08-01T10:33:45.243

Reputation: 1 372

auto S=set<pair<I,I>>(); is longer than simply set<pair<I,I>> S;. #define I int is longer than using I=int;. There's no need to assign n to anything, you can replace auto n=*prev(S.lower_bound({w,-1}));I y=n.second with I y=prev(S.lower_bound({w,-1}))->second+w;. – orlp – 2015-08-02T10:22:10.887

Oh, and the initialization of S is very convoluted, you can just forego the insert and use std::set<std::pair<int,int>>S{{-1,0}};. – orlp – 2015-08-02T10:25:54.343

@orlp thanks! It shows that I don't use c++ ;) – Tyilo – 2015-08-02T10:34:48.100

Here's a much shorter version (still needs the set and vector include): using namespace std;using I=int;I h(vector<I>l){I W=0;set<pair<I,I>>S{{-1,0}};for(I w:l){I y=prev(S.lower_bound({w,-1}))->second+w;W=max(W,y);S.insert({w,y});}return W;} – orlp – 2015-08-02T10:35:39.393

Oh and dump the std::max, use W=y>W?y:W;. – orlp – 2015-08-02T10:40:05.690

Then you can also dump the using namespace std and just use std:: a couple of times, if I'm counting correctly. – orlp – 2015-08-02T10:43:16.740

@orlp Thanks. Just using int instead of I was also better now. – Tyilo – 2015-08-02T10:49:49.113

Actually, this solution is incorrect. It gives the wrong result (19) for [9, 1, 2, 3, 4, 5, 10]; the correct result is 25. – orlp – 2015-08-02T12:49:48.217

@orlp fixed it for that example, but I'm still not sure that it works for all valid inputs – Tyilo – 2015-08-02T13:27:17.617

I'm fairly certain it doesn't - try implementing a simpler O(n^2) solution and generating random sequences, comparing between the two. – orlp – 2015-08-02T13:33:50.213

@orlp Yeah it doesn't work for 3, 2, 1, 2, 3. – Tyilo – 2015-08-02T14:46:05.030

An n log n algorithm exists (I have an implementation), but it's surprisingly complex. – orlp – 2015-08-02T14:47:29.603

Let us continue this discussion in chat. – Tyilo – 2015-08-02T15:55:53.590

could you explain what algorithm you used? – proud haskeller – 2015-08-04T19:11:08.433

@proudhaskeller It's the same algorithm as isaacg's answer. – Tyilo – 2015-08-04T21:09:41.027

0

Python, O(2^n), 91 bytes

This is more for fun than to be competitive. An arcane recursive solution:

h=lambda l,m=0:l and(h(l[1:],m)if l[0]<=m else max(h(l[1:],m),l[0]+h(l[1:],l[0])))or 0

orlp

Posted 2015-08-01T10:33:45.243

Reputation: 37 067

max(m,l[0]) given that not(l[0]<m) is just l[0], surely? – Peter Taylor – 2015-08-01T18:56:59.677

@PeterTaylor Derp. – orlp – 2015-08-02T00:29:48.777

This answer does not appear to be a serious contender. – pppery – 2020-01-07T00:23:29.010

0

Matlab, O(n 2^n), 90 bytes

function m=f(x)
m=0;for k=dec2bin(1:2^numel(x)-1)'==49
m=max(m,all(diff(x(k))>0)*x*k);end

Examples:

>> f([])
ans =
     0
>> f([3])
ans =
     3
>> f([3, 2, 5, 6])
ans =
    14

Luis Mendo

Posted 2015-08-01T10:33:45.243

Reputation: 87 464