Know a sequence by its subsequences

18

1

Introduction

Suppose you and your friend are playing a game. Your friend thinks of some particular sequence of n bits, and your task is to deduce the sequence by asking them questions. However, the only type of question you're allowed to ask is "How long is the longest common subsequence of your sequence and S", where S is any sequence of bits. The fewer questions you need, the better.

The task

Your task is to write a program or function that takes as input a positive integer n, and a binary sequence R of length n. The sequence may be an array of integers, a string, or some other reasonable type of your choice. Your program shall output the sequence R.

Your program is not allowed to access the sequence R directly. The only thing it's allowed to do to R is to give it as input to the function len_lcs along with another binary sequence S. The function len_lcs(R, S) returns the length of the longest common subsequence of R and S. This means the longest sequence of bits which occurs as a (not necessarily contiguous) subsequence in both R and S. The inputs of len_lcs which may be of different lengths. The program should call this function on R and other sequences some number of times, and then reconstruct the sequence R based on that information.

Example

Consider the inputs n = 4 and R = "1010". First, we might evaluate len_lcs(R, "110"), which gives 3, since "110" is the longest common subsequence of "1010" and "110". Then we know that R is obtained from "110" by inserting one bit at some position. Next, we might try len_lcs(R, "0110"), which returns 3 since the longest common subsequences are "110" and "010", so "0110" is not correct. Then we try len_lcs(R, "1010"), which returns 4. Now we know that R == "1010", so we can return that sequence as the correct output. This required 3 calls to len_lcs.

Rules and scoring

In this repository, you'll find a file called subsequence_data.txt containing 100 random binary sequences of lengths between 75 and 124. They were generated by taking three random floats between 0 and 1, taking their average as a, and then flipping an a-biased coin n times. You score is the average number of calls to len_lcs on these sequences, lower score being better. Your submission should record the number of calls. There are no time limits, except that you should run your program on the file before submitting it.

Your submission shall be deterministic. PRNGs are permitted, but they must use today's date, 200116 (or closest equivalent), as the random seed. You are not allowed to optimize your submission against these particular test cases. If I suspect this is happening, I will generate a new batch.

This is not code golf, so you're encouraged to write readable code. Rosetta Code has a page on the longest common subsequence; you may use that to implement len_lcs in your language of choice.

Zgarb

Posted 2016-01-20T23:37:53.650

Reputation: 39 083

Cool idea! Does this have any applications? – flawr – 2016-01-21T09:12:45.730

@flawr I don't know of any direct applications. The idea came from query complexity theory, a subfield of computer science, and that has loads of applications. – Zgarb – 2016-01-21T16:13:32.357

I think it would be great to have the same challenge again but where you can access lcs instead of len_lcs. – flawr – 2016-01-21T19:28:14.227

@flawr That wouldn't be very interesting, since lcs(R, "01"*2*n) returns R. ;) But that could work if calling lcs(R, S) would increase the score by len(S) instead of 1, or something like that... – Zgarb – 2016-01-21T19:32:33.637

Oh right, you'd really need another restriction=) – flawr – 2016-01-21T20:02:07.753

1I'd love to see other answers =S – flawr – 2016-01-24T00:38:04.247

Answers

10

Java, 99.04 98.46 97.66 lcs() calls

How it works

Exaple: Our line that is to be reconstructed is 00101. First we find out how many zeros there are, by comparing (here comparing = computing lcs with the original string) by a zeros only string 00000. Then we go through each position, flip the 0 to a 1 and check if we now have a longer common substring. If yes, accept and go to next position, if no, flip the current 1 back to a 0 and go to next position:

For our example of "00101" we get following steps:
input  lcs  prev.'best'
00000  3    0           //number of zeros
̲10000  3    3           //reject
0̲1000  3    3           //reject
00̲100  4    3           //accept
001̲10  4    4           //reject
0010̲1  5    4           //accept

Optimizations

This is just a "naive" implementation, perhaps it would be possible to find a more sophisticated alogrithm checking multiple positions at once. But I am not sure if there really is a better one (e.g. based on the calculation of parity bits similar to the Hamming code), as you can always just evaluate the length of the common substring.

For one given line of digits, this algorithm needs exactly #ofDigitsUntilTheLastOccurenceOf1 + 1 checks. (Substract one if the last digits is an 1.)

EDIT: One small optimization: If we just checked the 2nd last digit and we still need to insert a 1, we know for sure that it must be in the very last position, and can omit the corresponding check.

EDIT2: I just noticed you can apply above idea to the last k ones.

It might of course be possible to achieve a slightly lower score with the this optimization, by reordering the all lines first, because it could be, that you get more lines with ones at the very end but that would obviously be and optimization for the current test cases which is no longer funny.

Runtime

The upper limit is O(#NumberOfBits).

Full code

Here the full code:

package jcodegolf;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

// http://codegolf.stackexchange.com/questions/69799/know-a-sequence-by-its-subsequences

public class SequenceReconstructor { 
    public static int counter = 0;
    public static int lcs(String a, String b) { //stolen from http://rosettacode.org/wiki/Longest_common_subsequence#Java
        int[][] lengths = new int[a.length()+1][b.length()+1];

        // row 0 and column 0 are initialized to 0 already

        for (int i = 0; i < a.length(); i++)
            for (int j = 0; j < b.length(); j++)
                if (a.charAt(i) == b.charAt(j))
                    lengths[i+1][j+1] = lengths[i][j] + 1;
                else
                    lengths[i+1][j+1] =
                        Math.max(lengths[i+1][j], lengths[i][j+1]);

        // read the substring out from the matrix
        StringBuffer sb = new StringBuffer();
        for (int x = a.length(), y = b.length();
             x != 0 && y != 0; ) {
            if (lengths[x][y] == lengths[x-1][y])
                x--;
            else if (lengths[x][y] == lengths[x][y-1])
                y--;
            else {
                assert a.charAt(x-1) == b.charAt(y-1);
                sb.append(a.charAt(x-1));
                x--;
                y--;
            }
        }

        counter ++;
        return sb.reverse().toString().length();
    }


    public static String reconstruct(String secretLine, int lineLength){
        int current_lcs = 0; 
        int previous_lcs = 0;
        char [] myGuess = new char[lineLength];
        for (int k=0; k<lineLength; k++){
            myGuess[k] = '0';
        }

        //find the number of zeros:
        int numberOfZeros = lcs(secretLine, String.valueOf(myGuess));
        current_lcs = numberOfZeros;
        previous_lcs = numberOfZeros;

        if(current_lcs == lineLength){ //were done
            return String.valueOf(myGuess);
        }


        int numberOfOnes = lineLength - numberOfZeros;
        //try to greedily insert ones at the positions where they maximize the common substring length
        int onesCounter = 0;
        for(int n=0; n < lineLength && onesCounter < numberOfOnes; n++){

            myGuess[n] = '1';
            current_lcs = lcs(secretLine, String.valueOf(myGuess));

            if(current_lcs > previous_lcs){ //accept

                previous_lcs = current_lcs;
                onesCounter ++;

            } else { // do not accept
                myGuess[n]='0';     
            }

            if(n == lineLength-(numberOfOnes-onesCounter)-1 && onesCounter < numberOfOnes){ //lets test if we have as many locations left as we have ones to insert
                                                                // then we know that the rest are ones
                for(int k=n+1;k<lineLength;k++){
                    myGuess[k] = '1';
                }
                break;
            }

        }

        return String.valueOf(myGuess);
    }

    public static void main(String[] args) {
        try {

            //read the file
            BufferedReader br;

            br = new BufferedReader(new FileReader("PATH/TO/YOUR/FILE/LOCATION/subsequence_data.txt"));

            String line;

            //iterate over each line
            while ( (line = br.readLine()) != null){

                String r = reconstruct(line, line.length());
                System.out.println(line);     //print original line
                System.out.println(r);        //print current line
                System.out.println(counter/100.0);  //print current number of calls
                if (! line.equals(r)){
                    System.out.println("SOMETHING WENT HORRIBLY WRONG!!!");
                    System.exit(1);
                }

            }


        } catch(Exception e){
            e.printStackTrace();;
        }

    }

}

flawr

Posted 2016-01-20T23:37:53.650

Reputation: 40 560

1Since you get fewer calls when there are trailing 1s, it seems like you could do better on average if, after the first guess tells you there are more 0s than 1s you switch to hunting for 0 positions rather than 1 positions. You could even do that multiple times. – histocrat – 2016-01-21T15:08:38.100

1@histocrat I think he's already stopping once he uses up the last 1, which is equivalent to there being only zeroes left. – Martin Ender – 2016-01-21T15:15:42.973