The Missing Number Revised

22

1

Background:

I originally posted this question last night, and received backlash on its vagueness. I have since consulted many personnel concerning not only the wording of the problem, but also its complexity (which is not O(1)). This programming problem is an evil spin on an Amazon interview question.

Question:

Given a String of randomly concatenated integers [0, 250), 0 to 250 exclusive, there is ONE number missing in the sequence. Your job is to write a program that will calculate this missing number. There are no other missing numbers in the sequence besides the one, and that is what makes this problem so difficult, and possibly computationally hard.

Doing this problem by hand on smaller Strings, such as examples 1 and 2 below are obviously very easy. Conversely, computing a missing number on incredibly large datasets involving three-digit or four-digit numbers would be incredibly difficult. The idea behind this problem is to construct a program that will do this process FOR you.

Important Information:

One thing that appeared as rather confusing when I posted this problem last night was: what exactly a missing number is defined as. A missing number is the number INSIDE of the range specified above; NOT necessarily the digit. In example 3, you will see that the missing number is 9, even though it appears in the sequence. There are 3 places the DIGIT 9 will appear in a series of [0, 30): “9”, “19”, and “29”. Your objective is to differentiate between these, and discover that 9 is the missing NUMBER (inside of example 3). In other words, the tricky part lies in finding out which sequences of digits are complete and which belong to other numbers.

Input:

The input is a String S, containing integers from 0 to 249 inclusive, or 0 to 250 exclusive (in other words, [0, 250)). These integers, as stated above, are scrambled up to create a random sequence. There are NO delimiters (“42, 31, 23, 44”), or padding 0’s (003076244029002); the problems are exactly as described in the examples. It is guaranteed that there is only 1 solution in the actual problems. Multiple solutions are not permitted for these.

Winning Criteria:

Whoever has the fastest, and lowest memory usage will be the winner. In the miraculous event that a time ties, lower memory will be used for the time breaker. Please list Big O if you can!

Examples:

Examples 1 and 2 have a range of [0, 10)

Examples 3 and 4 have a range of [0, 30)

(Examples 1-4 are just for demonstration. Your program needn't to handle them.)

Examples 5 has a range of [0, 250)

1. 420137659    
- Missing number => 8

2. 843216075    
- Missing number => 9  

3. 2112282526022911192312416102017731561427221884513 
- Missing number => 9

4. 229272120623131992528240518810426223161211471711
- Missing number => 15

5. 11395591741893085201244471432361149120556162127165124233106210135320813701207315110246262072142253419410247129611737243218190203156364518617019864222241772384813041175126193134141008211877147192451101968789181153241861671712710899168232150138131195104411520078178584419739178522066640145139388863199146248518022492149187962968112157173132551631441367921221229161208324623423922615218321511111211121975723721911614865611197515810239015418422813742128176166949324015823124214033541416719143625021276351260183210916421672722015510117218224913320919223553222021036912321791591225112512304920418584216981883128105227213107223142169741601798025
- Missing number => 71

Test Data: 

Problem 1: 6966410819610521530291368349682309217598570592011872022482018312220241246911298913317419721920718217313718080857232177134232481551020010112519172652031631113791105122116319458153244261582135510090235116139611641267691141679612215222660112127421321901862041827745106522437208362062271684640438174315738135641171699510421015199128239881442242382361212317163149232839233823418915447142162771412092492141987521710917122354156131466216515061812273140130240170972181176179166531781851152178225242192445147229991613515911122223419187862169312013124150672371432051192510724356172282471951381601241518410318414211212870941111833193145123245188102

Problem 2: 14883423514241100511108716621733193121019716422221117630156992324819917158961372915140456921857371883175910701891021877194529067191198226669314940125152431532281961078111412624224113912011621641182322612016512820395482371382385363922471472312072131791925510478122073722091352412491272395020016194195116236186596116374117841971602259812110612913254255615723013185162206245183244806417777130181492211412431591541398312414414582421741482461036761192272120204114346205712198918190242184229286518011471231585109384415021021415522313136146178233133168222201785172212108182276835832151134861116216716910511560240392170208215112173234136317520219

Problem 3: 1342319526198176611201701741948297621621214122224383105148103846820718319098731271611601912137231471099223812820157162671720663139410066179891663131117186249133125172622813593129302325881203242806043154161082051916986441859042111711241041590221248711516546521992257224020174102234138991752117924457143653945184113781031116471120421331506424717816813220023315511422019520918114070163152106248236222396919620277541101222101232171732231122301511263822375920856142187182152451585137352921848164219492411071228936130762461191564196185114910118922611881888513917712153146227193235347537229322521516718014542248813617191531972142714505519240144

Problem 4: 2492402092341949619347401841041875198202182031161577311941257285491521667219229672211881621592451432318618560812361201172382071222352271769922013259915817462189101108056130187233141312197127179205981692121101632221732337196969131822110021512524417548627103506114978204123128181211814236346515430399015513513311152157420112189119277138882021676618323919018013646200114160165350631262167910238144334214230146151171192261653158161213431911401452461159313720613195248191505228186244583455139542924222112226148941682087115610915344641782142472102436810828123731134321131241772242411722251997612923295223701069721187182171471055710784170217851

TheProgrammer

Posted 2017-12-06T02:52:28.073

Reputation: 395

1Clarification: I see you tagged [tag:fastest-algorithm], but it's a bit unclear in the description. is this challenge [tag:fastest-algorithm] (as in, lowest time complexity) or [tag:fastest-code] (as in, taking least amount of time on a particular machine)? – JungHwan Min – 2017-12-06T03:15:54.350

2Also, must the program support any values of N, not just 250? / What about the 232 issue? All possibilities or one any? I realize that you knew about that issue, but I find it unclear in the question. / If this is [tag:fastest-code] there must be a way to measure them. Of course running on a supercomputer is different from running on an old computer. / Because no one said that, -- Welcome to PPCG! – user202729 – 2017-12-06T03:22:19.703

Ah, I was wondering what had happened, welcome back! I think the case remains that some problems might have multiple solutions. – fede s. – 2017-12-06T03:23:04.863

Why are the examples have different ranges, but the question specify fixed range (250)? Which one do you want? – user202729 – 2017-12-06T03:28:43.820

250 is what I want; I was using the others to DEMONSTRATE the idea. Nevertheless, the problem has been solved in Java and C++. – TheProgrammer – 2017-12-06T03:31:10.953

And once again, there is only ONE number missing; it's impossible to have multiple solutions. – TheProgrammer – 2017-12-06T03:32:44.853

The program should be capable of solving N = 250. – TheProgrammer – 2017-12-06T03:34:40.637

@JoshuaCrotts Let's continue discussing in the chatroom.

– user202729 – 2017-12-06T03:39:11.587

1This is a fascinating problem, but (at least according to the answers thus far) is too trivial to get enough computational complexity to be able to meaningfully differentiate between the answers to determine a winner, which is a bummer. – AdmBorkBork – 2017-12-06T13:32:59.293

@AdmBorkBork, you are correct... I'm excited to see the solutions though! I do apologize for there not being enough clarity to determine a true WINNER; in all honesty, I just wanted to see if this problem was solvable. – TheProgrammer – 2017-12-06T17:04:50.107

1@JoshuaCrotts you could always raise N to, say, 1000 or 10000. – Οurous – 2017-12-06T18:23:09.010

@Ourous But wouldn't that invalidate some answers? Would I need to repost the question? – TheProgrammer – 2017-12-07T01:40:13.243

4Congrats on PPCG post #150,000 ;) – ETHproductions – 2017-12-07T02:21:32.737

@JoshuaCrotts as far as I can tell, all the current answers either take N as an argument or have it in a variable, which would require almost no change. None have methods dependent on a certain bound of N. – Οurous – 2017-12-07T06:24:41.233

@Ourous should I just repost the question? Or should I update it? If I want people to get a fair chance/actually SEE the post, would updating it bump it? – TheProgrammer – 2017-12-08T20:21:07.627

I think, I found an instance which is fairly regular, but seems to be difficult to solve for the programs I've tried (clingo + cpp). Maybe someone can confirm its validity and the results.

– politza – 2017-12-09T00:24:09.197

@politza My (fixed) clingo program solves it in ≈ 0.03 seconds, and clingo -n0 (compute all models) validates that 231 is the only solution. – Anders Kaseorg – 2017-12-09T21:02:59.130

@JoshuaCrotts Is there an unstated assumption that no number is repeated (so 23456789100 must parse as 2, …, 9, 10, 0 rather than 2, …, 9, 1, 0, 0)? – Anders Kaseorg – 2017-12-09T21:05:20.623

@Anders Kaseorg That is correct; none of the numbers are repeated. – TheProgrammer – 2017-12-11T01:32:15.683

@JoshuaCrotts Great (please edit in this clarification though).

– Anders Kaseorg – 2017-12-11T03:11:08.943

Answers

10

Clingo, ≈ 0.03 seconds

This is too fast to be accurately measured—you’ll need to allow larger input cases rather than artificially stopping at 250.

% cat(I) means digits I and I+1 are part of the same number.
{cat(I)} :- digit(I, D), digit(I+1, E).

% prefix(I, X) means some digits ending at I are part of the same
% number prefix X.
prefix(I, D) :- digit(I, D), not cat(I-1), D < n.
prefix(I, 10*X+D) :- prefix(I-1, X), digit(I, D), cat(I-1), X > 0, 10*X+D < n.

% Every digit is part of some prefix.
:- digit(I, D), {prefix(I, X)} = 0.

% If also not cat(I), then this counts as an appearance of the number
% X.
appears(I, X) :- prefix(I, X), not cat(I).

% No number appears more than once.
:- X=0..n-1, {appears(I, X)} > 1.

% missing(X) means X does not appear.
missing(X) :- X=0..n-1, {appears(I, X)} = 0.

% Exactly one number is missing.
:- {missing(X)} != 1.

#show missing/1.

Example input

Input is a list of (k, kth digit) pairs. Here is problem 1:

#const n = 250.
digit(0,6;1,9;2,6;3,6;4,4;5,1;6,0;7,8;8,1;9,9;10,6;11,1;12,0;13,5;14,2;15,1;16,5;17,3;18,0;19,2;20,9;21,1;22,3;23,6;24,8;25,3;26,4;27,9;28,6;29,8;30,2;31,3;32,0;33,9;34,2;35,1;36,7;37,5;38,9;39,8;40,5;41,7;42,0;43,5;44,9;45,2;46,0;47,1;48,1;49,8;50,7;51,2;52,0;53,2;54,2;55,4;56,8;57,2;58,0;59,1;60,8;61,3;62,1;63,2;64,2;65,2;66,0;67,2;68,4;69,1;70,2;71,4;72,6;73,9;74,1;75,1;76,2;77,9;78,8;79,9;80,1;81,3;82,3;83,1;84,7;85,4;86,1;87,9;88,7;89,2;90,1;91,9;92,2;93,0;94,7;95,1;96,8;97,2;98,1;99,7;100,3;101,1;102,3;103,7;104,1;105,8;106,0;107,8;108,0;109,8;110,5;111,7;112,2;113,3;114,2;115,1;116,7;117,7;118,1;119,3;120,4;121,2;122,3;123,2;124,4;125,8;126,1;127,5;128,5;129,1;130,0;131,2;132,0;133,0;134,1;135,0;136,1;137,1;138,2;139,5;140,1;141,9;142,1;143,7;144,2;145,6;146,5;147,2;148,0;149,3;150,1;151,6;152,3;153,1;154,1;155,1;156,3;157,7;158,9;159,1;160,1;161,0;162,5;163,1;164,2;165,2;166,1;167,1;168,6;169,3;170,1;171,9;172,4;173,5;174,8;175,1;176,5;177,3;178,2;179,4;180,4;181,2;182,6;183,1;184,5;185,8;186,2;187,1;188,3;189,5;190,5;191,1;192,0;193,0;194,9;195,0;196,2;197,3;198,5;199,1;200,1;201,6;202,1;203,3;204,9;205,6;206,1;207,1;208,6;209,4;210,1;211,2;212,6;213,7;214,6;215,9;216,1;217,1;218,4;219,1;220,6;221,7;222,9;223,6;224,1;225,2;226,2;227,1;228,5;229,2;230,2;231,2;232,6;233,6;234,0;235,1;236,1;237,2;238,1;239,2;240,7;241,4;242,2;243,1;244,3;245,2;246,1;247,9;248,0;249,1;250,8;251,6;252,2;253,0;254,4;255,1;256,8;257,2;258,7;259,7;260,4;261,5;262,1;263,0;264,6;265,5;266,2;267,2;268,4;269,3;270,7;271,2;272,0;273,8;274,3;275,6;276,2;277,0;278,6;279,2;280,2;281,7;282,1;283,6;284,8;285,4;286,6;287,4;288,0;289,4;290,3;291,8;292,1;293,7;294,4;295,3;296,1;297,5;298,7;299,3;300,8;301,1;302,3;303,5;304,6;305,4;306,1;307,1;308,7;309,1;310,6;311,9;312,9;313,5;314,1;315,0;316,4;317,2;318,1;319,0;320,1;321,5;322,1;323,9;324,9;325,1;326,2;327,8;328,2;329,3;330,9;331,8;332,8;333,1;334,4;335,4;336,2;337,2;338,4;339,2;340,3;341,8;342,2;343,3;344,6;345,1;346,2;347,1;348,2;349,3;350,1;351,7;352,1;353,6;354,3;355,1;356,4;357,9;358,2;359,3;360,2;361,8;362,3;363,9;364,2;365,3;366,3;367,8;368,2;369,3;370,4;371,1;372,8;373,9;374,1;375,5;376,4;377,4;378,7;379,1;380,4;381,2;382,1;383,6;384,2;385,7;386,7;387,1;388,4;389,1;390,2;391,0;392,9;393,2;394,4;395,9;396,2;397,1;398,4;399,1;400,9;401,8;402,7;403,5;404,2;405,1;406,7;407,1;408,0;409,9;410,1;411,7;412,1;413,2;414,2;415,3;416,5;417,4;418,1;419,5;420,6;421,1;422,3;423,1;424,4;425,6;426,6;427,2;428,1;429,6;430,5;431,1;432,5;433,0;434,6;435,1;436,8;437,1;438,2;439,2;440,7;441,3;442,1;443,4;444,0;445,1;446,3;447,0;448,2;449,4;450,0;451,1;452,7;453,0;454,9;455,7;456,2;457,1;458,8;459,1;460,1;461,7;462,6;463,1;464,7;465,9;466,1;467,6;468,6;469,5;470,3;471,1;472,7;473,8;474,1;475,8;476,5;477,1;478,1;479,5;480,2;481,1;482,7;483,8;484,2;485,2;486,5;487,2;488,4;489,2;490,1;491,9;492,2;493,4;494,4;495,5;496,1;497,4;498,7;499,2;500,2;501,9;502,9;503,9;504,1;505,6;506,1;507,3;508,5;509,1;510,5;511,9;512,1;513,1;514,1;515,2;516,2;517,2;518,2;519,3;520,4;521,1;522,9;523,1;524,8;525,7;526,8;527,6;528,2;529,1;530,6;531,9;532,3;533,1;534,2;535,0;536,1;537,3;538,1;539,2;540,4;541,1;542,5;543,0;544,6;545,7;546,2;547,3;548,7;549,1;550,4;551,3;552,2;553,0;554,5;555,1;556,1;557,9;558,2;559,5;560,1;561,0;562,7;563,2;564,4;565,3;566,5;567,6;568,1;569,7;570,2;571,2;572,8;573,2;574,4;575,7;576,1;577,9;578,5;579,1;580,3;581,8;582,1;583,6;584,0;585,1;586,2;587,4;588,1;589,5;590,1;591,8;592,4;593,1;594,0;595,3;596,1;597,8;598,4;599,1;600,4;601,2;602,1;603,1;604,2;605,1;606,2;607,8;608,7;609,0;610,9;611,4;612,1;613,1;614,1;615,1;616,8;617,3;618,3;619,1;620,9;621,3;622,1;623,4;624,5;625,1;626,2;627,3;628,2;629,4;630,5;631,1;632,8;633,8;634,1;635,0;636,2).

Example output

$ clingo missing.lp problem1.lp 
clingo version 5.2.2
Reading from missing.lp ...
Solving...
Answer: 1
missing(148)
SATISFIABLE

Models       : 1+
Calls        : 1
Time         : 0.032s (Solving: 0.00s 1st Model: 0.00s Unsat: 0.00s)
CPU Time     : 0.032s

Anders Kaseorg

Posted 2017-12-06T02:52:28.073

Reputation: 29 242

This solution seems to give a false answer in many cases, e.g. 45879362100 with n = 11 and 1 missing (answers missing(0)). – politza – 2017-12-09T19:34:15.967

@politza Fixed. Should I also add the unstated assumption that no number is repeated (else missing(10) is also valid)? – Anders Kaseorg – 2017-12-09T20:55:08.533

I'm still getting false results, e.g. on this instance.

– politza – 2017-12-09T21:50:44.923

Would you mind writing one or two sentences about the ideas behind your model ? – politza – 2017-12-09T22:01:04.393

@politza Right, it turns out the given tests do rely on the above unstated assumption, so I’ve added it. The revised program gives a unique result on your instance too. (But I would still appreciate if the assumption were stated explicitly in the question.) – Anders Kaseorg – 2017-12-10T03:52:35.477

I see now what you meant and would prefer that assumption. Thanks for the added comments. – politza – 2017-12-10T07:15:03.890

9

C++, 5000 random test cases in 6.1 seconds

This is practically fast, but there may exist some testcases that make it slow. Complexity unknown.

If there are multiple solutions, it will print them all. Example.

Explanation:

  1. Count the occurrences of digits.

  2. List all possible answers.

  3. Check if a candidate is a valid answer:

    3-1. Try to split the string(s) by numbers which only occur once and mark it as identified, except the candidate.
    For example, 2112282526022911192312416102017731561427221884513 has only one 14, so it can be split into 211228252602291119231241610201773156 and 27221884513.

    3-2. If any split string has length 1, mark it as identified.
    If any contradiction is made (identified more than once), the candidate is not valid.
    If we cannot find the candidate in the string, the candidate is valid.

    3-3. If any split is made, repeat step 3-1. Otherwise, do a brute force search to check if the candidate is valid.

#include <cmath>
#include <bitset>
#include <string>
#include <vector>
#include <cstring>
#include <iostream>
#include <algorithm>

const int VAL_MAX = 250;
const int LOG_MAX = log10(VAL_MAX - 1) + 1;
using bools = std::bitset<VAL_MAX>;

std::pair<size_t, size_t> count(const std::string& str, const std::string& target)
{
    size_t ans = 0, offset = 0, pos = std::string::npos;
    for (; (offset = str.find(target, offset)) != std::string::npos; ans++, pos = offset++);
    return std::make_pair(ans, pos);
}

bool dfs(size_t a, size_t b, const std::vector<std::string>& str, bools& cnt, int t)
{ // input: string id, string position, strings, identified, candidate
    if (b == str[a].size()) a++, b = 0;
    if (a == str.size()) return true;   // if no contradiction on all strings, the candidate is valid

    int p = 0;
    for (int i = 0; i < LOG_MAX; i++) { // assume str[a][b...b+i] is a number
        if (str[a].size() == b) break;
        p = p * 10 + (str[a][b++] ^ '0');
        if (p < VAL_MAX && !cnt[p] && p != t) { //if no contradiction
            cnt[p] = true;
            if (dfs(a, b, str, cnt, t)) return true; // recursively check
            cnt[p] = false;
        }
    }
    return false;
}

struct ocr {
    int l, r, G;
    bool operator<(const ocr& i) const { return l > i.l; }
};

int cal(std::vector<std::string> str, bools cnt, int t)
{ // input: a list of strings, whether a number have identified, candidate
  // try to find numbers that only occur once in those strings
    int N = str.size();
    std::vector<ocr> pos;

    for (int i = 0; i < VAL_MAX; i++) {
        if (cnt[i]) continue;             // try every number which haven't identified
        int flag = 0;
        std::string target = std::to_string(i);
        ocr now;
        for (int j = 0; j < N; j++) {     // count occurences
            auto c = count(str[j], target);
            if ((flag += c.first) > 1) break;
            if (c.first) now = {c.second, c.second + target.size(), j};
        }
        if (!flag && t == i) return true; // if cannot find the candidate, then it is valid
        if (i != t && flag == 1) pos.push_back(now), cnt[i] = true;
        // if only occur once, then its position is fixed, mark as identified
    }
    if (!pos.size()) { // if no number is identified, do a brute force search
        std::sort(str.begin(), str.end(), [](const std::string& a, const std::string& b){return a.size() < b.size();});
        return dfs(0, 0, str, cnt, t);
    }

    std::sort(pos.begin(), pos.end());
    std::vector<std::string> lst;
    for (auto& i : pos) {      // split strings by identified numbers
        if ((size_t)i.r > str[i.G].size()) return false;
        std::string tmp = str[i.G].substr(i.r);
        if (tmp.size() == 1) { // if split string has length 1, it is identified
            if (cnt[tmp[0] ^ '0']) return false; // contradiction if it is identified before
            cnt[tmp[0] ^ '0'] = true;
        }
        else if (tmp.size()) lst.push_back(std::move(tmp));
        str[i.G].resize(i.l);
    }
    for (auto& i : str) { // push the remaining strings; same as above
        if (i.size() == 1) {
            if (cnt[i[0] ^ '0']) return false;
            cnt[i[0] ^ '0'] = true;
        }
        else if (i.size()) lst.push_back(std::move(i));
    }
    return cal(lst, cnt, t); // continue the split step with new set of strings
}

int main()
{
    std::string str;
    std::vector<ocr> pos;
    std::vector<int> prob;
    std::cin >> str;

    int p[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 0};
    for (int i = 0; i < VAL_MAX; i++)
        for (char j : std::to_string(i)) p[j ^ '0']++;
    for (char i : str) p[i ^ '0']--; // count digit occurrences
    {
        std::string tmp;
        for (int i = 0; i < 10; i++)
            while (p[i]--) tmp.push_back(i ^ '0');
        do {           // list all possible candidates (at most 4)
            int c = std::stoi(tmp);
            if (c < VAL_MAX && tmp[0] != '0') prob.push_back(c);
        } while (std::next_permutation(tmp.begin(), tmp.end()));
    }
    if (prob.size() == 1) { std::cout << prob[0] << '\n'; return 0; }
                       // if only one candidate, output it
    for (int i : prob) // ... or check if each candidate is valid
        if (cal({str}, bools(), i)) std::cout << i << '\n';
}

Try it online!

Colera Su

Posted 2017-12-06T02:52:28.073

Reputation: 2 291

6

Clean, ~0.3s

Fixed a huge bug in the algorithm, need to re-optimize it now.

module main
import StdEnv
import StdLib
import System.CommandLine

maxNum = 250
sample = "11395591741893085201244471432361149120556162127165124233106210135320813701207315110246262072142253419410247129611737243218190203156364518617019864222241772384813041175126193134141008211877147192451101968789181153241861671712710899168232150138131195104411520078178584419739178522066640145139388863199146248518022492149187962968112157173132551631441367921221229161208324623423922615218321511111211121975723721911614865611197515810239015418422813742128176166949324015823124214033541416719143625021276351260183210916421672722015510117218224913320919223553222021036912321791591225112512304920418584216981883128105227213107223142169741601798025"
case1 = "6966410819610521530291368349682309217598570592011872022482018312220241246911298913317419721920718217313718080857232177134232481551020010112519172652031631113791105122116319458153244261582135510090235116139611641267691141679612215222660112127421321901862041827745106522437208362062271684640438174315738135641171699510421015199128239881442242382361212317163149232839233823418915447142162771412092492141987521710917122354156131466216515061812273140130240170972181176179166531781851152178225242192445147229991613515911122223419187862169312013124150672371432051192510724356172282471951381601241518410318414211212870941111833193145123245188102"
case2 = "14883423514241100511108716621733193121019716422221117630156992324819917158961372915140456921857371883175910701891021877194529067191198226669314940125152431532281961078111412624224113912011621641182322612016512820395482371382385363922471472312072131791925510478122073722091352412491272395020016194195116236186596116374117841971602259812110612913254255615723013185162206245183244806417777130181492211412431591541398312414414582421741482461036761192272120204114346205712198918190242184229286518011471231585109384415021021415522313136146178233133168222201785172212108182276835832151134861116216716910511560240392170208215112173234136317520219"
case3 = "1342319526198176611201701741948297621621214122224383105148103846820718319098731271611601912137231471099223812820157162671720663139410066179891663131117186249133125172622813593129302325881203242806043154161082051916986441859042111711241041590221248711516546521992257224020174102234138991752117924457143653945184113781031116471120421331506424717816813220023315511422019520918114070163152106248236222396919620277541101222101232171732231122301511263822375920856142187182152451585137352921848164219492411071228936130762461191564196185114910118922611881888513917712153146227193235347537229322521516718014542248813617191531972142714505519240144"
case4 = "2492402092341949619347401841041875198202182031161577311941257285491521667219229672211881621592451432318618560812361201172382071222352271769922013259915817462189101108056130187233141312197127179205981692121101632221732337196969131822110021512524417548627103506114978204123128181211814236346515430399015513513311152157420112189119277138882021676618323919018013646200114160165350631262167910238144334214230146151171192261653158161213431911401452461159313720613195248191505228186244583455139542924222112226148941682087115610915344641782142472102436810828123731134321131241772242411722251997612923295223701069721187182171471055710784170217851"

failing = "0102030405060708090100101102103104105106107108109110120130140150160170180190200201202203204205206207208209210220230240249248247246245244243242241239238237236235234233232229228227226225224223222221219218217216215214213212211199198197196195194193192191189188187186185184183182181179178177176175174173172171169168167166165164163162161159158157156155154153152151149148147146145144143142141139138137136135134133132131129128127126125124123122121119118117116115114113112111999897969594939291898887868584838281797877767574737271696867666564636261595857565554535251494847464544434241393837363534333231292827262524232221191817161514131211987654321"

dupes = "19050151158951391658227781234527110196235731198137214733126868520474181772192213718517314542182652441211742304719519143231236593134207203121171237201705111617211824810013324511511436253946122155201534113626129692410611318356178791080921122151321949681166200188841675156120546124912883216212189712281541382202411041372421642917614416870223753814121124318415710310515010682172099012716167102179894920613516297239186222232225635312262134019719915382229399107111802082341491811011604815220291125247641482401691871755205639495788414314011714616366130175601931092467744819271230159131158714761192105218019822421812423322919341426216523821428232"

:: Position :== [Int]
:: Positions :== [Position]
:: Digit :== (Char, Int)
:: Digits :== [Digit]
:: Number :== ([Char], Positions)
:: Numbers :== [Number]
:: Complete :== (Numbers, [Digits])

numbers :: [[Char]]
numbers = [fromString (toString n) \\ n <- [0..(maxNum-1)]]

candidates :: [Char] -> [[Char]]
candidates chars
    = moreCandidates chars []
where
    moreCandidates :: [Char] [[Char]] -> [[Char]]
    moreCandidates [] nums
        = removeDup (filter (\num = isMember num numbers) nums)
    moreCandidates chars []
        = flatten [moreCandidates (removeAt i chars) [[c]] \\ c <- chars & i <- [0..]]
    moreCandidates chars nums
        = flatten [flatten [moreCandidates (removeAt i chars) [ [c : num] \\ num <- nums ]] \\  c <- chars & i <- [0..]]

singletonSieve :: Complete -> Complete
singletonSieve (list, sequence)
    | (list_, sequence_) == (list, sequence)
        = reverseSieve (list, sequence)
    = (list_, sequence_)
where
    singles :: Numbers
    singles 
        = filter (\(_, i) = length i == 1) list
    list_ :: Numbers
    list_
        = [(a, filter (\n = not (isAnyMember n (flatten [flatten b_ \\ (a_, b_) <- singles | a_ <> a]))) b) \\ (a, b) <- list]
    sequence_ :: [Digits]
    sequence_
        = foldr splitSequence sequence (flatten (snd (unzip singles)))

reverseSieve :: Complete -> Complete
reverseSieve (list, sequence)
    # sequence
        = foldr splitSequence sequence (flatten (snd (unzip singles)))
    # list
        = [(a, filter (\n = not (isAnyMember n (flatten [flatten b_ \\ (a_, b_) <- singles | a_ <> a]))) b) \\ (a, b) <- list]
    # list
        = [(a, filter (\n = or [any (isPrefixOf n) (tails subSeq) \\ subSeq <- map (snd o unzip) sequence]) b) \\ (a, b) <- list]
    = (list, sequence)
where
    singles :: Numbers
    singles
        = [(a, i) \\ (a, b) <- list, i <- [[subSeq \\ subSeq <- map (snd o unzip) sequence | isMember subSeq b]] | length i == 1]


splitSequence :: Position [Digits] -> [Digits]
splitSequence split sequence
    = flatten [if(isEmpty b) [a] [a, drop (length split) b] \\ (a, b) <- [span (\(_, i) = not (isMember i split)) subSeq \\ subSeq <- sequence] | [] < max a b]

indexSubSeq :: [Char] Digits -> Positions
indexSubSeq _ []
    = []
indexSubSeq a b
    # remainder
        = indexSubSeq a (tl b)
    | startsWith a b
        = [[i \\ (_, i) <- take (length a) b] : remainder]
    = remainder
where
    startsWith :: [Char] Digits -> Bool
    startsWith _ []
        = False
    startsWith [] _
        = False
    startsWith [a] [(b,_):_]
        = a == b
    startsWith [a:a_] [(b,_):b_]
        | a == b
            = startsWith a_ b_
        = False

missingNumber :: String -> [[Char]]
missingNumber string
    # string
        = [(c, i) \\ c <-: string & i <- [0..]]
    # locations
        = [(number, indexSubSeq number string) \\ number <- numbers]
    # digits
        = [length (indexSubSeq [number] [(c, i) \\ c <- (flatten numbers) & i <- [0..]]) \\ number <-: "0123456789"]
    # missing
        = (flatten o flatten) [repeatn (y - length b) a \\ y <- digits & (a, b) <- locations]
    # (answers, _)
        = hd [e \\ e <- iterate singletonSieve (locations, [string]) | length (filter (\(a, b) = (length b == 0) && (isMember a (candidates missing))) (fst e)) > 0]
    # answers
        = filter (\(_, i) = length i == 0) answers
    = filter ((flip isMember)(candidates missing)) ((fst o unzip) answers)


Start world
    # (args, world)
        = getCommandLine world
    | length args < 2
        = abort "too few arguments\n"
    = flatlines [foldr (\num -> \str = if(isEmpty str) num (num ++ [',' : str]) ) [] (missingNumber arg) \\ arg <- tl args]

Compile with clm -h 1024M -s 16M -nci -dynamics -fusion -t -b -IL Dynamics -IL Platform main

This works by taking every number the string has to contain, and counting the number of places the required digit sequence is present in the string. It then repeatedly does these steps:

  • If number has no possible positions, that's the answer
  • Remove every number with one possible position (call these singles)
  • Remove every position from all remaining numbers which overlaps with any positions from the previously removed numbers (the singles)

Οurous

Posted 2017-12-06T02:52:28.073

Reputation: 7 916

1Running a program with the input hard-coded may be a questionable way to benchmark this: what if the compiler optimizes away the entire computation and writes a binary that merely prints a precomputed result? (I don’t know whether the Clean compiler is quite that smart, but I’ve heard good things about it.) – Anders Kaseorg – 2017-12-10T03:59:53.947

2You... have a very good point. I've checked and it is doing exactly that. I'll amend the answer. – Οurous – 2017-12-10T04:11:10.980

Do you know if it’s possible to get this working on TIO? (My attempt)

– Anders Kaseorg – 2017-12-10T05:04:34.127

1

@AndersKaseorg Not currently, I'm still working with Dennis on getting all the libraries working with TIO. You can find the context starting about here. Basically, at the moment, if it needs more than StdEnv + Dynamics, it won't work.

– Οurous – 2017-12-10T05:54:21.953

Running it locally, I get “multiple solutions” on the given problem 2. (Also, 2 microseconds is a suspicious running time—are you sure you didn’t mean milliseconds? I get about 4 milliseconds per case on my laptop when providing many cases as arguments to a single execution.) – Anders Kaseorg – 2017-12-10T07:00:02.377

@AndersKaseorg fixed – Οurous – 2017-12-10T22:29:07.930

@Ourous What happens if the set of singles in step 2 of your algorithm is empty ? – politza – 2017-12-19T17:16:03.060

@politza It breaks. I didn't run into a case where that happened in about 1000 test runs so I suspect it won't. However, if you find one, tell me what it is so I can fix it. – Οurous – 2017-12-19T20:21:43.557

It does, e.g. in this instance. Alternatively, raise the limit and it should become obvious fairly quickly.

– politza – 2017-12-19T21:03:28.200

@politza Thanks, I'll get right on that :) – Οurous – 2017-12-19T21:06:53.200

I'm curious whether there exists a better algorithm compared to brute-forcing all alternative assignments (of non-singles). – politza – 2017-12-19T21:12:16.017

@politza There is. I was stuck on it until you found a test case that failed this one, but once I fix it I'll be able to finish the performant one and post it to the other question. – Οurous – 2017-12-19T21:16:42.010

@politza Fixed it now. – Οurous – 2017-12-20T00:51:19.057

@Οurous So, how'd you solve it ? – politza – 2017-12-20T13:25:51.097