The Missing Number - Version 2

7

1

Background:

This question is a remix of the one that I made previously on this forum. The only difference with this one is: the range is significantly larger, AND dynamic. Details below!

Also, I'm typing this question incredibly quickly, so if there are any grammatical errors, I do apologize and ask if anyone would edit the question to clear any vagueness. Nonetheless, the original question is posted here: The Missing Number Revised

Question:

Given a minimum bound M, a maximum bound N, and a String of randomly concatenated integers [min, max), min (M) to max (N) exclusive, there is a number missing in the sequence. Your job is to write a program that will calculate this missing number.

Doing this problem by hand on smaller Strings, such as examples 1 and 2 below are obviously very easy. Conversely, computing a missing number on incredibly large datasets involving three-digit, four-digit, or more numbers would be incredibly difficult. The idea behind this problem is to construct a program that will do this process FOR you.

Important Information:

One thing that appeared as rather confusing when I posted this problem last night was: what exactly a missing number is defined as. A missing number is the number INSIDE of the range specified above; NOT necessarily the digit. In example 3, you will see that the missing number is 7, even though it appears in the sequence. There are 3 places the DIGIT 7 will appear in a series of [0, 30): “7”, “17”, and “27”. Your objective is to differentiate between these, and discover that 7 is the missing NUMBER (inside of example 3). In other words, the tricky part lies in finding out which sequences of digits are complete and which belong to other numbers.

Another very important factor in this: there are some input situations where there are multiple solutions! Your job will be to print out ALL numbers that are missing in the sequence. I cannot provide insight as to WHY they have multiple solutions; I only know they may exist.

Input:

The input is a single Integer T: the amount of test cases.

For each test case, there is an Integer M: the minimum number in the sequence, another Integer N: the maximum number in the sequence (thus M < N at ALL times), a String S, containing integers from M to N - 1 inclusive, or M to N exclusive (in other words, [M, N)). These integers, as stated above, are scrambled up to create a random sequence. There are NO delimiters (“42, 31, 23, 44”), or padding 0’s (003076244029002); the problems are exactly as described in the examples.

As stated above, M is ALWAYS lower than N, and the maximum range of numbers is < 100,000. |N - M|.

Winning Criteria:

Whoever has the fastest, and lowest memory usage will be the winner. In the miraculous event that a time ties, lower memory will be used for the time breaker. Please list Big O if you can!

Examples:

(Examples 1-6 are just for demonstration. Your program needn't to handle them.)

Here is a pastebin link of the examples.

TEST DATA

Please keep in mind the way you're to read in the data: Digit D represents how MANY test cases there are, the next lines will follow with integers M and N, then String S. I just wanted to clear it up. I also apologize for the incredibly large pastebin. Good luck!

Data: Test Data - Github repository

TheProgrammer

Posted 2017-12-11T02:32:32.347

Reputation: 395

Do we need to take input from a file, or is STDIN acceptable? – Οurous – 2017-12-11T03:36:04.410

@Ourous I figured it would be easier to read in the test data from a file, but if you want to do standard input, it’s acceptable. – TheProgrammer – 2017-12-11T04:19:59.003

3I was looking at example 108-7-13129514-120-11-14134, and it seems to be missing a lot more than -15... Am I am misinterpreting the problem? – Neil – 2017-12-11T05:23:34.763

2You can just use the sandbox... – user202729 – 2017-12-11T05:40:38.137

With negative numbers the task is quite different from previous challenge ... but without the '-' symbol is would be more difficult – edc65 – 2017-12-11T14:49:49.127

Input format is overly specific. Why not just take M, N and S, like the previous challenge (and most of the other challenges nowadays)? – Uriel – 2017-12-11T19:14:16.563

@Uriel You can do that; I was wanting to be as specific as possible. – TheProgrammer – 2017-12-12T05:49:32.930

Answers

2

Clean - Complexity Unknown

Currently works, explanation and further optimization in progress

module main
import StdEnv, StdLib, System.IO

:: Position :== [Int]
:: Positions :== [Position]
:: Digit :== (Char, Int)
:: Digits :== [Digit]
:: Number :== ([Char], Positions)
:: Numbers :== [Number]
:: Complete :== (Numbers, Positions)

decouple =: snd o unzip

//lcouple :: (.a -> .b) (.a, .c) -> (.b, .c)
lcouple fn (arg, oth) :== (fn arg, oth)

getCases :: Int *World -> ([(String, [[Char]])], *World)
getCases n world
    # (range, world)
        = evalIO getLine world
    # range
        = fromString range
    # spacePos
        = elemIndex ' ' range
    | isNothing spacePos
        = abort "invalid input, expected space in range\n"
    # (a, b)
        = splitAt (fromJust spacePos) range
    # (a, b)
        = (toInt (toString a), toInt (toString (tl b)))
    # numbers
        = [(fromString o toString) number \\ number <- [a..(b-1)]]
    # (string, world)
        = evalIO getLine world
    | n > 1
        # (cases, world)
            = getCases (n - 1) world
        = ([(string, numbers) : cases], world)
    = ([(string, numbers)], world)

singletonSieve :: Complete -> Complete
singletonSieve (list, sequence)
    | sequence_ == sequence
        = reverseSieve (list, sequence)
    = (list_, sequence_)
where
    singles :: Positions
    singles 
        = [hd pos \\ (_, pos) <- list | length pos == 1]
    list_ :: Numbers
    list_
        = map (app2 (id, filter notInOtherSingle)) list
    where
        notInOtherSingle :: Position -> Bool
        notInOtherSingle pos
            = not (isAnyMember pos (flatten (filter ((<>) pos) singles)))
    sequence_ :: Positions
    sequence_
        = foldr splitSequence sequence singles

reverseSieve :: Complete -> Complete
reverseSieve (list, sequence)
    | sequence_ == sequence
        = (list, sequence)
    = (list_, sequence_)
where
    singles :: Positions
    singles
        = [hd pos \\ pos <- [[subSeq \\ subSeq <- sequence | isMember subSeq p] \\ (_, p) <- list] | length pos == 1]
        //= [hd pos \\ pos <- | length pos == 1]
    list_ :: Numbers
    list_
        = map (app2 (id, filter (\b_ = (notInOtherSingle b_) && (hasContiguousRun b_)))) list
    where
        notInOtherSingle :: Position -> Bool
        notInOtherSingle pos
            = not (isAnyMember pos (flatten (filter ((<>) pos) singles)))
        hasContiguousRun :: Position -> Bool
        hasContiguousRun pos
            //= any (any (isPrefixOf pos) o tails) sequence_
            = and [isMember p (flatten sequence_) \\ p <- pos]

    sequence_ :: Positions
    sequence_
        = foldr splitSequence sequence singles

splitSequence :: Position Positions -> Positions
splitSequence split sequence
    = flatten (map newSplit (map (span (not o ((flip isMember) split))) sequence))
where
    newSplit :: (Position, Position) -> Positions
    newSplit ([], b)
        # b
            = drop (length split) b
        | b > []
            = [b]
        = []
    newSplit (a, b)
        # b
            = drop (length split) b
        | b > []
            = [a, b]
        = [a]

indexSubSeq :: [Char] Digits -> Positions
indexSubSeq _ []
    = []
indexSubSeq a b
    # remainder
        = indexSubSeq a (tl b)
    | isPrefixOf a (map fst b)
        = [[i \\ (_, i) <- take (length a) b] : remainder]
        //= [[i \\ _ <- a & (_, i) <- b] : remainder]
    = remainder

missingNumber :: String [[Char]] -> [[Char]]
missingNumber string numbers
    # string
        = [(c, i) \\ c <-: string & i <- [0..]]
    # locations
        = [(number, indexSubSeq number string) \\ number <- numbers]
    # digits
        = [length (indexSubSeq [digit] [(c, i) \\ c <- (flatten numbers) & i <- [0..]]) \\ digit <-: "0123456789-"]
    # missing
        = flatten [repeatn (n - length i) c \\ n <- digits & (c, i) <- [(digit, indexSubSeq [digit] string) \\ digit <-: "0123456789-"]]
    # (answers, _)
        = until (\e = e == singletonSieve e || length [(a, b) \\ (a, b) <- fst e | length b == 0 && isMember a (candidates missing)] > 0) singletonSieve (locations, [indexList string])
    # answers
        = filter (\(_, i) = length i == 0) answers
    = filter ((flip isMember)(candidates missing)) ((fst o unzip) answers)
where
    candidates :: [Char] -> [[Char]]
    candidates chars
        = moreCandidates chars []
    where
        moreCandidates :: [Char] [[Char]] -> [[Char]]
        moreCandidates [] nums
            = removeDup (filter (\num = isMember num numbers) nums)
        moreCandidates chars []
            = flatten [moreCandidates (removeAt i chars) [[c]] \\ c <- chars & i <- [0..]]
        moreCandidates chars nums
            = flatten [flatten [moreCandidates (removeAt i chars) [ [c : num] \\ num <- nums ]] \\  c <- chars & i <- [0..]]

Start world
    # (number, world)
        = evalIO getLine world
    # (cases, world)
        = getCases (toInt number) world
    =  flatlines[flatten(intersperse [', '] (missingNumber string numbers)) \\ (string, numbers) <- cases]

Try it online!

Οurous

Posted 2017-12-11T02:32:32.347

Reputation: 7 916

0

C++: A naive brute-force solution

Try to split the string in any possible way where all number are within the range and do not repeat. If there is only one number missing, add it to the set of answers.

While I tried to optimise the code, I did not put any heuristics or optimisations to the algorithm. The solution is so slow, that it can only handle the 1st and the 3rd test case. It can be used as a baseline when comparing "speed" and to show how a better algorithm implemented in a slower language can beat a worse solution in a faster language.

#include <iostream>
#include <vector>
#include <algorithm>
#include <set>

class missing_number_solver
{
	static constexpr int max_digit_len = 6;

	long int n, m;
	std::vector<long int> digits;

	std::vector<long int> used;
	long int used_count = 0;

	std::set<long int> results;

	void f(size_t pos)
	{
		if (pos == digits.size())
		{
			if (used_count == m - n - 1)
			{
				auto idx = std::find(used.begin(), used.end(), false) - used.begin();
				results.insert(n + idx);
			}
			return;
		}

		bool negate = false;
		if (digits[pos] < 0)
		{
			negate = true;
			pos++;
		}

		long int acc = 0;
		for (int i = 0; i < max_digit_len && pos + i < digits.size(); i++)
		{
			if (digits[pos + i] < 0)
			{
				// Minus sign. Stop.
				break;
			}

			acc = 10 * acc + digits[pos + i];
			auto number = negate ? -acc : acc;

			if (number < n || number >= m)
			{
				continue;
			}
			if (used[number - n])
			{
				continue;
			}

			used[number - n] = true;
			used_count++;

			f(pos + i + 1);

			used[number - n] = false;
			used_count--;

			if (i == 0 && digits[pos] == 0)
			{
				// Leading zero. Do not consume more digits.
				break;
			}
		}
	}

public:
	missing_number_solver(long int n, long int m, std::string str) :
			n(n), m(m)
	{
		for (char c : str)
		{
			digits.push_back(c == '-' ? -1 : c - '0');
		}
	}

	std::set<long int> solve()
	{
		used = std::vector<long int>(m - n, false);
		f(0);
		return results;
	}
};

int main()
{
	int t;

	std::cin >> t;

	for (int i = 0; i < t; i++)
	{
		long int n, m;
		std::string s;

		std::cin >> n >> m;
		std::cin >> s;

		auto results = missing_number_solver(n, m, s).solve();

		for (auto result : results)
		{
			std::cout << result << ' ';
		}
		std::cout << std::endl;
	}
}

Try it online!

In addition, this is how a test case with multiple solutions looks like:

1 22
1212345678910111314151617181920

As I understand, the output should be:

12 21

as the original sequence must be one of:

[1, 2, 12, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20] # missing 21
[1, 21, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20] # missing 12
[12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, 19, 20] # missing 21

user28667

Posted 2017-12-11T02:32:32.347

Reputation: 579

0

JavaScript (ES6)

Quite slow, but I think it can solve the test cases. Not all test cases present at the moment, I'll try to put all of them later.

Time=fun=>{
  var t0 = performance.now(), t1=t0, t2
  console.log(fun(msg=>console.log(msg, (t2=t1,t1=performance.now(),t1-t2))))
  var t = performance.now()-t0
  return t<1e3 ? t+' ms':t/1000+' sec'
}

function Missing(str, ll,hl)
{
  var can, seq, num, i1, i2, s1, s2

  var rec = span => 
  {
    var num
    if (!seq[span]) return 1
    if (seq[span] == '_') ++span
    if (seq[span] == '-')
    {
      ++span
      num = -seq[span++]
      while (num < hl && num >= ll) 
      {
        if (!values[num])
        {
          values[num] = 1
          if (rec(span)) return 1
          values[num] = 0
        }
        if (!num) return;
        if (seq[span] == '_' || seq[span] == '-') return;
        num = -seq[span++] +num*10
      }    
    }
    else
    {
      num = +seq[span++]

      while (num < hl && num >= ll) 
      {
        if (!values[num])
        {
          values[num] = 1
          if (rec(span)) return 1
          values[num] = 0
        }
        if (!num) return;
        if (seq[span] == '_' || seq[span] == '-') return;
        num = +seq[span++] +num*10
      }    
    }
  }

  find=num=>{
    var i1,i2
    if (num < 0)
    {
      i1 = seq.indexOf(num)
      if (i1<0) return i1
      i2 = seq.lastIndexOf(num)
      return i2 == i1 ? i1 : -1;
    }
    else
    {
      // take care of false positives when there is a '-' before the number
      // so much slower respect to the other branch
      var rx = new RegExp('[^-]'+num)
      i1 = (' '+seq).search(rx)
      if (i1 < 0) return i1
      i2 = seq.slice(i1).search(rx)
      return i2 < 0 ? i1 : -1;
    }
  }


  for (can = ll; can < hl; ++can)
  {
    if (str.indexOf(can) < 0) return can;
    values = {}
    values[can] = 1
    seq = str
    for (num = ll; num < hl; ++num)
    {
      if (!values[num])
      {
        i1 = find(num)
        if (i1 >= 0)
        {
          i2 = i1+(''+num).length
          if (seq[i1-1] == '_') --i1;
          if (seq[i2] == '_') ++i2;
          s1 = seq.slice(0,i1), s2 = seq.slice(i2)
          seq = s1 ? s2 ? s1+'_'+s2 : s1 : s2
          values[num] = 1
          num = ll-1
        }
      }
    }
    // console.log('REC',can,seq)
    if (rec(0)) return can
  }
  
}

Test = [["229272120623131992528240518810426223161211471711",0,30],
["11395591741893085201244471432361149120556162127165124233106210135320813701207315110246262072142253419410247129611737243218190203156364518617019864222241772384813041175126193134141008211877147192451101968789181153241861671712710899168232150138131195104411520078178584419739178522066640145139388863199146248518022492149187962968112157173132551631441367921221229161208324623423922615218321511111211121975723721911614865611197515810239015418422813742128176166949324015823124214033541416719143625021276351260183210916421672722015510117218224913320919223553222021036912321791591225112512304920418584216981883128105227213107223142169741601798025",0,250],
["1342319526198176611201701741948297621621214122224383105148103846820718319098731271611601912137231471099223812820157162671720663139410066179891663131117186249133125172622813593129302325881203242806043154161082051916986441859042111711241041590221248711516546521992257224020174102234138991752117924457143653945184113781031116471120421331506424717816813220023315511422019520918114070163152106248236222396919620277541101222101232171732231122301511263822375920856142187182152451585137352921848164219492411071228936130762461191564196185114910118922611881888513917712153146227193235347537229322521516718014542248813617191531972142714505519240144",0,250],
["80901062173445953196186301456692487427529429993285787069831233591227347952858217788973393434655233831877642815971961376506739541125150847218402660459824663618793744638964202557497582411568",0,100],
["1302797101187197126231292239133232171515415180252611104191857114450179111134201662662932221381232833721819553193151235255288382951611921896841051718124164119345781124629128526219276169113296142414667176120182159236229724224516532275167119911243329243153024688191841282382655120729228164986223079108216327276103148145142108302692092566328429021116692173174253143279251205351277329821214010060691392141981836583611522942572022622786840551472342675428024419426414125827112277297170168186160110274722278115694901142861021820190220247252137158289448121233277249250155149270135481781852131754322187701628221622313210925442210224106118208268228260259299563117313621524812441579558157591322615020323724028718891772125199273808916247116862811074974204225999613626323152211312397519621734172206200",0,300],
["201-1382843613319717171212-193-175-591897023633-61234-11828-6108969023295-104-18916914-36-893-23126175-85-9611811654113-158-198208-250126-95143166-65160215-70-44-202-194158-16324475-2011514516537-90-358511742-211-97-139162219-29221-18135-12-60-22516-222-312288-217-102137-2092095274-190-87249-105-249157-6916430-116-15360225100-178-218-129-141248-111-224220224159237-128-137-79-38-18798-51-238-173-203109532429200223625180188-226112-142216-242150-1231038-228-887379154-210101130-5323115199-15623180-20710425-245-186-147102-247240233194168-112228148592144964-149-100-52193129152-34-1134043128123-172110-6363144-21922746-151-174-140-19299-206161-12268141-214-805067206-46-5195-237-5813-40-130-170122-229-16558213-184-47174-161190-13487115-115-23-235-143-201078382222-57-67205-18-81-427278-179-11-162-244-86-83-8421-11927-120204-71-50138-136-1991359165-126173-76448-62206229-99179-164-110-220-146-17697-22376146-49184-236-37-91-43-419671127-54-23269-480-451313817023821721015322617613-3224286-182186-144-233-216-24016313977-98-230192155-101182183-14889-19714018581-166-114-1502189-27-212-17-92121-1-23918732-132-215-205-21-33-15961-1275-15-196-39-56-154-243-6820256-89191239145-191178-55-208-9347-16-41-221-152-2839-18310-241-117-169-75-103-73-16716744241-248125-64119-30120-94-177-234-195-13-21324734198-14571569255-26149-107-13111-121-125-171-77-204-157-25-168-66127-19-200172-78105147-7213212466177-155-135-185-24-109181230-227-7424619-106142111-246-22074111494-7-160-9106-10-133-18024531-145211-124-3-188-82136-22235-10818203134",-250,250],
["-615-631-765-659-699-748-714-554-451-569-898-462-676-721-459-578-522-493-668-686-728-603-455-587-815-537-492-639-842-485-747-832-768-899-471-900-893-598-626-757-682-501-620-614-549-770-824-527-539-755-596-452-821-591-572-857-895-776-836-503-773-590-575-568-633-737-702-514-510-724-592-739-790-846-732-536-579-775-479-688-687-740-794-650-637-636-518-791-611-473-497-675-758-697-548-655-559-534-736-677-860-543-733-723-805-831-670-608-883-502-600-583-467-867-613-612-621-861-731-866-793-460-798-505-595-771-769-486-718-570-813-749-685-496-630-819-886-785-870-807-516-744-680-653-719-859-717-681-880-476-674-582-667-489-865-876-695-495-703-841-707-711-777-760-848-837-517-532-784-465-693-894-520-678-862-818-487-743-782-597-477-562-519-586-839-700-538-710-566-746-565-546-466-456-811-599-822-772-602-480-810-494-751-550-640-665-508-809-581-840-560-863-526-454-725-683-632-660-729-594-457-753-759-816-609-706-800-891-877-617-628-715-651-833-547-558-817-734-698-735-647-762-589-855-541-774-792-875-838-574-652-864-482-673-619-799-780-851-783-666-644-742-812-872-641-622-662-535-814-803-856-720-722-498-826-500-524-858-897-512-490-610-808-529-694-528-506-804-701-761-561-679-634-834-738-545-712-605-623-878-588-882-513-461-828-624-509-854-616-823-458-472-705-488-767-556-523-533-779-829-654-849-464-830-499-763-504-585-580-852-730-474-657-557-692-727-801-825-844-542-661-892-820-656-704-835-563-672-690-709-750-689-766-604-553-551-868-577-726-478-663-642-752-463-745-716-483-481-696-671-869-625-618-564-881-648-573-691-606-786-843-788-827-797-888-795-874-584-796-787-778-806-847-593-507-764-879-635-889-521-887-511-713-884-754-873-470-601-646-552-530-845-525-567-627-491-629-669-741-469-484-571-853-850-453-576-708-871-789-885-658-664-643-802-515-684-540-555-896-756-607-638-645-531-468-475-649-781-890",-900, -450],
["183205152228238-164156229248-187-189-207-112-128134-245231568-246-4760213-7266-1961240-141199233138-226-12421763-159125-208239-195225146-51-45105-24794-126-165-37-1456-40143109-75-222-116-70165110-214-11151-127-17411923779-864-720-2031573215875-65190-27187191878218613723286-218176-34-90-123182-153-38218-7618-173-205227198-25010710624430148-25-198-14824778-239-215597-21749206-121-1681627711456-35-101-18201-235-88-19124171200-194-78-125-138127216168-248174128153181-41299117811-146-171212850-20615021-1301189-69-152-117-23422367-43-6823-23273140-231195-220-44-197-211-64219121185-134-96-2217076-58-107-80222117-14-119-111111-102-7735197169-99-1324817326-163-223113-240-190-7917513957-154-103-66101126-298149145-1764351-28-2291372-135-202-118-175188-42-161975216153-109210172-862202653992132-85-1-1391099-738962-56103-180-97154-131-249-105-93-53-5550-89160-160204-104-114-3946-6374-170-17-18520398-59-36-177-94-238-2345-219177-243-15755-82-20115-149108-200-31-71135220234142844104-1934061-32236-166-6081-137-156180-1675436-182-9270179192-221-20180-120124-2372237-8716243-147-1692731-21-224-83-136-67-199214-49-241235224-212-233-16-210226167195833171-84245-11341-614488-19213014-10619612316312-74211-129133-10813169-155-41-144-133-183-98193-213-57958325-12164-24-230-54-50-142-244-30141249-122-2155-204215-942152990-91194-95100-11017-5-1812893-186-1396-242-140-179-188-236-151-228-216-2093-3147-11511224-48-62207-162-15024634-26-15813647120116208-100-19209242221-178-15-184-33-81159230-172-227-52-61184-225-461024418938122166-143",-250,250],
]

function go() {
  var t, i = 0;
  var Solve = _ => {
    var [s,a,b] = Test[i++]
    console.log(a,b,s)
    setTimeout(_=> {
      t = Time(_ => Missing(s,a,b));
      console.log('Solved in',t);
      if (Test[i]) setTimeout(Solve, 100);
    }, 100)
  }
  setTimeout(Solve, 100)
}
<button onclick="go()" >Test</button>Expect a total time of less than 3 minutes for all test cases in chrome

edc65

Posted 2017-12-11T02:32:32.347

Reputation: 31 086