Sequence avoiding near collision

1

Task

Find a sequence of all numbers between min and max where every number differs from every other number in the sequence by at least "d" digits.

Example of sub-sequence

For min = 0, max = 1000 and d = 2, the following is part of one possible solution sequence:

  • 123 ← Good
  • 132 ← Good
  • 145 ← Good
  • 146 ← Bad

Notes

This is a bit practical. I was thinking about this as a problem that has kept cropping up over the years. Basically it's a pattern in URL shorteners / prize codes / conference room codes / etc...

I'd like to be able to hand out codes from a list, and have each code be: A - Simple to type, and B - Be somewhat unlikely to accidentally conflict with another code because of a typing mistake.

There's no reason to work in base ten. I've expressed the problem that way for clarity, but in real cases we'd take advantage of letters as well.

Sub-Note

This is not that practical. If you use just six "digits" (allowing numbers and capital letters), then you have over two billion choices. That seems like plenty for most cases except for the URL shortener. Still I was thinking it would be nice to be able to generate a random number from that space and be able to quickly check that it was at least "d" different from all others that had been handed out up to that point. Generating a solution sequence is not quite the same problem, but it seemed interesting and I'm hoping that someone comes up with a clever way to structure the data in order to make it efficient to search for these near collisions.

Josh Bannon

Posted 2016-07-03T17:51:14.057

Reputation: 11

Question was closed 2016-07-03T19:35:51.360

Why is a minimun number needed, why not just 0 to max? – Blue – 2016-07-03T18:03:18.163

how is 145 good? – Optimizer – 2016-07-03T18:04:59.767

Also its not clear what the input and output of this challenge are. Moreover, please explain more regarding fastest code part as you have marked this question as fastest code but not provided any info around that in the question. – Optimizer – 2016-07-03T18:07:00.730

Oh, I didn't realise it wasn't a code-golf - fastest algorithm should be the first tag btw – Blue – 2016-07-03T18:08:02.970

@muddyfish - You could always start with zero. Starting with a different min gives a convenient way to generate a different sequence. I've reordered the tags as you suggested. – Josh Bannon – 2016-07-03T18:15:27.553

@Optimizer - Can you explain the confusion with the example? 145 differs by at least two digits from each previously given value. 146 has 1 and 4 in common in the same position as 145, though so it would fail the test.

I marked question as "fastest-algorithm" hoping to avoid the typical "fastest-code" issues. I want to avoid bench-marking in a half dozen different languages. I'm looking for an innovative search method or a way to generate a good sequence without having to search. I may post a naive example to try to clarify any confusion with the examples. – Josh Bannon – 2016-07-03T18:23:10.170

oh. I was thinking more in terms of each digit of 146 having a max diff of d. There are already 2 close votes on the question, proving that more clarity is needed. – Optimizer – 2016-07-03T19:13:33.710

As this is currently written, nothing stops the program that always outputs the empty sequence from being a valid answer. – Anders Kaseorg – 2016-07-03T19:21:21.970

1

What is the scope of "all" in "a sequence of all numbers between min and max"? The obvious reading is that you're looking for a permutation, but further reading strongly suggests that you're actually after a subset. Do you want a largest possible subset which meets the criterion? But then [tag:fastest-algorithm] is unlikely to be a good discriminator, because the size of the output is probably exponential in the size of the input. And although I'm not a coding theorist, I suspect that the only known approach is brute force.

– Peter Taylor – 2016-07-03T20:07:06.357

Answers

0

Here's a terrible, brute force way of generating a sequence. I haven't attempted to do much to make it fast, though there are probably several simple gains to be had here. I'm trying to think of a way to make this at least a couple of orders of magnitude faster rather than finding improvements of a few percent.

public static IEnumerable<int> GetDifferentCodes(int min, int digits, int minDifference)
{
    var max = Math.Pow(10, digits) - 1;
    var goodCodes = new List<int>();
    var goodCodeDigits = new List<short[]>();

    for (var i = min; i < max; i++)
    {
        var newDigits = GetDigits(i, digits);

        if (!goodCodeDigits.All(goodCode => HasMinDiff(goodCode, newDigits, digits, minDifference)))
            continue;
        goodCodes.Add(i);
        goodCodeDigits.Add(newDigits);
    }
    return goodCodes;
}

private static readonly int[] Powers = {1, 10, 100, 1000, 10000, 100000, 1000000, 10000000};

private static short[] GetDigits(int num, int digits)
{
    var ret = new short[digits];
    for (var i = 0; i < digits; i++)
    {
        ret[i] = (short) ((num/Powers[i]) % 10);
    }
    return ret;
}

private static bool HasMinDiff(short[] goodDigits, short[] digitsToCheck, int digits, int minDiff)
{
    var difference = 0;
    for (var i = 0; i < digits; i++)
    {
        if (goodDigits[i] == digitsToCheck[i]) continue;
        if (++difference >= minDiff) return true;
    }
    return false;
}

Josh Bannon

Posted 2016-07-03T17:51:14.057

Reputation: 11

This questions is now closed over a rule that doesn't actually exist so far as I can see. This is pretty disappointing. In any case I found a not-quite-so-brute-force method if anyone else is thinking about this issue in the future: It's possible to calculate all collisions for the code you're checking, and search only for those collisions. This makes the algorithmic complexity linear. It seems obvious after I though of it, and I don't know why it gave me so much trouble earlier. – Josh Bannon – 2016-07-04T20:35:22.000