Sort a list of unknown values with the least amount of comparisons

8

In this optimization challenge you will write a program that sorts a single array by only comparing elements through asking the user on stdout to input comparison results on stdin.

The below protocol is line-based, so everytime you print to stdout or read from stdin it is assumed a newline follows. Throughout the question below it is assumed that the user (read: the scoring program) has an array it wants sorted in a 0-based indexed array called array. For technical reasons I suggest flushing after every print to stout.

  1. As your first step your program must read the array size n from stdin.
  2. Then, as often as you wish, you can print a < b to stdout, with two integers 0 <= a, b < n. Then the user will enter 1 in stdin if array[a] < array[b], and 0 otherwise.
  3. Finally, once your program is convinced it has correctly deduced the order of the array, it must print a ... to stdout, where ... is a space separated list of integers indicating the order. So if your program outputs a 3 0 1 4 2 it means your program has deduced

    array[3] <= array[0] <= array[1] <= array[4] <= array[2]
    

    Note that your program never knew the contents of array, and never will.

You can only ask for < on stdin to sort the array. You can get other comparison operations by these equivalencies:

a > b        b < a
a <= b       !(b < a)
a >= b       !(a < b)
a != b       (a < b) || (b < a)
a == b       !(a < b) && !(b < a)

Finally, since your program interacts with the scoring program on stdout, print debug information to stderr.


Your program will be scored using the following Python program:

from __future__ import print_function
from subprocess import Popen, PIPE
import sys, random

def sort_test(size):
    array = [random.randrange(0, size) for _ in range(size)]
    pipe = Popen(sys.argv[1:], stdin=PIPE, stdout=PIPE, bufsize=0, universal_newlines=True)
    print(str(size), file=pipe.stdin); pipe.stdin.flush()
    num_comparisons = 0
    while True:
        args = pipe.stdout.readline().strip().split()
        if args and args[0] == "a":
            answer_array = [array[int(n)] for n in args[1:]]
            if list(sorted(array)) != answer_array:
                raise RuntimeError("incorrect sort for size {}, array was {}".format(size, array))
            return num_comparisons
        elif len(args) == 3 and args[1] == "<":
            a, b = int(args[0]), int(args[2])
            print(int(array[a] < array[b]), file=pipe.stdin); pipe.stdin.flush()
            num_comparisons += 1
        else:
            raise RuntimeError("unknown command")

random.seed(0)
total = 0
for i in range(101):
    num_comparisons = sort_test(i)
    print(i, num_comparisons)
    total += num_comparisons
print("total", total)

You score your program by typing python score.py yourprogram. The scoring is done by having your program sort one random array of each size 0 to 100 and counting the amount of comparisons your program asks for. These random arrays can have duplicates, your algorithm must be able to handle equal elements. In the event there are duplicates, there are no requirements on the order of the equal elements. So if the array is [0, 0] it doesn't matter if you output a 0 1 or a 1 0.

You may not optimize specifically for the particular arrays the scoring program generates. I may change the RNG seed at any point in time. Answers using built-in sorting algorithms are permitted to be posted for interest's sake, but are non-competing.

Program with lowest score wins.

orlp

Posted 2016-07-20T12:23:00.200

Reputation: 37 067

1Which version of Python are you using? – Leaky Nun – 2016-07-20T14:23:14.330

How long did it take to run your scoring algorithm? – Leaky Nun – 2016-07-20T14:40:17.747

What assumptions can we make on the maximum size of the input array? – helloworld922 – 2016-07-21T00:29:33.207

@helloworld922 Theoretically, none - your answer should work for any size. But if it saves effort in a language like C supporting everything up to and include 100 elements is required. – orlp – 2016-07-21T00:44:04.773

It’s important to clarify the Python version used by the scoring program, because Python 2 and 3 generate different random sequences. Or perhaps it would be better if the randomness came from a well-specified deterministic source such as hashlib. – Anders Kaseorg – 2016-07-21T07:35:07.743

Answers

3

Python, score 23069

This uses the Ford–Johnson merge insertion sorting algorithm.

from __future__ import print_function
import sys

def less(x, y):
    print(x, '<', y)
    sys.stdout.flush()
    return bool(int(input()))

def insert(x, a, key=lambda z: z):
    if not a:
        return [x]
    m = len(a)//2
    if less(key(x), key(a[m])):
        return insert(x, a[:m], key) + a[m:]
    else:
        return a[:m + 1] + insert(x, a[m + 1:], key)

def sort(a, key=lambda z: z):
    if len(a) <= 1:
        return a
    m = len(a)//2
    key1 = lambda z: key(z[-1])
    b = sort([insert(x, [y], key) for x, y in zip(a[:m], a[m:2*m])], key=key1)
    if len(a) % 2:
        b += [[a[-1], None]]
    for k in range(m, len(a)):
        l, i, (x, y) = max((-i.bit_length(), i, t) for i, t in enumerate(b) if len(t) == 2)
        b[:i + 1] = insert([x], b[:i], key=key1) + [[y]]
    if len(a) % 2:
        b.pop()
    return [x for x, in b]

print('a', ' '.join(map(str, sort(range(int(input()))))))

Anders Kaseorg

Posted 2016-07-20T12:23:00.200

Reputation: 29 242

1

Python, 333300 score

from __future__ import print_function
import sys

size = int(input())

def less(a, b):
    print(a, "<", b); sys.stdout.flush()
    return bool(int(input()))

array = list(range(size))
for _ in range(size):
    for i in range(0, size-1):
        if less(array[i+1], array[i]):
            array[i], array[i+1] = array[i+1], array[i]

print("a", " ".join(str(i) for i in array)); sys.stdout.flush()

This is an example program, implementing the simplest form of bubblesort, always doing n*(n-1) comparisons per array.

I scored this program by saving it as sort.py and typing python score.py python sort.py.

orlp

Posted 2016-07-20T12:23:00.200

Reputation: 37 067

If being exact, it's Python 3. I see by print looking like a function. – user48538 – 2016-07-20T12:37:20.627

1@zyabin101 Now it's both :) – orlp – 2016-07-20T12:41:36.730

1

Python 3, 28462 score

Quick sort. Pivot is leftmost item.

The score is expected, because:

\displaystyle\sum_{i\mathop=0}^{100} i\log_2i=29945.648687873225

from __future__ import print_function
import sys
size = int(input())

def less(a, b):
    print(a, "<", b); sys.stdout.flush()
    return bool(int(input()))

def quicksort(array):
    if len(array) < 2:
        return array
    pivot = array[0]
    left = []
    right = []
    for i in range(1,len(array)):
        if less(array[i],pivot):
            left.append(array[i])
        else:
            right.append(array[i])
    return quicksort(left)+[pivot]+quicksort(right)

array = list(range(size))
array = quicksort(array)

print("a", " ".join(str(i) for i in array)); sys.stdout.flush()

Scores for each size tested:

size score
0 0
1 0
2 1
3 3
4 6
5 8
6 11
7 12
8 15
9 17
10 23
11 33
12 29
13 32
14 37
15 45
16 58
17 47
18 52
19 79
20 72
21 60
22 85
23 138
24 87
25 98
26 112
27 107
28 128
29 142
30 137
31 136
32 158
33 143
34 145
35 155
36 169
37 209
38 163
39 171
40 177
41 188
42 167
43 260
44 208
45 210
46 230
47 276
48 278
49 223
50 267
51 247
52 263
53 293
54 300
55 259
56 319
57 308
58 333
59 341
60 306
61 295
62 319
63 346
64 375
65 344
66 346
67 370
68 421
69 507
70 363
71 484
72 491
73 417
74 509
75 495
76 439
77 506
78 484
79 458
80 575
81 505
82 476
83 500
84 535
85 501
86 575
87 547
88 522
89 536
90 543
91 551
92 528
93 647
94 530
95 655
96 580
97 709
98 671
99 594
100 637
total 28462

Leaky Nun

Posted 2016-07-20T12:23:00.200

Reputation: 45 011

@orlp here, the last line translates to "WindowsError: [Error 193] %1 is not a correct Win32 application."

– Leaky Nun – 2016-07-20T15:14:23.310

1

Python 2 (non-competing), 23471 score

from __future__ import print_function
import sys
size = int(input())

def cmp(a, b):
    print(a, "<", b); sys.stdout.flush()
    return [1,-1][bool(int(input()))]

array = list(range(size))
array = sorted(array,cmp=cmp)

print("a", " ".join(str(i) for i in array)); sys.stdout.flush()

Scores for each size tested:

size score
0 0
1 0
2 1
3 4
4 5
5 7
6 9
7 14
8 17
9 20
10 21
11 26
12 30
13 35
14 37
15 41
16 45
17 51
18 52
19 57
20 63
21 63
22 72
23 79
24 79
25 80
26 91
27 93
28 96
29 105
30 110
31 116
32 124
33 123
34 131
35 130
36 142
37 144
38 156
39 154
40 163
41 168
42 177
43 178
44 183
45 183
46 192
47 194
48 212
49 207
50 216
51 221
52 227
53 239
54 238
55 243
56 255
57 257
58 260
59 270
60 281
61 284
62 292
63 293
64 303
65 308
66 312
67 321
68 328
69 328
70 342
71 348
72 352
73 358
74 367
75 371
76 381
77 375
78 387
79 400
80 398
81 413
82 415
83 427
84 420
85 435
86 438
87 448
88 454
89 462
90 462
91 479
92 482
93 495
94 494
95 502
96 506
97 520
98 521
99 524
100 539
total 23471

Leaky Nun

Posted 2016-07-20T12:23:00.200

Reputation: 45 011

I think Python already implements a very good sorting function, looking at the optimal number of comparisons here: https://en.wikipedia.org/wiki/Comparison_sort#Number_of_comparisons_required_to_sort_a_list Looks like this will be a reference for a very very good baseline.

– justhalf – 2016-07-21T04:10:32.770