Underhanded code contest: Not-so-quick sort

28

8

The Task

Write a program, in the language of your choice, that reads lines of input from standard input until EOF, and then writes them to standard output in ASCIIbetical order, similar to the sort command-line program. A short, non-underhanded example in Python is:

import sys

for line in sorted(sys.stdin):
    print(line.rstrip('\n'))

The underhanded part

Similar to The OS War, your goal is to prove that your favored platform is “better”, by having your program deliberately run much more slowly on a competing platform. For the sake of this contest, a “platform” consists of any combination of:

  • Processor
    • Architecture (x86, Alpha, ARM, MIPS, PowerPC, etc.)
    • Bitness (64-bit vs. 32-bit vs. 16-bit)
    • Big- versus little-endian
  • Operating System
    • Windows vs. Linux vs. Mac OS, etc.
    • Different versions of the same operating system
  • Language implementation
    • Different compiler/interpreter vendors (e.g., MSVC++ vs. GCC)
    • Different versions of the same compiler/interpreter

Although you could meet the requirements by writing code like:

#ifndef _WIN32
    Sleep(1000);
#endif

Such an answer should not be upvoted. The goal is to be subtle. Ideally, your code should look like it's not platform-dependent at all. If you do have any #ifdef statements (or conditions based on os.name or System.Environment.OSVersion or whatever), they should have a plausible justification (based on a lie, of course).

Include in your answer

  • The code
  • Your “favorite” and “unfavorite” platforms.
  • An input with which to test your program.
  • The running time on each platform, for the same input.
  • A description of why the program runs so slowly on the unfavorite platform.

dan04

Posted 2014-01-29T07:37:19.360

Reputation: 6 319

Question was closed 2016-04-18T15:07:02.863

2

I'm voting to close this question as off-topic because underhanded challenges are no longer on-topic on this site. http://meta.codegolf.stackexchange.com/a/8326/20469

– cat – 2016-04-18T12:43:20.357

4This is harder than I thought. The only answers I can come up with are either very long and a bit obvious, or very short and extremely obvious :-( – r3mainer – 2014-01-30T16:16:34.763

Answers

29

C

CleverSort

CleverSort is a state-of-the-art (i. e. over-engineered and sub-optimal) two-step string sorting algorithm.

In step 1, it starts by pre-sorting the input lines using radix sort and the first two bytes of each line. Radix sort is non-comparative and works very well for strings.

In step 2, it uses insertion sort on the pre-sorted list of strings. Since the list is almost sorted after step 1, insertion sort is quite efficient for this task.

Code

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Convert first two bytes of Nth line into integer

#define FIRSTSHORT(N) *((uint16_t *) input[N])

int main()
{
    char **input = 0, **output, *ptemp;
    int first_index[65536], i, j, lines = 0, occurrences[65536];
    size_t temp;

    // Read lines from STDIN

    while(1)
    {
        if(lines % 1000 == 0)
            input = realloc(input, 1000 * (lines / 1000 + 1) * sizeof(char*));

        if(getline(&input[lines], &temp, stdin) != -1)
            lines++;
        else
            break;
    }

    output = malloc(lines * sizeof(char*));

    // Radix sort

    memset(occurrences, 0, 65536 * sizeof(int));

    for(i = 0; i < lines; i++) occurrences[FIRSTSHORT(i)]++;

    first_index[0] = 0;

    for(i = 0; i < 65536 - 1; i++)
        first_index[i + 1] = first_index[i] + occurrences[i];

    memset(occurrences, 0, 65536 * sizeof(int));

    for(i = 0; i < lines; i++)
    {
        temp = FIRSTSHORT(i), output[first_index[temp] + occurrences[temp]++] = input[i];
    }

    // Insertion sort

    for(i = 1; i < lines; i++)
    {
        j = i;

        while(j > 0 && strcmp(output[j - 1], output[j]) > 0)
            ptemp = output[j - 1], output[j - 1] = output[j], output[j] = ptemp, j--;
    }

    // Write sorted lines to STDOUT

    for(i = 0; i < lines; i++)
        printf("%s", output[i]);
}

Platforms

We all know that big-endian machines are much more efficient than their little-endian counterparts. For benchmarking, we'll compile CleverSort with optimizations turned on and create randomly a huge list (just over 100,000 strings) of 4-byte lines:

$ gcc -o cleversort -Ofast cleversort.c
$ head -c 300000 /dev/zero | openssl enc -aes-256-cbc -k '' | base64 -w 4 > input
$ wc -l input
100011 input

Big-endian benchmark

$ time ./cleversort < input > /dev/null

real    0m0.185s
user    0m0.181s
sys     0m0.003s

Not too shabby.

Little-endian bechmark

$ time ./cleversort < input > /dev/null

real    0m27.598s
user    0m27.559s
sys     0m0.003s

Boo, little Endian! Boo!

Description

Insertion sort is really rather efficient for almost-sorted lists, but it is horribly inefficient for randomly sorted ones.

CleverSort's underhanded part is the FIRSTSHORT macro:

#define FIRSTSHORT(N) *((uint16_t *) input[N])

On big-endian machines, ordering a string of two 8-bit integers lexicographically or converting them to 16-bit integers and ordering them afterwards yields the same results.

Naturally, this is possible on little-endian machines as well, but the macro should have been

#define FIRSTSHORT(N) (input[N][0] | (input[N][1] >> 8))

which works as expected on all platforms.

The "big-endian benchmark" above is actually the result of using the proper macro.

With the wrong macro and a little-endian machine, the list is pre-sorted by the second character of every line, resulting in a random ordering from the lexicographical point of view. Insertion sort behaves very poorly in this case.

Dennis

Posted 2014-01-29T07:37:19.360

Reputation: 196 637

16

Python 2 vs. Python 3

Obviously, Python 3 is several orders of magnitude faster than Python 2. Let's take this implementation of the Shellsort algorithm as an example:

Code

import sys
from math import log

def shellsort(lst):

    ciura_sequence = [1, 4, 10, 23, 57, 132, 301, 701]  # best known gap sequence (Ciura, 2001)

    # check if we have to extend the sequence using the formula h_k = int(2.25h_k-1)
    max_sequence_element = 1/2*len(lst)
    if ciura_sequence[-1] <= max_sequence_element:
        n_additional_elements = int((log(max_sequence_element)-log(701)) / log(2.25))
        ciura_sequence += [int(701*2.25**k) for k in range(1,n_additional_elements+1)]
    else:
        # shorten the sequence if necessary
        while ciura_sequence[-1] >= max_sequence_element and len(ciura_sequence)>1:
            ciura_sequence.pop()

    # reverse the sequence so we start sorting using the largest gap
    ciura_sequence.reverse()

    # shellsort from http://sortvis.org/algorithms/shellsort.html
    for h in ciura_sequence:
        for j in range(h, len(lst)):
            i = j - h
            r = lst[j]
            flag = 0
            while i > -1:
                if r < lst[i]:
                    flag = 1
                    lst[i+h], lst[i] = lst[i], lst[i+h]
                    i -= h
                else:
                    break
            lst[i+h] = r

    return lst

# read from stdin, sort and print
input_list = [line.strip() for line in sys.stdin]
for line in shellsort(input_list):
    print(line)

assert(input_list==sorted(input_list))

Benchmark

Prepare a test input. This is taken from Dennis answer but with fewer words - Python 2 is sooo slow...

$ head -c 100000 /dev/zero | openssl enc -aes-256-cbc -k '' | base64 -w 4 > input

Python 2

$ time python2 underhanded2.py < input > /dev/null 

real    1m55.267s
user    1m55.020s
sys     0m0.284s

Python 3

$ time python3 underhanded2.py < input > /dev/null 

real    0m0.426s
user    0m0.420s
sys     0m0.006s

Where is the underhanded code?

I assume some readers may want to hunt down the trickster themselves, so I hide the answer with a spoiler tag.

The trick is the integer division in the calculation of the max_sequence_element. In Python 2 1/2 evaluates to zero and hence the expression is always zero. However, the behavior of the operator changed to floating point division in Python 3. As this variable controls the length of the gap sequence, which is a critical parameter of Shellsort, Python 2 ends up using a sequence which contains only the number one while Python 3 uses the correct sequence. This results in a quadratic run time for Python 2.

Bonus 1:

You can fix the code by simply adding a dot after the 1 or the 2 in the calculation.

Bonus 2:

At least on my machine Python 2 is a bit faster than Python 3 when running the fixed code...

René

Posted 2014-01-29T07:37:19.360

Reputation: 261

Well played! Nitpix time: flag looks write-only, couldn't you remove it? Also, r seems superfluous if you do if lst[i+h] < lst[i]: .... On the other hand, if you keep r why do the swap? Couldn't you just do lst[i+h] = lst[i]? Is all of this an intentional distraction? – Jonas Kölker – 2014-07-25T17:08:15.180