Fastest C/C++ comparison function for an opaque field containing doubles

Description:

Write the fastest comparison function for an opaque field containing a double. Here is my example:

// Three-way comparison function:
//   if a < b: negative result
//   if a > b: positive result
//   else: zero result
inline int Compare(const unsigned char* a, const unsigned char* b) const 
{
    if (*(double*)a < *(double*)b) return -1;
    if (*(double*)a > *(double*)b) return +1;
    return 0;
}

Size: 188 characters (excluding the comments)
Speed: N/A

Background:

Google's LevelDB stores keys as opaque fields (i.e. bytes) and it requires the use of a comparator when the keys cannot be compared as bytes, such as the case with doubles. You can read this article to see why Floating-Point binary comparison is tricky, so when LevelDB stores key-value pairs on disk it has to properly compare the opaque keys in order to achieve good data locality.

Correct Submission:

The submission must be written in C/C++.
The submission must correctly compare the opaque fields as doubles.
Given that 1 and 2 are satisfied, the winner should be selected based on the minimum CPU ~~operations~~ ticks (i.e. the fastest comparison).

If there is a tie, then the shortest correct submission will be awarded.

P.S. This is my first time posting, so please let me know if I should correct/improve something in the competition.

Kiril

Posted 2011-11-11T20:43:44.517

Reputation: 131

What architecture are you targeting? Do you only care about x86, or is this challenge supposed to be something that will compile well across multiple architectures? Which compiler and options (i.e. x86 with just SSE2, with SSE4, with AVX? Does it have to be efficient for 32-bit code, is doing the comparison using 64-bit integers good (after type punning the double bit pattern to an int64_t, not just casting). Is branching vs. branchless important? – Peter Cordes – 2016-12-18T00:53:14.720

What is Compare going to inline into? Will it branch on the result, or will it store it somewhere? There's just nowhere near enough detail here for a micro-optimization question. – Peter Cordes – 2016-12-18T00:55:21.330

This is so old... I can't even remember what I needed it for. – Kiril – 2016-12-24T12:27:21.010

"Minimum CPU operations" isn't well defined unless you specify at minimum an architecture and a compiler, and even then sometimes using fewer operations isn't better. Moreover, with the exception of signed zeros and NaNs comparing doubles as bytes is really easy. – Peter Taylor – 2011-11-11T21:20:11.093

@PeterTaylor, I should probably change it to CPU ticks... additionally, we could run the code in codepad.org or ideone.com; however, I suspect that the performance may differ depending on the server load at the time. And I'm not exactly sure what's the best way to performance test (i.e. found out the number of CPU cycles).

– Kiril – 2011-11-11T21:24:30.947

@Lirik What you could do is giving a large array of test data and a sorting algorithm. Then you write a test-script where the sorting algorithm's object file is linked with the object files of each candidate code and you run a sort operation on the array. If the array is sorted correctly, the code is right. The time needed for sort is the time that counts. – FUZxxl – 2011-11-12T14:19:13.880

@FUZxxl, I'll try to come up with a good test data set for performance testing and I appreciate your submission! :) thanks! – Kiril – 2011-11-12T19:01:03.027

I don't get it. As Peter says, IEEE-754 is designed such that an integer comparison of the underlying bytes is the same as comparing the represented floating point numbers (modulo the various NaN values coming "after" infinity), nothing tricky about it. – ephemient – 2011-11-13T06:12:27.690

@ephemient, unless I'm misunderstanding how doubles work, double floating-point numbers have a leading bit for the sign. So negative double value, when stored as an opaque (binary) field, will start with 1 and if it's used in something like LevelDB, then I'm not sure how LevelDB will know that the leading 1 is a sign or if it's a really large 64-bit integer? – Kiril – 2011-11-14T05:36:04.010

@Lirik: you would need to sort the 64-bit integers while treating them as signed. – Keith Randall – 2011-11-17T03:03:53.827

@KeithRandall got it... so casting the double to a signed 64-bit integer would work for sorting purposes. – Kiril – 2011-11-17T03:06:42.213

But I don't like C/C++! (I love C++, and I like C, though.) – Mateen Ulhaq – 2012-01-05T08:09:04.617

2[tag:code-golf], [tag:code-challenge], and [tag:fastest-code] all together don't make sense. – Mateen Ulhaq – 2012-01-05T08:09:40.060

Answers

This function is written in MMIX assembly. It should perform the calculation in 2υ + 5μ. If you inline it, the call is no longer needed so you can save the POP instruction. This function should be compatible with gcc for MMIX. (Read TAOCP vol 1. fasc 1 for more information):

compareDoubles IS @  ; Declare symbol, @ is the current position
   LDOU $0,$0,0      ; Load first double  (LDOU: load octa unsigned), 1v
   LDOU $1,$1,0      ; Load second double, 1
   FCMP $0,$0,$1     ; Compare floating point values, 4µ
   POP  1,0          ; return, 1µ

If you assemble it, the code is 16 bytes long and may looks this:

8f 00 00 00  8f 01 01 00  01 00 00 01  f8 01 00 00

FUZxxl

Posted 2011-11-11T20:43:44.517

Reputation: 9 656

Technically the “v”s in this post should each be “υ” (U+03C5 GREEK SMALL LETTER UPSILON) (see footnote on page 22 of Fasc 1)… though in the font being used here as I type, it doesn't look as much like “v” as in the font used in the book. – ShreevatsaR – 2019-01-03T13:39:13.777

@ShreevatsaR Satisfied? – FUZxxl – 2019-01-04T13:10:00.697

-1, fails spec: not written in C or C++. Still pretty neat, though. – Ilmari Karonen – 2012-01-05T15:35:24.863

@Ilmari It's unusual to downvote a submission just because the language is wrong. – FUZxxl – 2012-01-05T21:54:00.640

Yeah, I realized that the downvote was unreasonable after I made it, but it was too late to revoke it. I'd mistakenly thought this was a new challenge, and wondered why people were rushing to upvote an obviously non-compliant answer. FWIW, I did go and find another one of your answers that I liked and upvoted it to make up for the hasty downvote. – Ilmari Karonen – 2012-01-05T22:06:06.273

Can't you load that into a memory location, then cast it as a function pointer and throw stuff at it? Kind of like how you'd run shell code in C? – Mr. Llama – 2012-01-12T22:03:46.383

@GigaWatt Good idea. I just have to be careful about GCC's own interpretation of the calling convention. – FUZxxl – 2012-01-12T22:32:31.373

#define compare(X, Y) ((X)<(Y)?-1:((X)>(Y)?1:0)) 
// 48 characters

or want to pass a pointer you can define

#define compare(X, Y) (*((double*)X)<*((double*)Y)?-1:(*((double*)X)>*((double*)Y)?1:0))
// 88 characters

Ali1S232

Posted 2011-11-11T20:43:44.517

Reputation: 417

Doesn't follow specs. – Thomas Eding – 2012-01-11T07:06:00.070

@trinithis why? defines just force the compiler to use inline calling, and removes type checking (and both happen in question example) – Ali1S232 – 2012-01-11T09:15:56.007

#define does not remove type checking. It's just a macro. In any case, your code will compare char*s as chars, not as doubles. – Thomas Eding – 2012-01-11T18:19:28.417

@trinithis fixed that, but anyway it's the fastest possible. – Ali1S232 – 2012-01-11T18:55:44.333