No branching please

14

2

Anyone who is moderately into low level code optimization know about the perils of branching, be it implemented as if-statements, loops or select-statements the possibility of a branch misprediction is a terrible clock-wasting thing.

Simple problems can be solved much better with simple arithmetic, so let's do that.

For the following problems all variables are 32 bit unsigned integers and the only allowed code is plain set statements involving only the following operators:

+ addition
- subtraction
* multiplication
/ integer division, rounds down, division by 0 not allowed
% modulo
& binary and
| binary or
^ binary exclusive or
>> bitshift right
<< bitshift left

Logic operators, return 1 if the expression is true and 0 if it is false.
== equal
!= not equal
< less than
<= less than or equal
> greater than
>= greater than or equal

Set operator
=

Every line must consist of a variable identifier followed by a set operator, followed by an expression.

An expression may not contain additional set operators, but may contain variable identifiers, literal numbers and parentheses.

The golfing score shall only count the number of operators.

Example:

myvar = ( ( ( foo + 5 ) * bar ) % 7 ) == 3

Has a score of 5 operators.

A solution may include as many variables as the author see fit.
Variables that have not been set have value 0.
Overflow and underflow is allowed, all negative numbers underflow, so 3 - 5 is 4294967294, even as part of a larger statement.

Task 1: Max

Two values, A and B, exist in the scope, make the RESULT variable contain the largest of those values when the program terminates.

Task 2: Median

Three values, A, B and C, exist in the scope, make the RESULT variable contain the median of those values when the program terminates.

Task 3: Square root

One value, A, exist in the scope, make the RESULT variable contain the square root of A, rounded down, when the program terminates.

It is ok to post an answer to only one or two of the questions, to some of you just finding valid solutions will be a challenge.

aaaaaaaaaaaa

Posted 2013-09-11T18:36:05.143

Reputation: 4 365

Where are unary operators? I don't care about - but ~ could be nice (even if I don't know what for). – John Dvorak – 2013-09-11T20:59:33.350

Sure, 0xFFFF_FFFF_FFFF_FFFF ^ x and 0 - x. How could I have forgot? – John Dvorak – 2013-09-11T21:01:23.513

@JanDvorak It made the shortest description, for completeness logic not ! is also pretty trivial: x == 0. – aaaaaaaaaaaa – 2013-09-11T21:15:32.967

What is the behavior of division by zero? – John Dvorak – 2013-09-11T21:17:00.800

In Mathematica (a>b) returns True or False. Boole converts False to 0 and True to 1. Is it legal to use Boole[a-b]? – DavidC – 2013-09-11T22:48:00.573

@DavidCarraher the task and the operations defined are language-independent. How you implement them is up to you. – John Dvorak – 2013-09-11T22:50:26.840

@JanDvorak Since division by zero on an X86 CPU cause the program to halt I'll have to go with division by zero being illegal. – aaaaaaaaaaaa – 2013-09-12T17:32:39.197

Answers

5

Task 3, 23 ops

x = (A >> 16) + A / ((A >> 13) + 511) + 15
x = (x + A/x) >> 1
x = (x + A/x) >> 1
x = (x + A/x) >> 1
RESULT = x - (x > A/x)

Using Newton's method, as the other solutions do, with a more tactfully chosen seed. The first bit A >> 16 keeps the top of the range happy, the second bit A / ((A >> 13) + 511) keeps the middle of the range happy, and the last bit 15 the bottom, and also prevents division by zero errors (15 is the largest value possible which allows 0 to converge correctly - halved three times minus correction equals zero). For the input values 225, 275625, 82137969, 2908768489 (and nearby values) the initial seed is exact. All edge cases (perfect squares, perfect squares + 1, and perfect squares - 1) on the range 0 .. 2**32-1 have been tested and are correct.

A few comments about the rules:
Overflow and underflow is allowed, all negative numbers underflow, so 3 - 5 is 4294967294, even as part of a larger statement.

That last bit turns out to be something of an innovation killer. I initially attempted a solution using a generalized form of Halley's method, but realized that it was invalid given the above restriction. The iteration (as applied to square roots) is this:

x = x * (3*A + x*x) / (A + 3*x*x)

This iteration has nice qualities that Newton's does not. It converges cubically (rather than quadratically), it converges from above or below (rather than only from above), and it is not as sensitive to a poorly chosen seed (if Newton's iteration is provided a seed which is far too low, it will greatly over-shoot the convergence point, and then need to work its way back down).

Newton's method also has the problem (at least when dealing with integers) that it will quite often reach an x such that A / x - x = 2 - in this case, it will converge to a value one larger than the proper integer root, which needs to be corrected for; Halley's method needs no such correction. But unfortunately, the value of 3*A + x*x will quite often be larger than the allowed 32-bit integer space.

There are a number of other generalized nth root algorithms, but they all share this same characteristic:

x = x + x*(v - x**n)/(v*n)
x = (x*(n+1) - x**(n+1)/v)/n
x = ((n-2)*x + (4*v*x)/(v + x**n))/n
x = x*((n+2)*v + (n-2)*x**n)/(v + x**n)/n
x = ((n-2)*x + (n*x*v)/(v + (n-1)*x**n))/(n-1)
x = ((n-2)*x + x*((n*2-1)*v + x**n)/(v + (n*2-1)*x**n))/(n-1)

x = x + 2*x*(v - x**n)/(v + x**n)/n
x = x + x*31*(v - x**n)/(10*v + 21*x**n)/n
x = x + x*561*(v - x**n)/(181*v + 380*x**n)/n
x = x + x*1153*(v - x**n)/(372*v + 781*x**n)/n

etc. Most of these display either cubic or quadratic convergence. The last four are part of a series of iterations which converges on quartic convergence. But in praxis, Newton's method will get you what you need with fewer operations, unless you need to calculate many hundreds of digits.

primo

Posted 2013-09-11T18:36:05.143

Reputation: 30 891

Pretty nice, but fails for 4294967295. As for the rules, they gotta be tight to make it interesting. You can argue what exact premises make the best challenge, but ultimately it is far more important that the rules are clear and unambiguous, than what exactly they allow. – aaaaaaaaaaaa – 2013-09-13T14:15:15.637

I don't think Halley would have been worth it anyway, from a far off guess it will improve by a little less than a factor of 3, Newton does a little less than a factor of 2. Similarly from a good guess Halley will triple the accuracy, Newton will double it. So one Halley iteration is worth pretty exactly log(3)/log(2) ~= 1.585 Newton iterations. – aaaaaaaaaaaa – 2013-09-13T14:25:27.950

@eBusiness I initially had 2 Halley's with a similarly chosen seed totaling 25 ops - with error when A = 0 - so this is in fact shorter. About 4294967295, that was an oversight: as 65536² ≡ 0, the correction iteration fails to correct. I'll see if I can find an alternative. – primo – 2013-09-13T15:58:08.540

@eBusiness fixed. – primo – 2013-09-13T16:09:57.130

Sleekest square root of the pack, nice job, and an official victory badge. – aaaaaaaaaaaa – 2013-09-20T23:18:34.543

5

Task 3, 39 Operations

EDIT: Changed last line; see comments.

This is an implementation of the Newthon method. Tested with all positive squares, and also with the positive squares minus one, and also one million random numbers in the range from 0 to 2^32-1. The seemingly funny starting value is short for (1022 + A/1022) / 2, which needs the least number of iterations (I think), and also makes the RESULT for A=0 right (which would not be the case for 1024 instead of 1022).

r = (511 + A/2044)
r = (r + A/r) / 2
r = (r + A/r) / 2
r = (r + A/r) / 2
r = (r + A/r) / 2
r = (r + A/r) / 2
r = (r + A/r) / 2
r = (r + A/r) / 2
r = (r + A/r) / 2
RESULT = r - (r > A/r)

Reinstate Monica

Posted 2013-09-11T18:36:05.143

Reputation: 929

Should I keep my inferior copy of newton's method that was optimised in parallel with yours and posted a decent amount of time later? Great minds think alike and having the solution split across two two answers is bad, but that's what the current state of affairs is, as you haven't answered #2. – John Dvorak – 2013-09-11T23:57:12.940

@JanDvorak: Thanks for asking. It's OK if you put my slightly shorter method into your answer. Also, thanks for giving credit to me :-) – Reinstate Monica – 2013-09-12T00:16:29.297

Really nice try, but fails for input 4294965360 through 4294967295. – aaaaaaaaaaaa – 2013-09-12T17:20:01.253

@eBusiness: What result do you get for those inputs? I get 65535 in my tests, which is OK. – Reinstate Monica – 2013-09-13T17:47:55.323

I get 65536. Maybe you don't use the prescribed integer format. – aaaaaaaaaaaa – 2013-09-13T18:25:41.653

@eBusiness: Ah, thanks for the hint. r*r overflows in those cases. (r > A / r) should work. – Reinstate Monica – 2013-09-13T19:12:34.047

5

65(61) operations (5 + 13 + 47(43))

Task 1 -- Max(A,B)

RESULT = A + (B - A) * (A <= B)

This is the obvious solution. You need the assignment, you need comparison, you need to multiply the comparison with something, the multiplicand cannot be one of the variables and the product cannot be the result.

Task 2 -- Mid(A,B,C)

RESULT = A                               \
       + (B - A) * (A > B) ^ (B <= C)    \
       + (C - A) * (A > C) ^ (C <  B)

This is an improvement over my previous 15-op solution, which conditioned all three variables - this saved two subtractions, but it introduced another centrality test. The test itself is simple: an element is in the middle iff exactly one other of the two is above.

Task 3 -- sqrt(A)

X1     = 1024 + A / 2048
X2     = (X1  + A / X1 ) / 2
...
X10    = (X9 + A / X9 ) / 2
RESULT = X16 - (X16 * X16 > A)

Eleven rounds of newton approximation. The magic constant of 1024 is already beaten by WolframW (and 512 causes division by zero for a=0 before a=2**32 converges), but if we can define 0/0 as zero, ten iterations will work with the starting value of 512. I do admit that my claim of ten iterations is not entirely clean, but I still claim them in parentheses. I'll have to investigate if nine is possible, however. WolframH's solution is nine iterations.

John Dvorak

Posted 2013-09-11T18:36:05.143

Reputation: 9 048

I think the first line of Task 3 is not right: the second constant should be 4 times the first constant (to have "pure" Newton). – Reinstate Monica – 2013-09-12T00:17:47.613

@WolframH A better initial guess might explain why I'm wasting cycles. Where did you come up with 4*? This looks like two iterations rolled into one. – John Dvorak – 2013-09-12T00:21:41.873

(1024 + A/1024)/2 == (512 + A/2048) (which is like X0 = 1024, and then starting Newton). – Reinstate Monica – 2013-09-12T00:27:09.997

Nice solution to Task 1. Columbus' egg. – DavidC – 2013-09-12T00:29:39.503

@DavidCarraher of course, the correct solution would be MOV RESULT, A; CMP A,B; CMOVA RESULT, B ;-)

– John Dvorak – 2013-09-12T00:41:10.957

In most languages * takes precedence over ^, a few more brackets in your task 2 solution wouldn't hurt. And that square root task just keeps on giving, your 43 operations version fails for A=0, and the 47 operations version fails for A=4294967295. I guess that is as short as task 1 and 2 will get, so good job on those. – aaaaaaaaaaaa – 2013-09-12T18:03:18.653

5

1: 5 operators

RESULT = B ^ (A ^ B)*(A > B)

2: 13 operators

RESULT = B ^ (A ^ B)*(A > B) ^ (A ^ C)*(A > C) ^ (B ^ C)*(B > C)

3: 27 operators

g = 47|((0x00ffffff & A)>>10)|(A>>14)
r = (g + A/g)/3
r = (r + A/r)>>1
r = (r + A/r)>>1
r = (r + A/r)>>1
RESULT = r - (r*r-1>=A)

aaaaaaaaaaaa

Posted 2013-09-11T18:36:05.143

Reputation: 4 365