16
Write a GOLF assembly program that given a 64-bit unsigned integer in register n
puts a non-zero value into register s
if n
is a square, otherwise 0
into s
.
Your GOLF binary (after assembling) must fit in 4096 bytes.
Your program will be scored using the following Python3 program (which must be put inside the GOLF directory):
import random, sys, assemble, golf, decimal
def is_square(n):
nd = decimal.Decimal(n)
with decimal.localcontext() as ctx:
ctx.prec = n.bit_length() + 1
i = int(nd.sqrt())
return i*i == n
with open(sys.argv[1]) as in_file:
binary, debug = assemble.assemble(in_file)
score = 0
random.seed(0)
for i in range(1000):
cpu = golf.GolfCPU(binary)
if random.randrange(16) == 0: n = random.randrange(2**32)**2
else: n = random.randrange(2**64)
cpu.regs["n"] = n
cpu.run()
if bool(cpu.regs["s"]) != is_square(n):
raise RuntimeError("Incorrect result for: {}".format(n))
score += cpu.cycle_count
print("Score so far ({}/1000): {}".format(i+1, score))
print("Score: ", score)
Make sure to update GOLF to the latest version with git pull
. Run the score program using python3 score.py your_source.golf
.
It uses a static seed to generate a set of numbers of which roughly 1/16 is square. Optimization towards this set of numbers is against the spirit of the question, I may change the seed at any point. Your program must work for any non-negative 64-bit input number, not just these.
Lowest score wins.
Because GOLF is very new I'll include some pointers here. You should read the GOLF specification with all instructions and cycle costs. In the Github repository example programs can be found.
For manual testing, compile your program to a binary by running python3 assemble.py your_source.golf
. Then run your program using python3 golf.py -p s your_source.bin n=42
, this should start the program with n
set to 42, and prints register s
and the cycle count after exiting. See all the values of the register contents at program exit with the -d
flag - use --help
to see all flags.
I unrolled a 32-iteration loop to save ~64 operations per test. That's probably outside the spirit of the challenge. Maybe this would work better as speed divided by codesize? – Sparr – 2015-04-28T06:19:28.050
@Sparr Loop unrolling is allowed, as long as your binary fits in 4096 bytes. Do you feel this limit is too high? I'm willing to lower it. – orlp – 2015-04-28T06:20:05.303
@Sparr Your binary right now is 1.3k, but I think that 32-iteration loop unrolling is a bit much. How does a binary limit of 1024 bytes sound? – orlp – 2015-04-28T06:24:58.443
Warning to all contestants! Update your GOLF interpreter with
git pull
. I found a bug in the leftshift operand where it didn't properly wrap. – orlp – 2015-04-28T07:07:59.427I'm not sure. 1024 would only require me to loop once; I'd still save ~62 ops per test by the unrolling.
I suspect someone might also put that much space to good use as a lookup table. I've seen some algorithms that want 2-8k of lookup tables for 32 bit square roots. – Sparr – 2015-04-28T15:26:41.083
all the existing solutions, including mine before unrolling, fit in much much less space than that. How big is yours? – Sparr – 2015-04-28T15:28:25.117
Is there a way to get access to
math
(in particular,math.sqrt
) to programmatically generate a lookup table in the data section? Right now I have a string of3k
octal escapes pasted in my file... – 2012rcampion – 2015-07-01T02:34:08.973@2012rcampion No, but you can use
x ** 0.5
to do square roots. – orlp – 2015-07-01T07:04:55.697