52
23
This is a follow up to How slow is Python really? (Or how fast is your language?).
It turns out it was a bit too easy to get a x100 speedup for my last question. For those who have enjoyed the challenge but want something harder where they can really use their low level skills, here is part II. The challenge is to get a x100 speedup for the following python code as tested on my computer.
To make it more difficult I am using pypy this time. The current timing for me is 1 minute and 7 seconds using pypy 2.2.1.
Rules
- The first person to submit code which I can run, is correct and is x100 times faster on my machine will be awarded a bounty of 50 points.
- I will award the win to the fastest code after a week.
import itertools
import operator
import random
n = 8
m = 8
iters = 1000
# creates an array of 0s with length m
# [0, 0, 0, 0, 0, 0, 0, 0]
leadingzerocounts = [0]*m
# itertools.product creates an array of all possible combinations of the
# args passed to it.
#
# Ex:
# itertools.product("ABCD", "xy") --> Ax Ay Bx By Cx Cy Dx Dy
# itertools.product("AB", repeat=5) --> [
# ('A', 'A', 'A', 'A', 'A'),
# ('A', 'A', 'A', 'A', 'B'),
# ('A', 'A', 'A', 'B', 'A'),
# ('A', 'A', 'A', 'B', 'B'),
# etc.
# ]
for S in itertools.product([-1,1], repeat = n+m-1):
for i in xrange(iters):
F = [random.choice([-1,0,0,1]) for j in xrange(n)]
# if the array is made up of only zeros keep recreating it until
# there is at least one nonzero value.
while not any(F):
F = [random.choice([-1,0,0,1]) for j in xrange(n)]
j = 0
while (j < m and sum(map(operator.mul, F, S[j:j+n])) == 0):
leadingzerocounts[j] +=1
j += 1
print leadingzerocounts
The output should be similar to
[6335185, 2526840, 1041967, 439735, 193391, 87083, 40635, 19694]
You must use a random seed in your code and any random number generator that is good enough to give answers close to the above will be accepted.
My Machine The timings will be run on my machine. This is a standard ubuntu install on an AMD FX-8350 Eight-Core Processor. This also means I need to be able to run your code.
Explanation of code
This code iterates over all arrays S of length n+m-1 that are made up for -1s and 1s. For each array S it samples 1000 non-zero random arrays F of length n made up of -1,0 or 1 with probability of 1/4, 1/2,/14 of taking each values. It then computes the inner products between F and each window of S of length n until it finds a non-zero inner product. It adds 1 to leadingzerocounts
at each position it found a zero inner product.
Status
Perl. 2.7 times slowdown by @tobyink. (Compared to pypy not cpython.)
J. 39 times speedup by @Eelvex.
- C. 59 times speed up by @ace.
- Julia. 197 times faster not including start up time by @one-more-minute. 8.5 times speed up including start up time (it's faster using 4 processors in this case than 8).
- Fortran. 438 times speed up by @semi-extrinsic.
- Rpython. 258 times speed up by @primo.
- C++. 508 times speed up by @ilmale.
(I stopped timing the new improvements because they are too fast and iters was too small.)
It was pointed out that timings below a second are unreliable and also some languages have a start-up cost. The argument is that if you are to include that you should also include the compilation time of C/C++ etc. Here are the timings for the fastest code with the number of iterations increased to 100,000.
- Julia. 42 seconds by @one-more-minute.
- C++. 14 seconds by @GuySirton.
- Fortran. 14s by @semi-extrinsic.
- C++. 12s by @ilmale.
- Rpython. 18s by @primo.
- C++. 5s by @Stefan.
The winner is.. Stefan!
Follow-up challenge posted. How high can you go? (A coding+algorithms challenge) . This one is harder.
3an explanation of what the code is suppose to achieve would be nice, so we can rewrite it and not simply port it – Einacio – 2014-04-28T16:55:43.733
@Einacio Does the explanation I just added help? – None – 2014-04-28T17:12:13.600
Are you sure that the
while
loop is right? I'm not proficient in python, but it looks to me that the code would not incrementi
if the inner product is non-zero. – Kyle Kanos – 2014-04-28T17:14:09.427@KyleKanos That's right. It stops at the first non-zero inner product. Oh I should change the variable name! – None – 2014-04-28T17:18:55.807
6"The first person to submit code which I can run, is correct and is x100 times faster on my machine wins immediately and the competition closes." What is the purpose of closing the competition like that? Why not use a date deadline like most others, so we can see it furthered reduced in other languages? – grovesNL – 2014-04-28T17:44:17.743
@grovesNL It was just to try to make it more interesting! I also think it's going to be very hard so there is a good chance no one will get over that particular hurdle. – None – 2014-04-28T17:49:16.317
@Lembik to make it interesting you could set up a bounty to be granted at the end of the first week, but still change the accepted answers after that if it is beaten – Einacio – 2014-04-28T17:50:14.977
5@Einacio That is a nice idea. I changed the rules which I hope no one will mind. – None – 2014-04-28T17:50:41.060
Perhaps I'm misunderstanding what is supposed to happen when it finds a non-zero inner product, but why are the values in the array decreasing? Shouldn't each value in
leadingzerocounts
be similar? – grovesNL – 2014-04-28T18:20:31.973@grovesNL leadingzerocounts counts the number of times you can get that far before seeing a non-zero inner product basically. So you expect it to be less likely to get further. – None – 2014-04-28T18:25:51.230
Come to think of it... I think there might be significant mileage in noticing that it is very wasteful to produce all these arrays of length n+m-1 when most of the time the right hand ends of them are never used. – None – 2014-04-28T18:40:18.457
Are "slight" algorithmic changes allowed? – Eelvex – 2014-04-29T06:42:19.680
@Eelvex If it's computing exactly the same thing mathematically you can change the algorithm any way you like. – None – 2014-04-29T07:57:23.400
>
I think the OP should clarify two things:
I will also add these two comments from Nick Maclaren at the Cambridge Centre for Scientific Computing:
"Nobody should EVER use 32-bit random number generators for more than about ten million numbers in a row without careful analysis" "No Xorshift generator is good enough to use on its own, because they have some evil properties." – semi-extrinsic – 2014-04-30T09:21:36.280
@semi-extrinsic I think it is too late for me to change the rules wrt the RNG. I will compare the outputs for different values of n to that the python code gives. If it is very close I will accept the code. The timings should include everything for JIT languages. I will post another competition in a couple of days where the timings won't be <1 second which will help Julia people. – None – 2014-04-30T09:37:13.230
Yes, timings of below 1 second is another issue, they are usually not very reliable. Random numbers are hard to get right though, it's a common problem in computational physics that people use RNGs with poor properties. The "trick" of using one 32 bit random int to give many random bits is also very much frowned upon. – semi-extrinsic – 2014-04-30T09:59:47.923
@semi-extrinsic I can assure you my next puzzle will not be solved in <1 second :) – None – 2014-04-30T10:42:43.557
@Lembik Startup time and performance are separate issues, and should be considered separately. I encourage you to either reconsider excluding boot time, or rename this question to "How quickly does your language boot up?". For the faster solutions that's all you're measuring. – one-more-minute – 2014-04-30T12:28:36.130
@one-more-minute There is no boot-up time with compiled languages. The performance difference between the C++ solution and the one I just posted in Fortran is not due to start-up time (zero in both cases) but because in C++ he can easily compute just one random integer and convert this to 15 instances of (-1, 0 or +1). I have to create 15 random integers. I tried doing it his way, but couldn't find a solution with low enough overhead :( – semi-extrinsic – 2014-04-30T13:00:49.047
@semi-extrinsic Exactly, so including boot time means restricting the competition to precompiled languages for no real reason. I'm not saying startup time is never important, but in this case it's obscuring what's actually interesting – run time performance and techniques to improve it. – one-more-minute – 2014-04-30T13:23:36.970
Why do you declare
firstzero
andbothzero
? They don't seem to be used – mdesantis – 2014-04-30T13:39:09.427@one-more-minute I completely agree that a problem should be set where even the fastest solutions take more than a minute. Even C++ and Fortran timings are starting to be a little dubious at below 1 second. – semi-extrinsic – 2014-04-30T13:41:26.197
@Lembik perhaps faster solutions should use 10,000 iterations, so that times are back in the minute range? Then you'd just divide them by 10. – one-more-minute – 2014-04-30T13:45:44.980
@Lembik I gave up, I was trying this in Go but the missing bits and pieces were just too many and things became confusing when I tried to implement them myself:S. Having done work in python NLTK I honestly miss the existing tools in the language :( – ymg – 2014-04-30T23:32:12.550
@one-more-minute Basically having timings under a second makes everything unsatisfactory. This is something I have learned. You can't just repeat 10,000 times as a smart compiler will just eliminate those repeats. You have to actually output something different in each iteration and that needs a carefully thought through spec. – None – 2014-05-01T06:47:50.203
@Lembik I'm not sure I understand – if you set
iters=10,000
in your original code, that will fairly straightforwardly make it run 10x longer, no? There's no compiler that could or would optimise those loops away. – one-more-minute – 2014-05-01T09:23:58.970Oh I misunderstood! Yes that would work. – None – 2014-05-01T09:57:47.877
@Lembik Would you mind timing mine again? Make sure you're compiling with optimizations on. There was also an additional flag needed to get the threads going on Linux (and check to see your 8 cores aren't busy)... – Guy Sirton – 2014-05-01T21:35:36.703
@GuySirton Done. I must be losing track of which code has which compiler flags. Thanks. – None – 2014-05-02T06:07:43.457
1@Lembik I've improved my Fortran version, making it 2x faster again on my machine. Could you time it again? :) – semi-extrinsic – 2014-05-02T12:16:48.747
1@semi-extrinsic Done. – None – 2014-05-02T19:20:15.183
If you are counting C / C++ compilation times, you should also count Java compilation times. Also this is about the fastest code, not about the fastest compiler... – Jeroen – 2014-05-06T18:23:35.723
1@JeroenBollen Actually I didn't count either but you would be right had I done so. – None – 2014-05-06T18:40:15.380
It has been very interesting to follow. It is a pity that this coding challenge included random numbers (PRNGs) and moreover without any rules about their quality. This is more a library issue than of programming languages and this has made the results much less comparable and interesting. – Philm – 2014-06-22T20:17:14.047