7
The challenge is a follow-up to Filter a large file quickly which had two submissions where were as fast as wc! This time the difficulty is that RAM is severely restricted which means that new methods will have to be used.
- Input: Each line has three space separated positive integers.
Output: All input lines
A
B
,T
that satisfy either of the following criterion.- There exists another input line
C
,D
,U
whereD = A
and0 <= T - U < 100
. - There exists another input line
C
,D
,U
whereB = C
and0 <= U - T < 100
.
- There exists another input line
To make a test file use the following python script which will also be used for testing. It will make a 1.3G file. You can of course reduce nolines for testing.
import random
nolines = 50000000 # 50 million
for i in xrange(nolines):
print random.randint(0,nolines-1), random.randint(0,nolines-1), random.randint(0,nolines-1)
Rules. Your code must use no more than 200MB of RAM. I will enforce this using ulimit -v 200000
. Your code must therefore also run on ubuntu using only easily installable and free software.
Score. You can report your unofficial score by testing on your machine as (the time in seconds needed by Keith Randall's submission from Filter a large file quickly divided by the time in seconds taken by your submission )*100 .
Winning criterion. The winner will be the one with the highest score when I test the code on my computer using a file created by the sample python code. I will first run
sync && sudo bash -c 'echo 3 > /proc/sys/vm/drop_caches'
before each test run.
The deadline is one week from the time of the first correct entry.
My Machine The timings will be run on my machine. This is a standard 8GB RAM ubuntu install on an AMD FX-8350 Eight-Core Processor. This also means I need to be able to run your code.
Some relevant timing information
largefile.file is a file created using the sample python code.
sync && sudo bash -c 'echo 3 > /proc/sys/vm/drop_caches'
time wc largefile.file
real 0m26.835s
user 0m18.363s
sys 0m0.495s
time sort -n largefile.file > /dev/null
real 1m32.344s
user 2m9.530s
sys 0m6.543s
So many [fastest-code]s. You could change it to least memory used total in filtering the memory, including variables, code, etc. – Justin – 2014-05-14T20:57:35.217
@Quincunx Luckily I am not the only person on codegolf.se :) To be fair, fastest-code questions are a) really cool and b) still in the minority on codegolf.se. It is also important to notice that the winning language has been different each time for my questions and one question had 19 answers. Who would have thought nimrod would win for example? Least memory is an interesting idea but I think you will end up with something so slow you can't test it. Maybe next time. – None – 2014-05-14T21:02:47.533
@Quincunx I should also add that it's not trivial to write any code that works, let alone is fast, for this problem because of the RAM restriction. So I don't really look at it as a standard fastest code problem. – None – 2014-05-14T21:28:17.803
The ram restriction is fairly trivial to work with :) – NPSF3000 – 2014-05-15T04:43:23.473
Is there a restriction on disk space? – Οurous – 2014-05-15T05:00:24.663
1@Ourous Shhh! You're giving away ma secrets :P – NPSF3000 – 2014-05-15T05:29:47.733
2@Ourous No, although I won't test code that might damage my computer! If you are going to use a lot of disk space, please say how much in your answer. – None – 2014-05-15T07:01:56.733