5
1
Intro - Nearest neighbour comparison between two arrays is something that can be done very efficiently when only 1 variable per array is involved, because it can be done by sorting and then performing a rolling comparison. However, for a 2 variable comparison this sort operation cannot be done anymore and another smart way has to be thought of. Hence the challenge:
Challenge - The challenge is to write a code that can, for each row in array A, find a row in array B such that (A[i,1]-B[j,1])^2+(A[i,2]-B[j,2])^2 is minimal and that then stores the corresponding row number j. The perfect minimum (0) is a perfect match of A[i,1] with B[j,1] and of A[i,2] with B[j,2].
As a code example I have a very clunky loop written in R
N=nrow(SetA)
for (i in 1:N){
#calculate vector with distance of row i to the entire SetB
vec=with(SetB,(SetA[i,x]-x)^2+(SetA[i,y]-y)^2)
#calculate minimum of that vector and store the index
indexVec[i,ind:=which.min(vec)]
}
Some sidenotes - If it matters (not sure it will) assume that both arrays have 10k rows. Ties can be decided at 'random' e.g. determined by processing sequence or whatever way is more convenient in your code.
Example in/out - For clarity, here is the sample input
SetA=[1 3;2 4;3 5;4 6;5 7]
SetB=[2 2.5; 4 3.1; 2 2.0; 3 3.0; 6 5.0;]
and the expected sample output
indexVec=[1;4;4;5;5]
where the values in indexVec indicate where the nearest neighbour of row i in SetA is located in SetB.
Fastest code (as tested on my i5-4210U) wins. The testset will consist of 2 arrays of 10k rows and 2 columns filled with floating point numbers between -1E10 and +1E10. Of course seeded with the same seed to remove any unfairness.
Feel free to use whatever language, but I am also very interested in an optimized version of the R code above.
Would you happen to have some sample input/output? I'm not quite sure I fully understand the problem – Sp3000 – 2015-02-12T07:22:33.747
@Sp3000 I have added the sample input and output (not in R syntax, but I guess you get what it means) – Michiel – 2015-02-12T07:31:08.940
@Michiel So basicly the arrays are 2 dimensional arrays? And the sum of both elements in a row of array A should be closest to the sum of both elements in a row of array B right? – Teun Pronk – 2015-02-12T08:20:38.363
@TeunPronk I have edited the question a bit to hopefully clarify this issue. Arrays are indeed 2 dimensional, but rather than finding the closest in sum of both elements I want the closest in distance as defined by the squared difference – Michiel – 2015-02-12T08:41:31.470
1So the input is two arrays of 2D points using a floating point type, and the task is to find, for each 2D point in the second array, the one in the first array which is nearest to it according to the L2 norm? How should ties be broken? – Peter Taylor – 2015-02-12T09:48:53.850
@PeterTaylor I didn't realize before that (mathematically) the L2 norm is the same thing, but you are absolutely right. Ties are allowed to be broken at random, i.e. whatever the script does. – Michiel – 2015-02-12T10:27:59.573
What kind of data will you test the submissions on? Uniformly random floats from some interval? – Zgarb – 2015-02-12T10:32:14.303
@Zgarb 2D arrays of 10k rows filled with randomly generated floating point numbers between -1E10 and +1E10 – Michiel – 2015-02-12T10:34:29.660
If I posted a C code, how would you pass in the random array values? should I fill them myself? – rorlork – 2015-02-12T11:18:42.313
@rcrmn I will use the same set as for different languages, loaded from file – Michiel – 2015-02-12T11:23:48.787