7
1
This is an optimization/algorithms challenge. Given two sets of points X and Y, which are to be thought of as points on a circle of circumference T, the challenge is to find the rotation which minimizes the distance from the set X to the set Y. Here the distance is just the sum of the absolute distance from each point in X to its closest point in Y. Here closest is really just the minimum distance you need to travel along the circle to reach a point in Y (remembering that you can go clockwise or anti-clockwise). Note that this measure of distance is not symmetric.
In order to be able to create the test data, you will need to be able to run Python, but your answer can be in any language of your choosing that is available without cost to run on Linux. You can also use any libraries you choose as long as they are easy to install on Linux and available without cost.
The test data will be created using the following Python code. It outputs two lines to standard out. The first is circle X and the second is circle Y.
import numpy as np
#Set T to 1000, 10000, 100000, 1000000, 10000000, 1000000000
T = 1000
N = int(T/5)
#Set the seed to make data reproducible
np.random.seed(seed = 10)
timesY = np.sort(np.random.uniform(low = 0.0, high = T, size = N))
indices = np.sort(np.random.randint(0,N, N/10))
rotation = np.random.randint(0,T, 1)[0]
timesX = [(t + rotation + np.random.random()) % T for t in timesY[indices]]
for tX in timesX:
print tX,
print
for tY in timesY:
print tY,
For those who don't run python, I have also uploaded some test files. Each file has three lines. The first line is the points for circle X, the second is the points for circle Y, and the last is the score you get from the generic optimizer in the python code below. This score will be in no way optimal and you should do much better.
http://wikisend.com/download/359660/1000.txt.lzma
http://wikisend.com/download/707350/10000.txt.lzma
http://wikisend.com/download/582950/100000.txt.lzma
http://wikisend.com/download/654002/1000000.txt.lzma
http://wikisend.com/download/416516/10000000.txt.lzma
Here is also a toy worked example of the distance between two sets of point to make the distance function clear.
Let timesX = [2, 4, 98]
and timesY = [1, 7, 9, 99]
. Let us assume the circumference of the circle is 100
. As it stands the distance is (2-1)+(7-4)+(99-98) = 5
. Now if we rotate timesX
by 3
we get [5, 7, 1]
which has total absolute distance 2
to timesY
. In this case this appears to be the optimal solution.
Here is some sample code which assumes that the circles are called timesX
and timesY
, and outputs the minimum distance it finds. It doesn't, however, do a very good job, as you will see.
from bisect import bisect_left
from scipy.optimize import minimize_scalar
def takeClosest(myList, myNumber, T):
"""
Assumes myList is sorted. Returns closest value to myNumber in a circle of circumference T.
"""
pos = bisect_left(myList, myNumber)
if (pos == 0 and myList[pos] != myNumber):
before = myList[pos - 1] - T
after = myList[0]
elif (pos == len(myList)):
before = myList[pos-1]
after = myList[0] + T
else:
before = myList[pos - 1]
after = myList[pos]
if after - myNumber < myNumber - before:
return after
else:
return before
def circle_dist(timesX, timesY):
dist = 0
for t in timesX:
closest_number = takeClosest(timesY, t, T)
dist += np.abs(closest_number - t)
return dist
def dist(x):
timesX_rotated = [(tX+x) % T for tX in timesX]
return circle_dist(timesX_rotated, timesY)
#Now perform the optimization
res = minimize_scalar(dist, bounds=(0,T), method='bounded', options={ 'disp':True})
print res.fun
Scoring
You should run your code for T = 1000, 10000, 100000, 1000000, 10000000. Your code should output both the optimum rotation and the distance you get for it so that it can be checked. The scoring criterion is as follows.
- You should calculate the sum of the distances you find for the five values of T. Low is better and the best possible is 0. You aren't allowed to cheat by looking at what rotation was used to create the data in the first place.
- Then you should calculate
551431.326508571/(the sum you just calculated)
and report that as your score.
Edit 1. Changed score so that high is better and removed time restriction so you can just run it on your own computer.
Edit 2 Note that the function dist
can be used as a verifier. Just put the points for circle X in timesX
and the points for circle Y in timesY
and set x
to be the rotation you want to test. It will return a score which you can compare to what you get.
Edit 3 Fixed the data creation code which wasn't exactly the code I had used to create the data.
Can we assume that the size of X and Y is equal? And is the minimum distance for T = 10, X={1,1,1,2} ,Y = {1,2,2,2} zero? Do we have to calculate the distance from every x to its closest y AND the distance for every x to its closest x? By 'absolute distance' do you mean the euclidean distance or just the arclength between both points? – flawr – 2014-09-03T18:50:52.350
@flawr You can assume that T (the circumference) is the same for X and Y. They might however have a very different number of points as they in fact do in my test cases. The distance for T = 10, X={1,1,1,2} ,Y = {1,2,2,2} is zero but actually I assume that no point occurs more than once . The distance is the sum of the distance of every point in X to its closest point in Y. You never need to compare points in X to points in X. 'absolute distance' is just the arclength between both points. – None – 2014-09-03T19:05:53.953
Thank you for the clarifiaction: @Comparing only every point X to the set Y: So the minimal distance for X = {1,2} Y = {1,2,3} will be zero, but the minimal distance for X = {1,2,3} Y = {1,2} will be positive (not zero)? – flawr – 2014-09-03T19:11:21.973
@flawr Yes that is exactly right. In fact the distance is 1 in your example (asuming T = 10 for example). – None – 2014-09-03T19:16:45.993
I have currently only looked at 1000.txt and 10000.txt, and I am slightly confused by the format. Why are there two newlines in each? Why is there only one number after the second newline? – justinpc – 2014-09-04T11:59:46.283
@jpcooper There should be three lines. The first line in 1000.txt starts "509.843975277 520.825970885 .." and is the location of the points in circle X. The second line starts "3.94826632791 14.6348751032 .." and is the location of the points in circle Y. The third line has one number which is the minimum distance between circle X and circle Y that my generic optimizer code found. In this case it is 33.0305238422 . Does that help? – None – 2014-09-04T12:38:37.430
You're saying we can't cheat by using the rotation created to use the data. But can we use knowledge of the data generation algorithm? – Martin Ender – 2014-09-05T18:36:53.363
@MartinBüttner To be honest I hadn't considered that this might help but I don't see why not. Obviously you can't use knowledge of the seed. – None – 2014-09-05T21:07:46.793
Why don't we just generate two list of random numbers from 0 to 2*PI ? Your data generation method seems way too complex and it won't make the problem simpler or harder. – Ray – 2014-09-06T17:14:53.003
One more thing...We should have the number
T
in the input data(maybe the first line) or the solver program won't know it. – Ray – 2014-09-06T17:28:05.243@Ray Yes T should have been in the input data. Good point. The point of the data generation procedure is to make sure there is a rotation which has small distance and that that distance is very different from the typical distance. – None – 2014-09-06T17:33:53.423
Is the optimum rotation always an integer? – Tobia – 2014-09-06T20:29:34.450
@Tobia No, not necessarily. – None – 2014-09-06T20:30:16.390