Python 3 using PyPy: 6.368014 + 0.701679 + 0.486916 = 7.556608
Optimal for n = 2
There are 1000 numbers, so we want to distribute the food to 500 families. If the food vector is sorted (food[1] >= food[2] >= ... >= food[500]
), then the minimal std is reached by giving the first family the food items food[1]
and food[500]
, the second family food[2]
and food[499]
, the third family food[3]
and food[498]
, ...
I thought of a quite easy proof. Basically I expanded the product of the std, removed the terms that appear in each food partition, and proofed that the result is minimal. If someone wants a more detailed explanation, just ask me nice.
Approximation for n > 2
I don't think, that there's a fast way of finding the optimal solution for n > 2.
I start with an heuristic solution. I distribute the food items in sorted (descending) order to the family with the lowest food so far (only to such families, which have received less than n food items).
Afterwards I improve the solution, by swapping 1 food item between a pair of families.
I use the N
families with to lowest food and the N
families with the most food so far. After all possible swaps, I choose the best swap, and call the local search recursively.
The bigger the value of N
is, the lower will be the std (but also the running time will be longer). I used N = 20
and when I can't find swap, I try again with N = 100
. PyPy finishes all 3 test cases in less than 8 minutes.
Code
def findSmallestStd(food, n):
food.sort(reverse=True)
while len(food) % n:
food.append(0)
family_count = len(food) // n
if n == 2: # optimal
return list(zip(food[:len(food)//2], reversed(food[len(food)//2:])))
else: # heuristic approach
partition = [[] for _ in range(family_count)]
for food_item in sorted(food, reverse=True):
partition.sort(key=lambda f: sum(f) if len(f) < n else float('inf'))
partition[0].append(food_item)
return local_search(partition, calc_std(partition))
def local_search(partition, best_std, N = 20):
# find indices with smallest and largest sum
families1 = nsmallest(N, partition, key=sum)
families2 = nlargest(N, partition, key=sum)
best_improved_swap = None
for family1, family2 in product(families1, families2):
for index1, index2 in product(range(len(family1)), range(len(family2))):
family1[index1], family2[index2] = family2[index2], family1[index1]
std = calc_std(partition)
if std < best_std:
best_std = std
best_improved_swap = (family1, family2, index1, index2)
family1[index1], family2[index2] = family2[index2], family1[index1]
if best_improved_swap:
family1, family2, index1, index2 = best_improved_swap
family1[index1], family2[index2] = family2[index2], family1[index1]
partition = local_search(partition, best_std)
elif N < 100:
return local_search(partition, best_std, 100)
return partition
The complete code and the output is available on Github: Code, Output
edit
I was confused about the std-calculation, that N3buchadnezzar proposed in the original question. I updated it to the correct mathematical definition, so we can compare different solutions.
edit 2
In my last submission I varied the family count. E.g. the best solution for n = 2 used 507 families. N3buchadnezzar told me, that there have to be exactly ceil(len(food) / n)
families. So I changed my code a little bit. Basically removed the loop over the family counts and improved the local search.
@N3buchadnezzar Any reason for keeping this question open? – Jakube – 2015-02-03T20:46:54.830
Will you test the running time on your own machine? – Martin Ender – 2014-12-05T14:50:13.370
The running time is only implemented to avoid extremely long running times. Eg leaving the computer on over night. The running times should anyways be small for the given test vector and not matter much to the score. – N3buchadnezzar – 2014-12-05T14:59:23.333
2They are part of the score, and therefore matter. For a fair competition, all submissions should be timed on the same machine. – Martin Ender – 2014-12-05T15:01:03.043
What do the
(1,2,3)
inv(1,2,3)
signify? How should the situation wheren
does not divide cleanly into the length ofv
be handled? – Peter Taylor – 2014-12-05T15:16:33.800Then you add zeros to
v
until they are equally matched. They numbers signify the weight of each index (which can not be divided further). The point is to divide the food into groups of length n. Where each group should have the same weight. – N3buchadnezzar – 2014-12-05T15:29:07.923IIUC the code should reorder
v
, so that the stdev is minimized? – kennytm – 2014-12-05T15:53:37.747stdev for the rows should be minimized yes. – N3buchadnezzar – 2014-12-05T16:08:35.720
8The question looks nice, but the scoring is poor. – Optimizer – 2014-12-05T16:50:14.923
Any suggestions for a more fair scoring system? – N3buchadnezzar – 2014-12-05T17:15:01.083
Either you take up all answers and run it on your machine to get a fair score, or remove the scoring element all together. (By that, I mean, think of a different scoring criteria) – Optimizer – 2014-12-05T18:16:09.390
I changed it to std, plain and simple =) – N3buchadnezzar – 2014-12-05T18:21:06.753
3@N3buchadnezzar but then it brings us to the question that now each and every answer can simply find the minimum stddev possible. – Optimizer – 2014-12-05T18:22:20.803
Can more than one 0 appear in the same row? Consider
v = [12, 4, 4, 4, 0, 0]
,n = 3
for instance. – kennytm – 2014-12-06T16:04:57.407@KennyTM I would assume you could have as many 0's in a row as you want to get the lowest variance. – Nick T – 2014-12-06T16:39:20.123
1@N3buchadnezzar the phrasing of the question is a bit strange, where you provide the number of columns vs. rows...rows would seem to make more sense; e.g. "divide the food into n rows because there are n families", versus "divide into however many rows you want, just make sure there are this many columns that match the number of family members...but some will get screwed because you're 0 padding...(the zero padding can also open up some loopholes I think)" – Nick T – 2014-12-06T16:44:05.927