The problem

Through a terrible procastination accident you are reborn as Scrooge McDuck for a day. To make the most out of the situation you decide to give away food to the poor. Since you also are a mathematician you store the food in a vector v(1,2,3).

You want to give each family approximately the same food. To make things simple each family has n members. The task is to write a code to split the vector v into rows of length n where each row has the same amount of ''food''.

Equality is measured by the standard deviation; how different the sum of each row (the food each family receives) is from the mean.

Input

Input should be an integer n and a vector v The length of v does not have to be divisible by n; zero pad it if that is the case.

Example

v = (4, 9, 3, 6, 19, 6, 9, 20, 9, 12, 19, 13, 7, 15)
n = 3

Should return

 4     6    19
 9     9    13
 3    20     7
 6     9    15
19    12     0

The sum of each row is 29,31,30,30,31, so the standard deviation is 0.748331 for n = 2. Note the presence of a 0 that we added.

The output

The code should output the standard deviation. You can also print the matrix but this is not a requirement.

The score

The winner is judged not by the shortest code, but by the sum of the standard deviations, given the test vector and n = 2, 3, and 23. e.g.

Score = your_func(2, foodvec) + your_func(3, foodvec) + your_func(23, foodvec)

Use the following vector to test http://pastebin.com/ybAYGHns and test for n=2,3,23. The final score is the sum of those three tests. The vector is somewhat big since Scrooge has a lot of food saved up.

N3buchadnezzar

Posted 2014-12-05T14:27:56.960

Reputation: 401

@N3buchadnezzar Any reason for keeping this question open? – Jakube – 2015-02-03T20:46:54.830

Will you test the running time on your own machine? – Martin Ender – 2014-12-05T14:50:13.370

The running time is only implemented to avoid extremely long running times. Eg leaving the computer on over night. The running times should anyways be small for the given test vector and not matter much to the score. – N3buchadnezzar – 2014-12-05T14:59:23.333

2They are part of the score, and therefore matter. For a fair competition, all submissions should be timed on the same machine. – Martin Ender – 2014-12-05T15:01:03.043

What do the (1,2,3) in v(1,2,3) signify? How should the situation where n does not divide cleanly into the length of v be handled? – Peter Taylor – 2014-12-05T15:16:33.800

Then you add zeros to v until they are equally matched. They numbers signify the weight of each index (which can not be divided further). The point is to divide the food into groups of length n. Where each group should have the same weight. – N3buchadnezzar – 2014-12-05T15:29:07.923

IIUC the code should reorder v, so that the stdev is minimized? – kennytm – 2014-12-05T15:53:37.747

stdev for the rows should be minimized yes. – N3buchadnezzar – 2014-12-05T16:08:35.720

8The question looks nice, but the scoring is poor. – Optimizer – 2014-12-05T16:50:14.923

Any suggestions for a more fair scoring system? – N3buchadnezzar – 2014-12-05T17:15:01.083

Either you take up all answers and run it on your machine to get a fair score, or remove the scoring element all together. (By that, I mean, think of a different scoring criteria) – Optimizer – 2014-12-05T18:16:09.390

I changed it to std, plain and simple =) – N3buchadnezzar – 2014-12-05T18:21:06.753

3@N3buchadnezzar but then it brings us to the question that now each and every answer can simply find the minimum stddev possible. – Optimizer – 2014-12-05T18:22:20.803

Can more than one 0 appear in the same row? Consider v = [12, 4, 4, 4, 0, 0], n = 3 for instance. – kennytm – 2014-12-06T16:04:57.407

@KennyTM I would assume you could have as many 0's in a row as you want to get the lowest variance. – Nick T – 2014-12-06T16:39:20.123

1@N3buchadnezzar the phrasing of the question is a bit strange, where you provide the number of columns vs. rows...rows would seem to make more sense; e.g. "divide the food into n rows because there are n families", versus "divide into however many rows you want, just make sure there are this many columns that match the number of family members...but some will get screwed because you're 0 padding...(the zero padding can also open up some loopholes I think)" – Nick T – 2014-12-06T16:44:05.927

Answers

Python 3 using PyPy: 6.368014 + 0.701679 + 0.486916 = 7.556608

Optimal for n = 2

There are 1000 numbers, so we want to distribute the food to 500 families. If the food vector is sorted (food[1] >= food[2] >= ... >= food[500]), then the minimal std is reached by giving the first family the food items food[1] and food[500], the second family food[2] and food[499], the third family food[3] and food[498], ...

I thought of a quite easy proof. Basically I expanded the product of the std, removed the terms that appear in each food partition, and proofed that the result is minimal. If someone wants a more detailed explanation, just ask me nice.

Approximation for n > 2

I don't think, that there's a fast way of finding the optimal solution for n > 2.

I start with an heuristic solution. I distribute the food items in sorted (descending) order to the family with the lowest food so far (only to such families, which have received less than n food items).

Afterwards I improve the solution, by swapping 1 food item between a pair of families. I use the N families with to lowest food and the N families with the most food so far. After all possible swaps, I choose the best swap, and call the local search recursively.

The bigger the value of N is, the lower will be the std (but also the running time will be longer). I used N = 20 and when I can't find swap, I try again with N = 100. PyPy finishes all 3 test cases in less than 8 minutes.

Code

def findSmallestStd(food, n):
    food.sort(reverse=True)
    while len(food) % n:
        food.append(0)
    family_count = len(food) // n

    if n == 2: # optimal
        return list(zip(food[:len(food)//2], reversed(food[len(food)//2:])))
    else: # heuristic approach
        partition = [[] for _ in range(family_count)]
        for food_item in sorted(food, reverse=True):
            partition.sort(key=lambda f: sum(f) if len(f) < n else float('inf'))
            partition[0].append(food_item)
        return local_search(partition, calc_std(partition))

def local_search(partition, best_std, N = 20):
    # find indices with smallest and largest sum
    families1 = nsmallest(N, partition, key=sum)
    families2 = nlargest(N, partition, key=sum)
    best_improved_swap = None

    for family1, family2 in product(families1, families2):
        for index1, index2 in product(range(len(family1)), range(len(family2))):
            family1[index1], family2[index2] = family2[index2], family1[index1]
            std = calc_std(partition)
            if std < best_std:
                best_std = std
                best_improved_swap = (family1, family2, index1, index2)
            family1[index1], family2[index2] = family2[index2], family1[index1]

    if best_improved_swap:
        family1, family2, index1, index2 = best_improved_swap
        family1[index1], family2[index2] = family2[index2], family1[index1]
        partition = local_search(partition, best_std)
    elif N < 100:
        return local_search(partition, best_std, 100)
    return partition

The complete code and the output is available on Github: Code, Output

edit

I was confused about the std-calculation, that N3buchadnezzar proposed in the original question. I updated it to the correct mathematical definition, so we can compare different solutions.

edit 2

In my last submission I varied the family count. E.g. the best solution for n = 2 used 507 families. N3buchadnezzar told me, that there have to be exactly ceil(len(food) / n) families. So I changed my code a little bit. Basically removed the loop over the family counts and improved the local search.

Jakube

Posted 2014-12-05T14:27:56.960

Reputation: 21 462

Seems good. My best result for the n=3 case is 1.7523 (using the correct std). In about a few seconds run time. http://pastebin.com/0N7frDC7

But asking this question and how to do it in a proper matter is done messy. I think I will re-ask a similar question later =) (And leave this one running for now)

The vector should have been a tad bigger, because then my running time explodes :p

– N3buchadnezzar – 2014-12-07T20:46:24.593

@N3buchadnezzar Is the number of families fixed. In my solution I try all number of families >= ceil(len(food) / 2). I get my best results for n = 2 using 507 families, n = 3 for 483 families and for n = 23 using 46 families. Is this allowed, or must there be 500, 334 and 44 families? – Jakube – 2014-12-07T21:23:24.820

For now it must be 500, 334 and 44. In the next (similar) question this will be changed however. I thought about changing it, but now it is too late. – N3buchadnezzar – 2014-12-07T21:25:40.903

Python 2: 6.368013505 + 96.84345115 + 5.572725419 = 108.784190072

This is probably not a very optimized or smart approach but it gave me ~~decent~~ results so i decided to post it. I sorted the list and chopped the list in groups of n long. The odd groups get reversed and then I zipped them together (just look at the code really). It's small and quick and it works fairly well on a big amount of numbers (not so much on a small amount, it scored around 3.5 on the example).

So this is my function:

def myFunction(v, n):
  v = sorted(v)
  while len(v) % n != 0:
     v = [0] + v
  lists = list(grouper(v, len(v)/n))
  for i in range(len(lists)):
     if i%2 == 0:
         lists[i] = lists[i][::-1]
  zippedList = list(zip(*lists))
  for i in range(len(zippedList)):
     zippedList[i] = sum(zippedList[i])
  print(zippedList)
  return numpy.std(zippedList)

def grouper(l, n):
   for i in range(0, len(l), n):
      yield l[i:i+n]

Because it's such an easy (read lazy) solution, it's very fast. The entire program took about 0.1s to run. With pypy it would be even faster but Numpy is apparently still not ported to pypy and i don't feel like manually calculating the standard deviation so yeah. If I made any huge mistakes (both on programming and english grammer) or have recommendations, tell me. It's the first time I gave a code challenge a try.

Def

Posted 2014-12-05T14:27:56.960

Reputation: 602

+1, because you made me realize, that my definition of std was rubbish. I updated my answer, so that we can compare the actual results. Btw, this was a code challenge, not code golf. ;-) – Jakube – 2014-12-07T20:00:01.893

I wondered why your score was so high but I havent looked at your complete code +1 for your way better answer. – Def – 2014-12-07T20:39:53.677

@Jakube Is using different functions for different values of n allowed? Also your use of zeros is a bit cheaty but that isn't really explained very well in the question – Def – 2014-12-07T21:06:52.687

I'm not sure. It doesn't say anywhere, how many families there are. I will ask N3buchadnezzar, what is allowed and what not. If I run the code for only the minimal possible number of families, I get a result of 6.4 + 8 + 0.5 = 14.9 – Jakube – 2014-12-07T21:19:26.357

You were right, we have to use 500, 334 and 44 families. But I improved my program and got even better (and allowed) solutions. – Jakube – 2014-12-08T11:05:12.977

Divide food to the poor equally!

The problem

Input

Example

The output

The score

Answers

Python 3 using PyPy: 6.368014 + 0.701679 + 0.486916 = 7.556608

Optimal for n = 2

Approximation for n > 2

Code

edit

edit 2

Python 2: 6.368013505 + 96.84345115 + 5.572725419 = 108.784190072