Mr. Moneybag's Money Piles

5

Mr. Moneybags is a very rich man. He has an assistant Mr. Perfect, who one day, got very mad, and shredded some of Mr. Moneybag’s $100 bills! Mr. Moneybags decided to punish Mr. Perfect.

Mr. Moneybags tells Mr. Perfect to stack his money into different piles.

He starts with N (1 <= N <= 1,000,000, N odd) empty stacks, numbered 1..N. Mr. Moneybags then gives him a sequence of K instructions (1 <= K <= 25,000), each of the form "A B", meaning that Mr. Perfect should add one new bill to the top of each stack in the range A to B, inclusive. For example, if Mr. Perfect is told "11 13", then he should add a bill to each of the stacks 11, 12, and 13.

After Mr. Perfect finishes stacking the money according to his instructions, Mr. Moneybags would like to know the median height of his N stacks -- that is, the height of the middle stack if the stacks were to be arranged in sorted order (N is odd, so this stack is unique). Please write an efficient program to help Mr. Perfect determine the answer to Mr. Moneybag’s question.

INPUT FORMAT:

  • Line 1: Two space-separated integers, N K.
  • Lines 2..1+K: Each line contains one of Mr. Moneybag’s instructions in the form of two space-separated integers A B (1 <= A <= B <= N).

SAMPLE INPUT:

7 4                                                                             
5 5                                                                             
2 4
4 6
3 5

INPUT DETAILS:

There are N=7 stacks, and Mr. Moneybags issues K=4 instructions. The first instruction is to add a bill to stack 5, the second is to add a bill to each stack from 2..4, etc.

OUTPUT FORMAT:

  • Line 1: The median height of a stack after Mr. Perfect completes the instructions.

SAMPLE OUTPUT:

1

The winner is the one with the "most efficient" algorithm. So, if someone posts an O(N) solution, then it will beat any O(N^2). If there is a tie, aka, two or more people who post an O(N) solution, for example, then whoever has is the shortest code (code-golfed) wins.

So, your purpose is to write a efficient algorithm, and to prevent possible ties with other people, try to code-golf it as well.

Another thing, the input is to be read from a file and the output must be written to a file.

EDIT Also, the solution must be returned under or equal to 2 seconds. If it is not, you will receive a T. A wrong answer is an X. Which means, if no one can get it under time, then the most efficient algorithm sigh score of T can win.

EDIT #2 Here are some more samples:

  1. http://pastebin.com/x88xAyPA
  2. http://pastebin.com/y6Cuxm7W
  3. http://pastebin.com/kgbeiQtd

EDIT #3 Bonus: (subtract 100 chars from total)

Solve this under 2 seconds :)

  1. http://pastebin.com/3uT9VSzv

mmk

Posted 2014-07-25T00:07:40.903

Reputation: 253

If n is guaranteed to be odd, I think you mean n<1,000,000; not n<=1,000,000. – SuperJedi224 – 2015-06-05T16:37:48.743

2If one submission is in O(N) and one is in O(K), which one wins? Of course, that's just a special case - how are more complicated combinations of N and K ordered? – Martin Ender – 2014-07-25T07:21:24.057

Also you might want to remove the the constraints on N and K. They don't really seem relevant if there is no hard time limit and scoring is by big-O notation anyway. And if you keep them, you might end up having to discussing that an answer claims it runs in O(1), because N and K themselves are O(1). ;) – Martin Ender – 2014-07-25T10:08:01.877

2That doesn't answer my question. You could have a submission that loops over N once and K twice (nested loops, of course) and one that loops over K once and N twice. Now the first submission will run in O(N K^2), the second submission will run in O(K N^2). Which one is better? What if one submission has O(K^4 N) and another has O(N^4)? How do you compare these? – Martin Ender – 2014-07-25T11:15:14.527

@MartinBüttner, I would compare these by selecting large numbers of N and K and putting them into the big O notation of the algorithm. So something that is N^2 K is worst as a N and K approach the upper bounds, compared to K^2 N, and so on. – mmk – 2014-07-25T11:45:19.380

@MartinBüttner I guess that one of these numbers shall be limited. If you state the rule "there can't be more than 100 stacks" then N becomes a constant (complexity-wise, i'm sure you understand what I mean). – Fabinout – 2014-07-25T11:46:20.497

@MartinBüttner I have added a time constraint, which will weed out inefficient solutions, as well – mmk – 2014-07-25T11:54:45.417

@Fabinout currently both numbers are limited. (see my second comment). @ mmk: Okay, so here is a suggestion to specify this. Big-O notations are ordered as they would be if both N and K were replaced by a single variable M (this places all 5th order mixed polynomials worse than 4th order mixed polynomials). For ties, we look at the first N/K discrepancy from most to least significant factors, and the algorithm that has a K there is better than the one that has an N. – Martin Ender – 2014-07-25T11:54:45.627

@mmk What purpose does that serve in addition to the complexity? That's a bit like asking a code golf question, and saying "Your submission must not be longer than 1000 characters", which to me sounds like "You can't even participate if you're not good enough at this." – Martin Ender – 2014-07-25T11:56:04.177

@MartinBüttner, I have added that the time constraint is not 100% mandatory. Solutions can get a score of T which means over time. If NO solution can solve it under the time limit, then the most efficient T scored algorithm will win – mmk – 2014-07-25T13:07:26.877

Answers

3

O(K log N), many characters of Python

Edit: Improved by replacing merge sort with radix sort.

This is an improvement over my previous answer, but I'm posting it separately because it's a different strategy, and so the post doesn't get super-long.

This is now optimal because simply reading the input can take 2*K*lg N time in the worst case -- 2*K numbers each with lg N bits.

The goal is to have minimal dependence on N. To do this, rather than store counts for each pile index, we realize that all piles in a contiguous cluster unaffected by the endpoints of any interval must have the same counts. So, if we see that an interval starts with 5, and there's no changes until 15, then those 10 piles have the same value.

Each change in pile size from a pile to the next one is caused by an interval starting or ending. We call these "events". Each event increases or decreases the current pile size by 1. If, from one event to the next, there's some number of piles, each of those piles has the current size.

We extract the 2*k events from the intervals, and sort them using a radix sort, taking advantage of the values being 1 through N. This take takes O(K*log(N)) because the numbers have O(lg N) bits.

We then run over the events, and for each one, update the list of counts to the previous event, which requires adding a number up to N (so lg N bits). This takes O(log(N)) per step, and so O(K*log(N))

Finally, having gotten the counts of the pile sizes, it's easy to quickly find the median by computing the cumulative number of piles up to a given size (or, rather, we subtract it from our desired count). This also takes O(K*log(N)).

The algorithm does the huge test case in under a second.

The code (ungolfed) is below with comments.

# Process input
def get_ints(line):
 vals = line.split()
 return (int(vals[0]),int(vals[1]))

s = input()

lines = s.split("\n")
N,K = get_ints (lines[0])
intervals = [get_ints(line) for line in lines[1:]]

mins,maxes=map(list,zip(*intervals))

#Each event will represent an increase or decrease of a pile from the previous one
#Each min increases the count by 1
#Each max causes the count to decrease by 1, starting with the next pile
events = [(a,1) for a in mins] + [(b+1,-1) for b in maxes]
events.append((N+1,0))

#Do a radix sort on the first element of each pair 
#Each number has O(lg N) digits, so this takes O(k lg N) 
events.sort()

#Count of the number of piles of each size
#pile_sizes[size] counts the number of piles of that size
pile_sizes = [0]*K
cur_pile = 1
cur_size = 0

#For each event, we track the pile size after the event
#Then, on the next event, we know there were some number of piles of that size
#The number of piles equals the distance between events
#Multiple events on the same pile have zero distance and don't affect the counts
for event in events:
 event_pile = event[0]
 piles_current_size = event_pile - cur_pile
 pile_sizes[cur_size] += piles_current_size
 cur_pile = event_pile
 cur_size += event[1]

#Get the i_th largest pile size from a list of pile size counts
def get_order_stat(counts,i):
 for val in range(len(counts)):
  cur_count = counts[val]
  if i<cur_count:
   return val
  i-=cur_count
 return None

median = get_order_stat(pile_sizes,N//2)

print(median)

xnor

Posted 2014-07-25T00:07:40.903

Reputation: 115 687

Well done on getting the optimal solution! However, python's .sort() is not a radix sort, it's a merge sort, so that still needs to be added. – isaacg – 2014-07-26T03:15:59.560

@isaacg Yeah, I'm missing the actual radix sort code, but I'm not going to code it up. Unfortunately, all the snippets I see online have worse running times because they manipulate numbers rather than bits, and I don't feel like implementing something careful enough. If anyone wants to write a shorter program, they can consider this one to have arbitrarily high length. – xnor – 2014-07-26T13:49:42.487

2

O((N + K) * log(N)), many chars of Python

(See my improved solution.)

Edit: I realized that I wasn't accounting for operations with numbers taking log(N) time. For example, I think the linear time median, despite its name, is actually N log N once you take into account the time to compare numbers bit by bit.

Edit: Improved sorting. Thanks to isaacg.

The idea is to make a list of the size of each pile 1...N, and then take the median of this list. The key is to obtain the size of a pile from the size of the pile before by adding the number of intervals that start at that value, and subtracting the number that end at the previous value. By sorting the lists of interval-starts and interval-ends, we can quickly query how many intervals start and end.

See the code for more detailed comments. I didn't golf it.

The code has no problem with the huge test case, taking <1 second on my machine.

# Process input
def get_ints(line):
 vals = line.split()
 return (int(vals[0]),int(vals[1]))

s=input()

lines = s.split("\n")
N,K = get_ints (lines[0])
intervals = [get_ints(line) for line in lines[1:]]

# Separate out the starts and ends of the intervals. 
# Sort each, which takes O((K + N) * log(N)) with a counting sort.
mins,maxes=map(list,zip(*intervals))
mins.sort()
maxes.sort()

#Make a running list of the sizes of each pile
size_list = []
cur_size = 0
for i in range(1,N+1):
 #Update the size of the next pile by adding one for each interval that starts here
 #Each number in min can only can one update, so this contributes O(k) total
 while mins and mins[0]==i:
  mins.pop(0)
  cur_size+=1
 #Likewise, subtract one for each interval that ended at the previous number
 while maxes and maxes[0]==i-1:
  maxes.pop(0)
  cur_size-=1
 size_list.append(cur_size)

#Now, we compute the median of size_list
#The naive median algorithm sorts and take the middle element, taking O(N*log(N)) time
#But, there's a linear-time algorithm to select the median (or any order statistic) of    a list
linear_time_median = lambda L: select(L, 0, len(L), len(L)//2)

#I didn't want to write my own linear median algorithm, as it standard but long. 
#So, I use the linear median code from https://gist.github.com/boyers/7233877
median = linear_time_median(size_list)

print(median)

Here's the code for linear-time-median from https://gist.github.com/boyers/7233877.

#### Code credit: https://gist.github.com/boyers/7233877

# partition A[p:r] in place about x, and return the final position of the pivot
def partition(A, p, r, x):
  # find the index of the pivot
  i = -1
  for j in range(p, r):
    if A[j] == x:
      i = j
      break

  # move the pivot to the end
  if i != -1:
    t = A[r - 1]
    A[r - 1] = A[i]
    A[i] = t

  # keep track of the end of the "less than" sublist
  store_index = p

  # iterate
  for j in range(p, r - 1):
    if A[j] < x:
      t = A[j]
      A[j] = A[store_index]
      A[store_index] = t
      store_index += 1

  # put the pivot in its final place
  if i != -1:
    t = A[r - 1]
    A[r - 1] = A[store_index]
    A[store_index] = t

  # return the store index
  return store_index

# find the ith biggest element in A[p:r]
def select(A, p, r, i):
  # make a copy of the array
  A = A[:]

  # divide the n elements of A into n / 5 groups
  groups = [[]] * (((r - p) + 4) // 5)
  for x in range(p, r):
    groups[(x - p) // 5] = groups[(x - p) // 5] + [A[x]]

  # find the median of each group
  medians = [sorted(l)[(len(l) - 1) // 2] for l in groups]

  # find the median of medians
  if len(medians) == 1:
    median_to_rule_them_all = medians[0]
  else:
    median_to_rule_them_all = select(medians, 0, len(medians), (len(medians) - 1) // 2)

  # partition A around the median of medians
  partition_index = partition(A, p, r, median_to_rule_them_all)

  # base case
  if p + i < partition_index:
    return select(A, p, partition_index, i)
  if p + i > partition_index:
    return select(A, partition_index + 1, r, p + i - partition_index - 1)
  return A[p + i]

xnor

Posted 2014-07-25T00:07:40.903

Reputation: 115 687

see my answer. It takes yours but, makes it more efficient if I used linear-median algorithm, since, I don't sort the array in the beginning – mmk – 2014-07-26T02:20:17.963

Is yours O(K) for fixed K? See also my new answer which improves the dependence on N. Maybe we can combine the improvements. – xnor – 2014-07-26T02:22:33.867

yes it is O(K), when disregarding the sort then find middle, since, I can just use the linear-median algorithm (which I didn't know at the time)... – mmk – 2014-07-26T02:26:38.980

@xnor Don't use a generic algorithm to sort, use a counting sort. Every number in mins and maxes is between 0 and N, so a counting sort is guaranteed to be O(N+K). I think that makes the whole algorithm O(N+K). – isaacg – 2014-07-26T02:48:13.053

@isaacg Thanks, that cuts the running time. I think I'm still using O(N log N) for the linear time select though. Despite it's name and unlike what algorithm classes like to assume, it takes log N to compare two values. My new answer is O(log N) in N -- I can't improve the sort there, can I? – xnor – 2014-07-26T02:55:03.217

What I meant is this one can be O((N+K)*log N). I'll take a look at the other one too. – isaacg – 2014-07-26T02:55:52.727

Wait, do I understand right that radix sort can do O(K * lg N)? If so, I can improve the other answer. – xnor – 2014-07-26T03:00:54.867

1

C#

307 273 281 283 characters. A complexity of O(K) I'm not sure but could be O(K*N).

void m() 
{ 
    var r = new StreamReader("in.txt"); 
    var m = new int[int.Parse(r.ReadLine().Split(' ')[0]) + 1]; 
    string l; 
    while ((l = r.ReadLine()) != null) 
    { 
        var t = l.Split(' '); 
        for (var i = int.Parse(t[0]); i <= int.Parse(t[1]); i++) 
        { 
            m[i]++; 
        } 
    } 
    Array.Sort(m); 
    File.WriteAllText("out.txt", m[m.Length / 2].ToString()); 
}

The character count is after deleting all new lines/unnecessary whitespaces...

PS: Your input example has an even amount of stacks, which violates your rule.

PPS: I answered the wrong question in my first try.

Second version little bit bigger but faster. If compiled without debug symbols (ctrl + f5 in visual studio 2013) 2.5 seconds for the last example.

But 318 characters...

void m()
{
    var r=new StreamReader("in.txt");
    var m=new int[int.Parse(r.ReadLine().Split(' ')[0])];
    string l;
    while((l=r.ReadLine())!=null)
    {
        var t = l.Split(' ');
        var e=Array.ConvertAll(t,int.Parse);
        m=m.Select((c,i)=>i>=e[0]&i<=e[1]?++m[i]:m[i]).ToArray();
    }
    Array.Sort(m);
    File.WriteAllText("out.txt",m[m.Length/2].ToString());
}

Armin

Posted 2014-07-25T00:07:40.903

Reputation: 181

You can make your code shorter by removing unnecessary whitespace (for example, you can remove two spaces at a = 0) and by using one-letter variable names: u instead of t1, r instead of reader and l instead of line. – ProgramFOX – 2014-07-25T13:31:49.640

@ProgramFOX renamed the variables and the method. – Armin – 2014-07-25T13:38:45.600

@Armin, my bad. I will change it – mmk – 2014-07-25T14:15:17.663

If I read this correctly this gets the height of the median-numbered stack, not the height of the median-heighted stack. This is not necessarily the same: Take 3 2; 1 1; 3 3. Two stacks have a bill, so the correct answer is 1 even though the 2 stack has height 0. – histocrat – 2014-07-25T15:00:50.707

@histocrat oh yeah, i've read the question wrong... – Armin – 2014-07-25T15:21:52.340

@histocrat new answer, I hope this one does the right thing... – Armin – 2014-07-25T15:52:42.590

@Armin, the stacks aren't zero-based like arrays. So, if I put a bill on stack #x (which is the last stack), your code will throw index out of bounds... – mmk – 2014-07-25T18:47:20.270

@mmk this should do the trick xD – Armin – 2014-07-25T18:50:11.110

+1 This solves all the example correctly. The last one was slow, however, the others were correct and under time... – mmk – 2014-07-25T19:06:42.273

@mmk what time did it take on your pc for the last example? – Armin – 2014-07-25T19:08:12.633

@Armin, It took 4.956 seconds, the others took less than 1 second. – mmk – 2014-07-25T19:19:36.083

1

C# - 374 Characters

Shamelessly lifted from Armin, and probably possible to micro-optimize even further. However, I saw his post and thought "Low hanging fruit" so hey, may as well.

void Main(){var r=new StreamReader("i");var m=new int[f(r.ReadLine())[0]];string l;while((l=r.ReadLine())!=null){var t=f(l);for(var i=t[0];i<=t[1];i++){m[i]++;}}Array.Sort(m);File.WriteAllText("o",m[m.Length/2].ToString());}static int[]f(string n){int[]b=new int[2];int r=0;for(int i=0;i<n.Length;i++){char c=n[i];if(c!=32){r=10*r+(c-48);}else{b[0]=r;r=0;}}b[1]=r;return b;}

Ungolfed

void Main()
{
    var r = new StreamReader("i"); 
    var m = new int[f(r.ReadLine())[0]]; 
    string l; 
    while ((l = r.ReadLine()) != null) 
    { 
        var t = f(l);
        for (var i = t[0]; i <= t[1]; i++) 
        { 
            m[i]++; 
        } 
    } 
    Array.Sort(m);
    File.WriteAllText("o", m[m.Length/2].ToString());
}
static int[] f(string n)
{
    int[] b = new int[2];
    int r = 0;
    for (int i = 0; i < n.Length; i++)
    {
        char c = n[i];
        if (c != 32)
        {
            r = 10 * r + (c - 48);
        }
        else
        {
            b[0] = r;
            r=0;
        }
    }
    b[1] = r;
    return b;
}

Using LinqPad and the third pastebin link from the op, running a GC/warmup, and taking the average of 10 runs I get the following comparison times:

  • Armin first answer: 3229ms average
  • Armin second answer: 2601ms average
  • My answer: 18ms average

First submission so I can't wait to see how badly the more seasoned posters can beat this!

Also, now that the bonus was added I can test mine against that! However, sadly over an average of 10 runs on my computer it takes 4293ms to come up with 9314. So I have quite a ways to go yet.

user29815

Posted 2014-07-25T00:07:40.903

Reputation:

if the first code snippet is your golfed one, then I have to say that there are still a lot of unnecessary whitespaces. and you can drop the private. and please, do a benchmark with the real code... – Armin – 2014-07-25T23:37:59.903

Whoops, turns out I was in too much of a hurry to get this posted before I had to step away from the computer or a little while. Thanks for pointing it out! I really hope I got them all this time. I'm still new to this. Also, since you asked, I re-ran my tests using the exact code from the posts, rather than my test code which was using Console.WriteLine, and it didn't actually make a difference in the numbers I had original posted. I updated the numbers from a second (and third) test anyways, just cause I had them. – None – 2014-07-26T00:08:25.153

0

Lua, 247 bytes

The program constructs a frequency table and finds the median from that. I don't know what that would be in the complexity notation. For the sake of code-golfing, the output is appended to the end of the input file. If that's a problem, I'll change it.

s="*n"f=io.open(--[[FILE NAME--]],"a+")t,i=f:read(s,s)b={}c={}d={}for x=1,i do
y,z=f:read(s,s)for w=y,z do
q=b[w]or 0
if c[q]then
c[q]=c[q]-1
end
b[w]=q+1
c[q+1]=1+(c[q+1]or 0)end
end
n=0
for _,i in pairs(c)do
for w=1,i do
d[n]=_
n=n+1
end
end
f:write(d[#d/2])f:flush()

Ungolfed

s="*n"
f=io.open("input.txt")
t,i=f:read(s,s)
b={}
c={}
d={}
for x=1,i do
  y,z=f:read(s,s)
  for w=y,z do
    q=b[w]or 0
    if c[q]then
      c[q]=c[q]-1
    end
    b[w]=q+1
    c[q+1]=1+(c[q+1]or 0)
  end
end
n=0
for _,i in pairs(c)do
  for w=1,i do
    d[n]=_
    n=n+1
  end
end
f:write(d[#d/2])
f:flush()

Nexus

Posted 2014-07-25T00:07:40.903

Reputation: 641