Binning in time

12

0

The task in this challenge is to put elements of an array into time bins. The input will be a non-decreasing array of positive integers representing the time of events, and an integer which represents the size of each bin. Let us start with an example. We call the input array A and the output array O.

`A = [1,1,1,2,7,10]` and `bin_size = 2`.

`O = [4,0,0,1,1]`.

Why? With a bin_size = 2, we'll have the following intervals: (0,2], (2,4], (4,6], (6,8], (8,10], where four items (1,1,1,2) are within the first interval (0,2], none in the second and third intervals, one 7 in the interval (6,8], and one 10 in the interval (8,10].

Your code should consider every interval of length bin_size starting from 0 and count how many numbers in A there are in each. You should always include the right hand end of an interval in a bin so in the example above 2 is included in the count of 4. Your code should run in linear time in the sum of the lengths of the input and output.

More examples:

`A = [1,2,7,12,15]`  and `bin_size = 5`.

`O = [2, 1, 2]`.

`A = [1,2,7,12,15]`  and `bin_size = 3`.

`O = [2,0,1,1,1]`.

You can assume that input and output can be given in any format you find convenient. You can use any languages and libraries you like.

user9206

Posted 2018-03-30T11:15:25.670

Reputation:

Are outputs with trailing 0s allowed? So returning [2,0,1,1,1,0] instead of [2,0,1,1,1]? – Kevin Cruijssen – 2018-03-30T14:10:00.983

No trailing zeros please. – None – 2018-03-30T14:11:00.260

2What about situations where max array value is not a multiple of bin_size, should we really handle these? It seems that most answers do, but if so, it would be nice to add a test case for this scenario to prevent confusion. – Kirill L. – 2018-03-31T08:40:43.780

@KirillL. Yes they should be handled too. – None – 2018-03-31T14:28:47.967

Can you please explain, in your first example, why is the value 2 not in the second interval (2,4] – GPS – 2018-04-01T14:36:54.703

@GPS The righthand end is included but not the left hand end. – None – 2018-04-01T15:04:48.690

So, (a,b] includes b but not a for all a<b. If input array has a 0, which bin does that get into, or in other words, is 0 a positive integer in context of this problem? – GPS – 2018-04-01T16:38:31.443

1@GPS 0 is not a positive integer. This isn’t an accident :) – None – 2018-04-01T19:25:04.690

Answers

9

R, 48 bytes

function(n,s)table(cut(n,0:ceiling(max(n)/s)*s))

Try it online!

Once again, table and cutting to a factor do the trick for the binning. Outputs a named vector where the names are the intervals, in interval notation, for instance, (0,5].

EDIT: Revert back to earlier version that works when s doesn't divide n.

Giuseppe

Posted 2018-03-30T11:15:25.670

Reputation: 21 077

I really don't R, but on TIO this appears to output a format you [most likely do not] find convenient without the table part. – my pronoun is monicareinstate – 2018-03-30T11:38:48.793

@someone that's exactly why it's there. cut splits the vector into factors with levels given by the intervals, and table counts the occurrences of each unique value in its input. – Giuseppe – 2018-03-30T11:41:20.437

I was suggesting to remove the table part to save 7 bytes and make your output format less good. According to the specs, you can output in any format you find convenient. – my pronoun is monicareinstate – 2018-03-30T11:43:31.160

1@someone ah, I see, I misunderstood your comment. No, I think that wouldn't be valid since we need the counts of each bin. – Giuseppe – 2018-03-30T12:03:29.953

1Not fully tested, but I think you can save a couple bytes reaplacing 0:ceiling(max(n)/s)*s with seq(0,max(n)+s-1,s). It works at least for the two samples in the question. – Gregor Thomas – 2018-03-30T14:24:33.290

1@Gregor Hmm if that does work 1:max(n/s+1)*s-s is another improvement since the two are equivalent – Giuseppe – 2018-03-30T14:37:01.400

6

Octave, 36 bytes

@(A,b)histc(A,1:b:A(end)+b)(1:end-1)

Try it online!

Out hunting Easter eggs and making a bonfire. I'll add an explanation when I have the time.

Stewie Griffin

Posted 2018-03-30T11:15:25.670

Reputation: 43 471

3

Python 2, 62 bytes

I,s=input()
B=[0]*(~-I[-1]/s+1)
for i in I:B[~-i/s]+=1
print B

Try it online!

Dead Possum

Posted 2018-03-30T11:15:25.670

Reputation: 3 256

1First of all: nice answer, I've already +1-ed it (and created a port in Java, because it's quite a bit shorter than what I had). Trailing zeroes aren't allowed however (just asked OP), so I[-1]/s+1 should be ~-I[-1]/s+1 instead. – Kevin Cruijssen – 2018-03-30T14:12:24.417

@KevinCruijssen Thanks for notice! – Dead Possum – 2018-04-01T15:58:18.877

3

Perl 5 -a -i, 32 28 bytes

Give count after the -i option. Give each input element on a separate line on STDIN

$G[~-$_/$^I]--}{say-$_ for@G

Try it online!

Ton Hospel

Posted 2018-03-30T11:15:25.670

Reputation: 14 114

2This is impressive. – None – 2018-03-30T12:00:01.620

3

05AB1E, 18 bytes

θs/Å0¹vDyI/î<©è>®ǝ

Try it online!

Emigna

Posted 2018-03-30T11:15:25.670

Reputation: 50 798

I don't know 05AB1E that well, but this seems to call A.count max(A), so run time isn't linear in len(A) + len(O). Is that correct or did I get something wrong? – Dennis – 2018-03-30T15:22:29.707

@Dennis count would be O(max(A)*max(A))... so it's quadratic on the maximum of A... OP specified it had to be linear in terms of... what exactly? – Magic Octopus Urn – 2018-03-30T16:29:29.380

2@MagicOctopusUrn Your code should run in linear time in the sum of the lengths of the input and output, according to the latest revision. – Dennis – 2018-03-30T16:31:25.850

2@Dennis that seems rather arbitrary. – Magic Octopus Urn – 2018-03-30T16:33:00.637

2@MagicOctopusUrn It’s the only sensible definition for linear time for this question I think. – None – 2018-03-31T14:33:55.950

2

APL+WIN, 23 bytes

Prompts for screen input of bins then vector of integers:

+⌿<\v∘.≤b×⍳⌈⌈/(v←⎕)÷b←⎕    

Explanation:

⎕ Prompt for input

⌈⌈/(v←⎕)÷b←⎕ divide the integers by bin size, take maximum and round up for number of bins

b×⍳ take number of bins from previous step and create a vector of bin upper boundaries

v∘.≤ apply outer product to generate boolean matrix where elements of vector ≤ boundaries

<\ switch off all 1's after first 1 in each row to filter multiple bin allocations

+⌿ sum columns for the result

Graham

Posted 2018-03-30T11:15:25.670

Reputation: 3 184

2

Java 8, 75 bytes

a->b->{var r=new int[~-a[a.length-1]/b+1];for(int i:a)r[~-i/b]++;return r;}

Port of @DeadPossum's Python 2 answer, so make sure to upvote his answer!

Explanation:

Try it online.

a->b->{          // Method with integer-array and integer parameters and no return-type
  var r=new int[~-a[a.length-1]/b+1];
                 //  Result integer-array of size `((last_item-1)/bin_length)+1`
  for(int i:a)   //  Loop over the input-array
    r[~-i/b]++;  //   Increase the value at index `(i+1)/bin_length` by 1
  return r;}     //  Return the result-array

Kevin Cruijssen

Posted 2018-03-30T11:15:25.670

Reputation: 67 575

2

C++ (gcc), 90 83 bytes

auto f(auto i,int s){typeof i j;for(auto v:i)--v/=s,j.resize(v+1),j[v]++;return j;}

Try it online!

G. Sliepen

Posted 2018-03-30T11:15:25.670

Reputation: 580

2

JavaScript (ES6), 60 bytes / O(len(a)+max(a)/n)

Saved 5 bytes thanks to @Neil

Takes input in currying syntax (a)(n).

a=>n=>[...a.map(x=>o[x=~-x/n|0]=-~o[x],o=[])&&o].map(n=>~~n)

Try it online!

Or just 43 bytes / O(len(a)) if empty elements are allowed.

Arnauld

Posted 2018-03-30T11:15:25.670

Reputation: 111 334

[...o].map(n=>n|0) gets the first output from the second solution in fewer bytes. – Neil – 2018-03-30T20:25:53.067

@Neil Not sure why I went for something so convoluted. :-/ – Arnauld – 2018-03-30T21:01:51.347

2

Ruby, 60 bytes

->a,b{(b...a[-1]+b).step(b).map{|i|a.count{|n|n<=i&&n>i-b}}}

Try it online!

Reinstate Monica -- notmaynard

Posted 2018-03-30T11:15:25.670

Reputation: 1 053

2

Haskell, 63 75 70 bytes

l!n=l#[n,2*n..]
[]#_=[]
l#(b:i)|h<-length$takeWhile(<=b)l=h:drop h l#i

Whoops, this shorter one isn't linear but quadratic;

l!n=l#[n,2*n..]
[]#_=[]
l#(b:i)=sum[1|a<-l,a<=b]:[a|a<-l,a>b]#i

Try it online!

Angs

Posted 2018-03-30T11:15:25.670

Reputation: 4 825

1

C (gcc), 102 90 89 86 bytes

#define P!printf("%d ",k)
i,j,k;f(s){for(i=s;~scanf("%d",&j);k++)for(;j>i;i+=s)k=P;P;}

Try it online!

Thanks to Kevin Cruijssen for slashing off 12 bytes, and ceilingcat for another 4 bytes!

G. Sliepen

Posted 2018-03-30T11:15:25.670

Reputation: 580

190 bytes by using for-loops, removing the int, and changing ==1 to >0. – Kevin Cruijssen – 2018-03-30T12:39:03.310

You're welcome. Btw, just a note, it's currently invalid (just like my now deleted Java answer was). It's required to run in O(n) time, so you can't have any nested for-loops.. (Your C++ answer seems fine, though. So I've +1-ed that one. :) ) – Kevin Cruijssen – 2018-03-30T13:27:14.470

It is still O(n). The inner loop will always print a value, and we only print (max value + 1) / binsize values in total. – G. Sliepen – 2018-03-30T13:49:58.600

Hmm, but isn't the outer loop already O(n) by looping over the input items. Even if the inner loop would only loop 2 times it's already above O(n). Or am I misunderstanding something.. I must admit O-times aren't really my expertise.. – Kevin Cruijssen – 2018-03-30T13:52:45.987

1But the inner loop doesn't iterate over all input elements, it only iterates over however many output values it needs to print to "catch up" to the position corresponding to the latest input element. If the input vector consists of lots of duplicates or values that differ less than the bin size, the inner loop will not perform any iterations at all. On the other hand, if the input vector is very sparse, then the inner loop will perform more iterations, printing 0's. So to be correct, the code runs in O ((number of input elements) + (last element / bin size)) time. This is still linear. – G. Sliepen – 2018-03-30T14:19:31.397

Ah, you're completely right. My original Java answer that I now edited was also O(n + last_element/bin_size). When I read "linear time" and looked it up I saw O(n) and figured they were the same. O(n + last_element/bin_size) is indeed still lineair, just another linear than O(n). Thanks for the explanation! – Kevin Cruijssen – 2018-03-30T14:25:33.157

1

Pyth, 23 22 bytes

Jm/tdeQhQK*]ZheJhXRK1J

Try it here

Jm/tdeQhQK*]ZheJhXRK1J
Jm/tdeQhQ                 Find the bin for each time and save them as J.
         K*]ZheJ          Create empty bins.
                 XRK1J    Increment the bins for each time within them.
                h         Take the first (because mapping returned copies).

user48543

Posted 2018-03-30T11:15:25.670

Reputation:

1

Ruby, 53 50 bytes

Edit: -3 bytes by iamnotmaynard.

->a,b{(0..~-a.max/b).map{|i|a.count{|x|~-x/b==i}}}

Try it online!

Kirill L.

Posted 2018-03-30T11:15:25.670

Reputation: 6 693

This does not work when a.max is not a multiple of b (e.g. f[[1,1,1,2,7,10],3] => [4, 0, 1] but should give [4, 0, 2]). I had tried the same approach. – Reinstate Monica -- notmaynard – 2018-03-30T22:29:00.963

(or rather [4, 0, 1, 1]) – Reinstate Monica -- notmaynard – 2018-03-30T22:40:08.697

Well, this is fixable by switching to a float max range value, but I'll also ask OP to clarify this in the task description. – Kirill L. – 2018-03-31T08:34:15.243

50 bytes – Reinstate Monica -- notmaynard – 2018-03-31T12:29:57.253

Nice, even better, thanks. – Kirill L. – 2018-03-31T14:57:17.057

1

This puzzle is essentially a Count-sort. We don't know the length of output without going through input first.

C (clang), 53 bytes

i,j;f(*A,l,b,*O){for(j=0;j<l;O[(A[j++]+b-1)/b-1]++);}

Try it online!

This solution takes following parameters:
A input array
l length of A
b bin_size
O storage for Output. Must be sufficient length
and returns output in O.

This solution has a handicap: it doesn't return the length of output array O, and so caller doesn't know how much to print.

Following version overcomes that handicap:

C (clang), 79 bytes

i,j,k;f(*A,l,b,*O,*m){for(k=j=0;j<l;O[i=(A[j++]+b-1)/b-1]++,k=k>i?k:i);*m=++k;}

Try it online!

It takes an additional parameter m and returns length of O in it. It cost me 26 bytes.

GPS

Posted 2018-03-30T11:15:25.670

Reputation: 341