Count the reviews!

7

Introduction and Credit

Imagine you're Amazon. Imagine you have a large number of products for which a lot of people have written reviews scoring 1-5 stars. Now you need to give your customers a summary for each product based on the simple product-scoring association. Because you're Amazon, time is money and as such you need to fit this into the smallest code possible because that's the code you can write the fastest.
Credit: This task appeared in 2014 in a programming mid-term exam in the functional programming course at my university and since then an example solution has been provided. I have the lecturer's and author's (same person) permission to post this question publicly.


Input

Your input will be a list of pairs of natural numbers, where the second value of each pair is always either 1, 2, 3, 4 or 5. Take the input using your preferred, generally accepted way.

Output

Your output will be a list of pairs of a number and a list of 5 numbers. Give the output using your preferred, generally accepted way.

What to do?

Given the (unsorted) list of product-id and scoring pairs, count for each product how often each score appears and return the amounts in the second list of the output pair at the appropriate position.
So for product #5 with 10x 1-star, 15x 2-star, 25x 3-star, 50x 4-star and 150x 5-star would result in the entry [5 [10 15 25 50 150]].

Potential Corner Cases

The input list may be empty in which case your output should be as well.
You don't need to consider the case where the input violates the above constraints.
You can safely assume all numbers to be representable by a signed 32-bit integer.
If you have no review(s) for a product (ie it doesn't appear in the output) it must not appear in the output either, i.e. 5x zeroes doesn't happen and can always be dropped.

Who wins?

This is so the shortest answer in bytes wins. Built-ins are allowed and standard rules apply.


Examples

Example input:

[(1,4),(2,5),(1,5),(1,1),(1,5),(1,4)]

Example corresponding output:

[
(1,[1, 0, 0, 2, 2]),
(2,[0, 0, 0, 0, 1])
]

Other test vectors:

[] -> []
[(1,1),(2,2),(3,3),(4,4),(5,5)] ->
[(1, [1,0,0,0,0]),(2, [0,1,0,0,0]),(3, [0,0,1,0,0]),(4, [0,0,0,1,0]),(5, [0,0,0,0,1])]

SEJPM

Posted 2016-12-05T21:15:55.670

Reputation: 3 203

Can / must we output a list of 5 zeros for product 2 if only products 1 are given in the input? Or can we assume that will never happen? – Luis Mendo – 2016-12-05T22:53:40.207

@LuisMendo if you have no reviews for a product "it doesn't exist". Or in another way: If you have no review then there's no way for you to tell product 2 or 3 or 4 is a thing and thus you can obviously leave it out. Also see the edit. – SEJPM – 2016-12-05T22:57:56.210

Would a JS object such as {"1":[1,0,0,2,2],"2":[0,0,0,0,1]} be a valid output? – Arnauld – 2016-12-06T00:15:22.323

@Arnauld well, you pair the item number with a list so it's OK. – SEJPM – 2016-12-06T08:52:12.850

Answers

2

MATL, 8 bytes

5B#uilZ?

Input format is a column array of products followed by a column array of scores:

[1; 2; 1; 1; 1; 1]
[4; 5; 5; 1; 5; 4]

Output format is a vertical array of products followed by a matrix of score occurrences:

1
2
1 0 0 2 2
0 0 0 0 1

Try it online!

Explanation

5B#u   % Implicitly input array of products. Push unique values and integer labels
       % of the input entries. So input [1; 3; 1; 1; 1] would give outputs [1; 3]
       % and [1; 2; 1; 1; 1]. The latter is lik the input but without "holes", i.e.
       % uses consecutive integers. This is required to avoid rows of zeros in the
       % result for non-existing intermediate products
i      % Input array of scores
l      % Push 1
Z?     % Build sparse matrix with those row indices, column indices, and value 1.
       % Implicitly display

Luis Mendo

Posted 2016-12-05T21:15:55.670

Reputation: 87 464

2

R, 64 bytes

function(m)`if`(length(m)>1,by(m[,2],m[,1],tabulate,n=5),list())

This solution works if the I/O is somewhat flexible. It's an unnamed function that works by taking an R-matrix as input and tabulates the frequency of each type of review. The only problem is dealing with the empty list. We need to introduce a control whether this is actually the case and because an empty matrix matrix() is not actually empty, we return an empty list() instead.

Try it on R-fiddle

Example output for the last test case:

m[, 1]: 1
[1] 1 0 0 0 0
--------------------------------------------------------------------------------------------- 
m[, 1]: 2
[1] 0 1 0 0 0
--------------------------------------------------------------------------------------------- 
m[, 1]: 3
[1] 0 0 1 0 0
--------------------------------------------------------------------------------------------- 
m[, 1]: 4
[1] 0 0 0 1 0
--------------------------------------------------------------------------------------------- 
m[, 1]: 5
[1] 0 0 0 0 1

Billywob

Posted 2016-12-05T21:15:55.670

Reputation: 3 363

Nice, didn't know you could use 'if' that way. – JAD – 2016-12-06T09:26:07.260

@JarkoDubbeldam I was surprised too when I found out. It also appears to be completely undocumented. But it basically works as a non-vectorized ifelse() – Billywob – 2016-12-06T09:29:30.493

I knew that you could use it on binary operators: 1+1 -> '+'(1,1), but this is more useful. – JAD – 2016-12-06T09:34:28.797

2

Python 2 - 88 72 71 69 67 bytes

def s(l,d={}):
 for p,s in l:d.setdefault(p,[0]*5)[s-1]+=1
 print d
Input to the function:
[(1,4),(2,5),(1,5),(1,1),(1,5),(1,4)]

Output:
{1: [1, 0, 0, 2, 2], 2: [0, 0, 0, 0, 1]}

Edits:

  • Thanks to nedla2004 for saving 16 bytes!
  • Changed to Python 2 to save 1 byte :D
  • Saved 3 bytes by taking newline byte count as 1 byte.
  • Thanks flp-tkc for saving 2 bytes. (There was an unnecessary extra space too :D)

Golfing suggestions are welcome :)

Gurupad Mamadapur

Posted 2016-12-05T21:15:55.670

Reputation: 1 791

You could put the forth line on the same line as the for loop. – nedla2004 – 2016-12-06T13:51:49.553

@nedla2004 Thanks.I did not know that we do that with for loops. – Gurupad Mamadapur – 2016-12-06T14:23:06.437

Not sure, but you might be able to print d instead of returning it. – nedla2004 – 2016-12-06T16:14:40.700

Yea, print(d) also works. I'll leave it as it is, because of no byte count change. – Gurupad Mamadapur – 2016-12-06T16:23:04.210

1Does this work in Python 2? I don't see why not. – nedla2004 – 2016-12-06T16:25:33.147

Now that you are using Python 2, making this a full program would be shorter. – nedla2004 – 2016-12-08T01:08:22.927

1Could you save bytes using def s(l,d={})? – FlipTack – 2016-12-17T08:47:08.120

1

Mathematica, 68 bytes

{#[[1,1]],Last/@Tally@Join[Range@5,#2&@@@#]-1}&/@GatherBy[#,#&@@#&]&

Unnamed function taking as input a list of ordered pairs of integers such as {{1, 4}, {2, 5}, {1, 5}, {1, 1}, {1, 5}, {1, 4}}. To start, GatherBy[#,#&@@#&] groups together all the pairs with the same first element (which can be any identifier, actually, not just an integer).

Each of those groups is then operated on by the function {#[[1,1]],Last/@Tally@Join[Range@5,#2&@@@#]-1}, which produces a two-element list as output, the first element of which is just the common identifier. Join[Range@5,#2&@@@#] generates a list of integers starting with 1,2,3,4,5 and ending with all of the scores corresponding to the given identifier. Last/@Tally then computes how many time each score was found, returning the appropriate 5-tuple; then -1 makes up for the fact that we introduced each score one extra time, leaving zeros for any missing score (which otherwise would be skipped completely).

Greg Martin

Posted 2016-12-05T21:15:55.670

Reputation: 13 940

1

Jelly, 18 17 bytes

Fm2QṢµ5,@þċ@€€³ż@

TryItOnline!

How?

Fm2QṢµ5,@þċ@€€³ż@ - Main link: List of (id, score) pairs
F                 - flatten
 m2               - mod 2 of list (gets the IDs)
   Q              - unique items by first appearance (one of each ID)
    Ṣ             - sorted (IDs in order)
     µ            - monadic chain separation (call the above z)
         þ        - table the result of
       ,@         -    pair z (with reversed @rguments) with
      5           -    5 (implicit range, so [1,2,3,4,5]) ...we now have:
                  -        [[id,1],[id,2],[id,3],[id,4],[id,5]] for every id, in order.
          ċ@€€    - count occurrences of (with reversed @rguments) for €ach for €ach in
              ³   - the first input (the list of (id, score) pairs)
               ż@ - zip (with reversed @rguments) with z

Jonathan Allan

Posted 2016-12-05T21:15:55.670

Reputation: 67 804

1

JavaScript (ES6), 58 bytes

Returns an object whose keys are the product IDs, associated to a list of review scores.

a=>a.map(([i,n])=>(o[i]=o[i]||[0,0,0,0,0])[n-1]++,o={})&&o

Test cases

let f =

a=>a.map(([i,n])=>(o[i]=o[i]||[0,0,0,0,0])[n-1]++,o={})&&o

console.log(JSON.stringify(f([])));
console.log(JSON.stringify(f([[1,4],[2,5],[1,5],[1,1],[1,5],[1,4]])));
console.log(JSON.stringify(f([[1,1],[2,2],[3,3],[4,4],[5,5]])));

Arnauld

Posted 2016-12-05T21:15:55.670

Reputation: 111 334

0

Clojure, 79 bytes

(fn[c](reduce(fn[r[i v]](update r i #(update(or %[0 0 0 0 0])(dec v)inc))){}c))

Must called with a sequence of vectors so that item id and ratinv value are correctly mapped to function arguments. Returns a hash-map of vectors.

(def f (fn[c]( ... )))
(f [[1 4] [2 5] [1 5] [1 1] [1 5] [1 4]])
{1 [1 0 0 2 2], 2 [0 0 0 0 1]}

NikoNyrh

Posted 2016-12-05T21:15:55.670

Reputation: 2 361