Split a list into sized chunks, but not counting items failing predicate

17

4

Motivation: Sometimes certain items in a list don't count towards your totals. For example, counting plane passengers in rows, where babies sit on a parent's laps.

Challenge: write a program to split a list of items into chunks. Each chunk (except possibly the last) is the same size, where size is defined as the number of items passing a predicate function.

Rules:

  1. Your program must take
    • a list of items
    • a positive integer chunk size
    • a predicate function (takes an item, and returns true or false)
  2. You must return the input list split into chunks
  3. Each chunk is a list of items
  4. Overall the items must remain in the same order, with none disgarded
  5. The number of items passing predicate in each chunk (except possibly the last) should match the input chunk size.
  6. items failing the predicate should not count toward this size
  7. Items failing the predicate are
    1. still included in the output chunks
    2. allocated to the earliest chunk, in the case that a chunk is "full" but the next items are ones failing the predicate
      • thus the final chunk may not consist solely of items failing the predicate
  8. The final chunk may be of size less than the chunk size because all the items have been accounted for.

Non exhaustive examples:

The simplest example is to consider 1s and 0s, where the predicate function is x ==> x > 0. In this case, the sum of each chunk must match the chunk size.

  • items: [], size: 2, predicate: x > 0 --> either [] or [[]]
  • items: [0, 0, 0, 0, 0, 0], size: 2, predicate: x > 0 --> [[0, 0, 0, 0, 0, 0]]
  • items: [0, 1, 1, 0], size: 2, predicate: x > 0 --> [[0, 1, 1, 0]]
  • items: [0, 1, 1, 0, 1, 0, 0], size: 2, predicate: x > 0 --> [[0, 1, 1, 0], [1, 0, 0]]
  • items: [0, 1, 0, 0, 1, 0, 1, 1, 0], size: 2, predicate: x > 0 --> [[0, 1, 0, 0, 1, 0], [1, 1, 0]]

And let's finish with the plane passengers where babies sit on a parent's lap example. A for adult, b for baby, plane row is 3 seats wide, adult always listed before their baby:

  • items: [A, b, A, b, A, A, A, b, A, b, A, A, b], size: 3, predicate: x => x == A --> [[A, b, A, b, A], [A, A, b, A, b], [A, A, b]]

Tom Viner

Posted 2018-07-29T22:35:45.863

Reputation: 433

6This looks like a good question, but currently it's lacking a winning criterion. I suggest using [tag:code-golf]. – Laikoni – 2018-07-29T22:56:34.553

3Can we assume the list items are single characters? Or, say, numbers? – xnor – 2018-07-29T23:04:13.670

[tag:chunking] sounds interesting, though could maybe be replaced by [tag:set-partitions]. – Jonathan Frech – 2018-07-29T23:28:33.613

@Laikoni tagged [tag:code-golf] – Tom Viner – 2018-07-30T00:32:44.927

@xnor list items can be any basic data-type like strings or numbers – Tom Viner – 2018-07-30T00:34:08.853

@JonathanFrech [tag:set-partitions] are not appropriate as the order must be maintained – Tom Viner – 2018-07-30T00:34:54.307

@ChasBrown also hoping this can be opened! I don't think there's any point in empty chunks, but either [] or [[]] would meet the criteria, I'll update the example. – Tom Viner – 2018-07-30T00:36:28.903

"The final chunk may be of size less than the chunk size" -> What are the two sizes? We're given the chunk size, but the implication is that there's a different, other size. – Οurous – 2018-07-30T01:00:06.003

1@Ourous I've added "because all the items have been accounted for", i.e. the last chunk doesn't get a chance to get "full up", because that's just the end of the input items. – Tom Viner – 2018-07-30T01:09:05.557

When dealing with string items, can we assume that they are not empty? – Arnauld – 2018-07-30T07:04:53.550

@Arnauld, no I think the items aren't restricted like that. They can be 0, 0.0, "", or any other basic data type. – Tom Viner – 2018-07-30T08:16:43.203

Can we give an empty/falsey output for when input is [], instead of [] or [[]]? – sundar - Reinstate Monica – 2018-07-30T08:36:11.343

General question, has there been a consensus on how to do these "blackbox predicate function" challenges in languages that don't have user-defined functions, for eg., MATL? I tried to search meta for this, but my search-fu seems weak today. – sundar - Reinstate Monica – 2018-07-30T16:07:51.043

@sundar: here's some meta.

– nimi – 2018-07-30T21:51:35.300

@nimi That answer still assumes the language has user-defined functions, just not first-class ones. MATL has only builtin functions, no user-defined functions whether first class or not. There's a workaround to call out to the surrounding environment though, so I've posted an answer using that. Thanks for the link though! – sundar - Reinstate Monica – 2018-07-31T18:13:03.490

Answers

2

Jelly, 10 bytes

vⱮTm⁵ḊœṖŒṘ

A full program taking the monadic black-box function as the first optional argument, the list as the second optional argument and the chunk size as the third optional argument which prints a Python representation of the resulting list of lists (to avoid Jelly's implicit smashing of lists containing characters).

Try it online! (note that a list of characters is passed to a Jelly program by formatting it as a Python quoted string)

How?

vⱮTm⁵ḊœṖŒṘ - Main Link: B, L, S
 Ɱ         - map across second argument with (i.e. for X in L):
v          -   evaluate as a monad with input (i.e. B(X))
  T        - truthy indices (e.g. [0,0,1,0,1,1,1,0,0,0,1,0,0]T -> [3,5,6,7,10])
    ⁵      - 3rd optional argument = S
   m       - modulo slice   (e.g. [3,4,7,9,12,15,16,18,19,20]m3 -> [[3,4,7],[9,12,15],[16,18,19],[20]]
     Ḋ     - dequeue        (e.g. [[3,4,7],[9,12,15],[16,18,19],[20]]Ḋ -> [[9,12,15],[16,18,19],[20]]
      œṖ   - partition right (L) at the indices in that
        ŒṘ - print Python representaion

Jonathan Allan

Posted 2018-07-29T22:35:45.863

Reputation: 67 804

8

Brachylog, 37 bytes

hW&t~c.k{↰₂ˢl}ᵐ;WxĖ∧.bhᵐ↰₂ᵐ∧.t↰₂ˢl≤W∧

Try it online!

I was pleasantly surprised to find that this - pretty much a restatement of the question - successfully terminates and produces correct output.

Assumes the predicate is present as predicate 2 below this code. Outputs a list of lists ("chunks"), or false for an empty input.

Explanation:

hW&               % First input is W, the expected "weight" of each chunk
                  %  (i.e. the number of items passing predicate in each chunk)

t                 % Take the second input, the list of items
 ~c.              % Output is a partition of this list
    k{    }ᵐ      % For each partition (chunk) except the last, 
      ↰₂ˢ         %   Select the items in the chunk that pass the predicate
         l        %   Get the length of that
                  % (So now we have the list of the "weights" of each chunk)
            ;Wx   % Remove the input expected weight from this list, and 
               Ė  %  the result of this should be empty.
                  %  This verifies that the list of weights is either 
                  %  composed of all W-values, or is empty (when input is [0 0 0] for eg.)

    ∧.bhᵐ↰₂ᵐ      % And, the first element of each chunk (except the first) should
                  %  pass the predicate. This is another way of saying:
                  %  "Items failing the predicate are allocated to the earliest chunk"

    ∧.t↰₂ˢl≤W     % And, the final chunk (which we haven't constrained so far)
                  %  should have weight ≤ the input expected weight
                  %  This disallows putting everything in the final chunk and calling it a day!

    ∧             % (no further constraints on output)

sundar - Reinstate Monica

Posted 2018-07-29T22:35:45.863

Reputation: 5 296

7

Apl (Dyalog Unicode) 17 16 bytes (SBCS)

Thanks to Adám for saving me 1 byte.

w⊆⍨⌈⎕÷⍨1⌈+\⎕¨w←⎕

Try it online! for explaination purposes I will leave up the 17 byte solution.

{⍵⊆⍨⌈⍺÷⍨1⌈+\⍺⍺¨⍵}

⍺⍺¨⍵ aplies the predicate to the list returning a boolean vector
+\ generates a running total
1⌈ replaces leading 0s with 1s
⌈⍺÷⍨ divides each element by the chunk size and rounds up
⍵⊆⍨ partitions the original vector by this

jslip

Posted 2018-07-29T22:35:45.863

Reputation: 721

2That's impressive! And I like the output display, appropriate visual for the problem. – sundar - Reinstate Monica – 2018-07-30T12:41:37.723

Save a byte by converting to program (tradfn body): w⊆⍨⌈⎕÷⍨1⌈+\⎕¨w←⎕

– Adám – 2018-08-02T17:54:39.123

5

Clean, 96 92 bytes

Uses a named function f :: a -> Bool allowed according to meta consensus.

import StdEnv,StdLib
$l n|l>[]=last[[i: $t n]\\i<-inits l&t<-tails l|n>=sum[1\\e<-i|f e]]=[]

Try it online!

Expanded (with default highlighting to make comments show up):

$ l n // define function $ on `l` and `n`
 | l > [] // if `l` is not the empty list
  = last [ // the last element of ...
                   \\ i <- inits l // prefixes of `l`
                    & t <- tails l // matching suffixes of `l`
                    | n >= // where n is greater than or equal to
                           sum [1 \\ e <- i | f e] // the number of matching elements in the prefix
          [i: $t n] // prepend that prefix to the application of $ to the rest of the list
         ]
 = [] // if `l` is empty, return empty

Οurous

Posted 2018-07-29T22:35:45.863

Reputation: 7 916

4

Java 10, 207 186 159 148 bytes

a->n->{var r=new java.util.Stack();int l=a.size(),i=0,c=0,j=0;for(;i<=l;i++)if(i==l||f(a.get(i))&&++c>n&i>0){r.add(a.subList(j,j=i));c=1;}return r;}

Java is definitely not the right language for this challenge (or any codegolf-challenge of course..)

-21 bytes thanks to @O.O.Balance

Try it online.

Explanation:

a->n->{                    // Method with List & int parameters & List of Lists return-type
  var r=new java.util.Stack();
                           //  Result-list, starting empty
  int l=a.size(),          //  Size of the input-List
      c=0,                 //  Count-integer, starting at 0
      j=0,                 //  Range-integer, starting at 0
  i=0;for(;i<=l;i++){      //  Loop `i` in the range [0, `l`]
    if(i==l||              //   If this is the last iteration
       f(a.get(i))         //   Or if the black-box function is true for the current item
       &&++c               //    Increase the counter by 1
        >n&                //    If the counter is now larger than the size
        &i>0){             //    and it's not the first item of the List
      a.subList(j,j=i);    //     Add a sub-List to the result from range [`j`, `i`)
                           //     And set `j` to `i` at the same time
      c=1;}                //     And reset `c` to 1
  return r;}               //  Return the List of Lists as result

Black box input format:

Assumes a named function boolean f(Object i) is present, which is allowed according to this meta answer.

I have an abstract class Test containing the default function f(i), as well as the lambda above:

abstract class Test{
  boolean f(Object i){
    return true;
  }

  public java.util.function.Function<java.util.List, java.util.function.Function<Integer, java.util.List<java.util.List>>> c =
    a->n->{var r=new java.util.Stack();int l=a.size(),i=0,c=0,j=0;for(;i<=l;i++)if(i==l||f(a.get(i))&&++c>n&i>0){r.add(a.subList(j,j=i));c=1;}return r;}
  ;
}

For the test cases, I overwrite this function f. For example, the last test case is called like this:

System.out.println(new Test(){
  @Override
  boolean f(Object i){
    return (char)i == 'A';
  }
}.c.apply(new java.util.ArrayList(java.util.Arrays.asList('A', 'b', 'A', 'b', 'A', 'A', 'A', 'b', 'A', 'b', 'A', 'A', 'b'))).apply(3));

Kevin Cruijssen

Posted 2018-07-29T22:35:45.863

Reputation: 67 575

1"(or any codegolf-challenge of course..)" ehh I dunno, you've beaten out my Clean answers in at least a few cases. Anyway, I always look forward to your answers. – Οurous – 2018-07-30T08:27:46.267

2

@Οurous It's more of a meme that Java is not suitable for codegolf in any way, which I guess applies to Clean as well if we compare it to actual golf languages like Jelly, 05AB1E, etc. I still enjoy solving all these codegolf-challenges in Java though (and you in Clean as well I assume). And once in a (long) while, Java is able to beat Python. ;) And I was once a leading answer with Java, until bash ruined it (and R golfed further). xD

– Kevin Cruijssen – 2018-07-30T08:30:26.753

1

186 bytes if you return a List of arrays: https://bit.ly/2mSjCIc

– O.O.Balance – 2018-07-30T11:02:41.213

@O.O.Balance Thanks! Smart use of Arrays.copyOfRange! – Kevin Cruijssen – 2018-07-30T11:43:38.073

@O.O.Balance been able to golf a bit more by using taking the input as List and using .sublist. Your functionality remains the same apart from that, but it saves a lot of bytes and removes the import. (And now it also works for the test-case with characters instead of integers.) – Kevin Cruijssen – 2018-07-30T11:59:18.553

4

R, 58 bytes

function(v,g,n,a=(cumsum(Map(g,v))-1)%/%n)split(v,a*(a>0))

Try it online!

  • -5 bytes thanks to @Giuseppe

digEmAll

Posted 2018-07-29T22:35:45.863

Reputation: 4 599

158 bytes – Giuseppe – 2018-07-30T20:43:21.593

3

C (gcc), 70 66 bytes

I use a structure to note the start of a sub-list, as C doesn't know about such things.

Thanks to ceilingcat for the suggestions.

t;f(a,s,c)l*a;int(*c)();{for(;a->v;a++)(t+=c(a->v))>s?t=++a->s:0;}

Try it online!

ErikF

Posted 2018-07-29T22:35:45.863

Reputation: 2 149

3

Haskell, 72 bytes

p#s|let l@(h:t)!a|sum[1|e<-h:a,p e]>s=a:l![]|n<-a++[h]=t!n;_!a=[a]=(![])

Try it online!

p#s     = (![])         -- main function that takes a predicate function 'p',
                        -- a size 's' and a input list without a name that is
                        -- directly passed as the first argument to function '!'
  let  l@(h:t)!a        -- function '!' is a local function (so that 'p' and 's'
                        -- are in scope). Takes a list 'l' of at least one element
                        -- 'h' (and 't' being the rest of the list) and an
                        -- accumulator 'a'
   |sum[1|e<-h:a,p e]>s -- if 'p' holds for more than 's' elements in 'h' and 'a'
     =a:l![]            --   put 'a' in front of a recursive call with the same list
                        --   'l' and an empty accumulator
   |n<-a++[h]           -- else bind 'n' to 'h' appended to 'a' and
     =t!n               --   call '!' again with 't' and 'n'
  _!a=[a]               -- base case for '!'. If the input list is empty, return
                        --   a singleton list with 'a' 

nimi

Posted 2018-07-29T22:35:45.863

Reputation: 34 639

3

MATL, 19 bytes

HyX$Ysi/Xk1y>+!w7XQ

Based on jslip's excellent APL answer.

MATL doesn't really have user-defined functions as such, but it does have a way to call out to the environment it's running on (MATLAB/Octave), so this uses that for the predicate function. Usage would be something like this, but that functionality is disabled online for security reasons, so here's a version that uses a hardcoded isodd predicate function instead: Try it on MATL Online.

H    % Push the function name to be called, assumed to be 
     %  stored in the H clipboard
y    % Take the first input, push copies of it both below and above the 
     %  function name
X$   % Call the function (say 'isupper') with the input as argument
     %  returns a boolean vector
Ys   % Cumulative sum
i/   % Take the chunk size and divide each element by it
Xk   % ceil
1y>+ % Turn the (initial) 0s to 1s
!    % Transpose. Now we have a column vector specifying which chunk 
     %  each input element should go into
w    % Bring the other copy of the input on top 
7    % Code for this function: '@(x){x.'}'
     %  i.e. place each chunk in a row vector and enclose it in a cell
XQ   % 'accumarray' - split the input into chunks based on
     %   the given column vector, apply the given function to each chunk
     %   (which here just wraps it up as a cell), and accumulate the results
     %   in an array.
     % implicit output

sundar - Reinstate Monica

Posted 2018-07-29T22:35:45.863

Reputation: 5 296

2

JavaScript (ES6), 69 bytes

Saved 3 bytes thanks to @tsh

Takes input in currying syntax (size)(predicate)(array).

s=>p=>g=a=>a.every(x=>p(x)?k--:++j,j=k=s)?[a]:[a.splice(0,j),...g(a)]

Try it online!

Arnauld

Posted 2018-07-29T22:35:45.863

Reputation: 111 334

@tsh Nice optimization. Thanks! – Arnauld – 2018-07-30T07:49:34.850

2

Ruby, 57 bytes

->a,n,g{c=-1;a.group_by{|x|[0,c+=g[x]?1:0].max/n}.values}

Try it online!

Anonymous lambda accepting input array a, chunk size n, and predicate g. Maintains a counter c of items matching predicate and groups items by the number of chunks already used up. Unfortunately, the initial value -1/n doesn't round up to 0, so we have to spend a few bytes to fix it.

Kirill L.

Posted 2018-07-29T22:35:45.863

Reputation: 6 693

2

R, 100 bytes

function(L,K,P,s=sapply(L,P),y=cut(cumsum(s),seq(0,sum(s),K),,T))for(l in levels(y))cat(L[y==l],"
")

Try it online!

outgolfed handily by digEmAll

Giuseppe

Posted 2018-07-29T22:35:45.863

Reputation: 21 077

I don't know if named lists as output are ok (and if I missed some edge cases...): 65 bytes

– digEmAll – 2018-07-30T19:09:45.000

Hmm well I'd post that as a separate answer! – Giuseppe – 2018-07-30T19:31:03.060

1

Python 2, 92 bytes

lambda a,c,p:reduce(lambda r,v:[r[:-1]+[r[-1]+[v]],r+[[v]]][sum(map(p,r[-1]+[v]))>c],a,[[]])

Try it online!

Chas Brown

Posted 2018-07-29T22:35:45.863

Reputation: 8 959

1

JavaScript (Node.js), 90 bytes

(s,p,r=[l=[]],n=s+1)=>g=a=>0 in a?g(a,(n=(n||s)-p(v=a.shift()))||r.push(l=[]),l.push(v)):r

Try it online!

Invoke as F(2, x => x > 0)([0, 1, 1, 0])

tsh

Posted 2018-07-29T22:35:45.863

Reputation: 13 072

1

Mathematica, 82 bytes

f[l_,s_,p_]:=Last@Reap[i=t=-1;Do[If[p@e,If[++i~Mod~s==0&&i>0,t++]];e~Sow~t,{e,l}]]

Ungolfed:

f[l_,s_,p_] :=                (* define a function that takes three    *)
                              (* arguments                             *)

  Last@Reap[                  (* collect and return results which were *)
                              (* gathered by the procedural loop below *)

    i=t=-1;                   (* initialize two counters               *)

    Do[                       (* start iterating over the list         *)

      If[p@e,                 (* if element passes predicate           *)
        If[                   (* then if preincremented i is 0 modulo  *)
          ++i~Mod~s==0&&i>0,  (* chunk size and i > 0                  *)
          t++                 (* increment t                           *)
        ]
      ];e~Sow~t,              (* finally sow the element tagged with   *)
                              (* the current value of t                *)

    {e,l}]                    (* e references current element of list  *)
                              (* l is the list itself                  *)
  ]

l is the input list; s is chunk size; p is an unnamed/anonymous/pure/lambda function that returns true/false operating on elements of the list.

Last@Reap[...] returns a list of lists of every element that was Sow-n inside of .... They are grouped into sublists by the second argument of e~Sow~t which is shorthand for Sow[e, t].

I had to initialize counters to -1 to handle a chunk size of 1, otherwise I would need to check Mod[i, s] (i~Mod~s) equal to 1, which could never happen.

The rest of the code is explained in the ungolfed block.

LLlAMnYP

Posted 2018-07-29T22:35:45.863

Reputation: 361