Meeting splitter


Introduction

Large meetings waste working time. Our aim is to split a large meeting into a number of smaller meetings, so that the average number of topics a person has to discuss is minimized, while every person still attends every meeting that covers a topic they are interested in.

Input

Given is a function \$f\$ that maps each topic in a set \$T\$ to the set of persons from \$P\$ interested in that topic, $$f:T \rightarrow 2^P,$$ together with a maximum partition size \$m\$.

This is encoded as an array of arrays, where the \$i\$-th inner array contains the topics that person \$i\$ is interested in (i.e. the encoding is the inverse direction, from persons to topics).
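For concreteness, here is a minimal R sketch (the helper name topic_to_persons is mine, not part of the challenge) that recovers the mathematical \$f\$, i.e. the map from each topic to the set of interested persons, from the person-to-topics encoding:

# p is a list of integer vectors, one per person, as in the test case below
topic_to_persons <- function(p) {
  nr_topics <- max(unlist(p))
  lapply(1:nr_topics, function(t)
    which(sapply(p, function(topics) t %in% topics)))
}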

Test Case

m = 3
f = [
[9,7],
[3,7,6,4],
[3,1,2],
[7,5,4,2],
[6,2,8,1,4],
[3,7,2,9,8],
[4,3,2],
[3,4,6],
[1,9,5,3],
[6,4,1,2],
[9,7,3,1],
[1,7,4],
[5,1,7,2],
[5,2,7],
[4,2,6,9],
[6,4,1],
[6,5,9,8],
[9,1],
[9,1,5,6],
[1,2,4,8,7]                                                                     
]

Output

Find a partition of \$T\$ (a split into subsets that are disjoint and that cover the whole set) into $$X = \{T_1, T_2, \ldots, T_n\},$$ where \$n \le m\$, so that the following term is minimized: $$r = \sum_{T' \in X} \left|\bigcup_{t \in T'} f(t)\right| \times |T'|$$

To help understand \$r\$: if each topic takes an hour to discuss, then \$r\$ is the total number of person-hours required for the meetings.

If several partitions achieve the same minimal \$r\$, the number of parts \$n\$ should be minimized as well.

The output should contain the partition \$X\$ as a two-dimensional array, such as: $$X = [[1,2,3],[4,5,6],[7,8,9]]$$
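To make the cost concrete, here is a small R sketch (the name meeting_cost is mine, not part of the challenge) that evaluates \$r\$ for a person-to-topics list p and a candidate partition X given as a list of topic vectors:

# r = sum over all meetings of (#persons interested in at least one of its topics) * (#topics)
meeting_cost <- function(p, X) {
  sum(sapply(X, function(topics) {
    attendees <- sum(sapply(p, function(pt) any(topics %in% pt)))
    attendees * length(topics)
  }))
}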

Winning Condition

I am looking for the fastest algorithm that finds the absolute minimum \$r\$ and \$n\$.

If that is not possible in polynomial time, then I am also looking for a polynomial algorithm that has the lowest upper bound on \$r\$ relative to the absolute minimum \$r\$ (e.g. at most double it).

Konrad Höffner

Posted 2018-10-10T13:44:58.113

Reputation: 117

What would be the winning condition? Challenges here should always have a win condition. code-challenge is a valid win condition tag, but in that case you should state what the win condition is for the challenge. If I understand it correctly you are looking for a fastest-algorithm win condition instead?

– Kevin Cruijssen – 2018-10-10T13:53:45.037

@KevinCruijssen Sorry, I should have stated that more clearly, I will edit the answer. The problem is that I don't know the complexity class of the problem and depending on it, it is either one of those. I will add the other tag and hope that is OK. – Konrad Höffner – 2018-10-10T13:56:11.643

Since you tagged it [tag:fastest-algorithm] and you don't know the complexity, you might want to remove that restriction (a competing answer will strive for a polynomial algorithm). – ბიმო – 2018-10-10T14:08:43.387

@BMO: The idea was that in that case a polynomial approximation is to be found but I agree that a non-polynomial non-approximate fastest algorithm would be interesting as well so I removed the tag and edited the objective. – Konrad Höffner – 2018-10-10T14:16:53.197

@KonradHöffner Are we allowed to assume that the input list of interesting topics is sorted l to g? (aka treat it as a set in code as well as equation) – moonheart08 – 2018-10-10T16:56:37.807

I feel like your r function is not effective. It seems to me that splitting all the topics would give the minimum value for r. – Kroppeb – 2018-10-10T20:47:10.637

@moonheart08: Sure! You can assume it is sorted. – Konrad Höffner – 2018-10-10T21:48:05.100

I'd like a description of the term *r* in words. You don't really describe what it is or means. – mbomb007 – 2018-10-11T00:00:14.220

@Kroppeb: Indeed, that would yield the minimum value of r if the maximal allowed partition size m is equal to or higher than the number of topics. However, we cannot assume that in the general case. – Konrad Höffner – 2018-10-11T07:05:37.570

@mbomb007: r is the sum of topics that all persons must discuss. If each topic takes an hour to discuss, then it is the total amount of person hours required for the meetings. – Konrad Höffner – 2018-10-11T07:07:58.033

@Konrad - that description ("total amount of person hours") is really helpful - you should add that into the question. – Toby Speight – 2018-10-11T08:49:26.183

How is the solution not a variant of MIN_CUT in the similarity between topics as measured by participant overlap? – NofP – 2018-12-09T10:59:40.120

@NofP: I didn't know about MIN_CUT, thanks for bringing this up! But I don't quite see how this problem can be transformed to MIN_CUT, could you elaborate in an answer? – Konrad Höffner – 2018-12-10T08:46:08.860

Answers


R with lpSolve

Turn the problem into an integer linear programming problem with \$2^t\$ variables and \$t+1\$ constraints, where \$t\$ is the number of topics. The cost function is modified by multiplying it by \$m\$ and adding the size of the partition. This makes the size of the partition a secondary optimization goal.
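Spelled out, the ILP is roughly the following (my reading of the code below): introduce a binary variable \$x_S\$ for every subset \$S \subseteq T\$ and solve $$\min \sum_{S \subseteq T} \Big( m \cdot \Big|\bigcup_{t \in S} f(t)\Big| \cdot |S| + 1 \Big)\, x_S \quad \text{s.t.} \quad \sum_{S \ni t} x_S = 1 \ \ \forall t \in T, \qquad \sum_{S \subseteq T} x_S \le m, \qquad x_S \in \{0,1\}.$$ A solution that picks \$n\$ non-empty subsets with total meeting cost \$r\$ has objective value \$m\,r + n\$, so \$r\$ is optimized first and the number of meetings \$n\$ second.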

require("lpSolve")

# interpret n as a bit mask: bit i-1 set means topic i belongs to the subset
decode <- function(n) {
  res <- c()
  i <- 1
  while (n>0){
    if (n %% 2 == 1)
      res <- c(res,i)
    i <- i+1
    n <- n %/% 2
  }
  res
}

# m: maximum number of meetings, p: list of topic vectors, one per person
opt <- function(m,p) {
  nr_pers <- length(p)
  nr_topics <- max(unlist(p))
  # incidence matrix: x[t,i] == 1 iff person i is interested in topic t
  x <- array(0,c(nr_topics,nr_pers))
  for (i in 1:nr_pers)
    for (j in 1:length(p[[i]]))
      x[p[[i]][[j]],i] <- 1
  # one 0/1 variable per subset of topics (column j stands for bit mask j-1):
  # l[j,]  counts, per person, how many topics of the subset they are interested in
  # cb[j]  is the number of topics in the subset
  # w[j]   is the objective coefficient m * (#interested persons) * (#topics) + 1
  # mt     is the constraint matrix: one row per topic plus one row counting chosen subsets
  l <- array(0,c(2^nr_topics,nr_pers))
  cb <- rep(0,2^nr_topics)
  w <- rep(1,2^nr_topics)
  mt <- matrix(0,nr_topics+1,2^nr_topics)
  mt[nr_topics+1,1] <- 1
  len <- 1
  # enumerate all subsets: the subsets containing topic i are the previously
  # built subsets with topic i added
  for (i in 1:nr_topics) {
    for(j in 1:len) {
      j1 <- len+j
      l[j1,] <- l[j,]+x[i,]
      cb[j1] <- cb[j] + 1
      w[j1] <- m*sum(l[j1,] != 0)*cb[j1]+1
      mt[,j1] <- mt[,j]
      mt[nr_topics+1-i,j1] <- 1
    }
    len <- 2*len
  }
  # every topic must be covered exactly once, at most m subsets may be chosen
  dir <- c(rep("=",nr_topics),"<=")
  rhs <- c(rep(1,nr_topics),m)
  sol <- lp("min",w,mt,dir,rhs,all.bin=TRUE)$solution
  # translate the chosen variables back into lists of topics
  res <- list()
  j <- 1
  for (i in 1:length(sol))
    if (sol[i]==1) {
      res[[j]] <- decode(i-1)
      j <- j+1
    }
  res
}

# test case
f <- list(
       c(9,7),
       c(3,7,6,4),
       c(3,1,2),
       c(7,5,4,2),
       c(6,2,8,1,4),
       c(3,7,2,9,8),
       c(4,3,2),
       c(3,4,6),
       c(1,9,5,3),
       c(6,4,1,2),
       c(9,7,3,1),
       c(1,7,4),
       c(5,1,7,2),
       c(5,2,7),
       c(4,2,6,9),
       c(6,4,1),
       c(6,5,9,8),
       c(9,1),
       c(9,1,5,6),
       c(1,2,4,8,7))

On Debian, this needs the package r-cran-lpsolve. The code assumes that the topics start at 1 and that each topic number up to the maximum is used. Input and output are lists of vectors, which is not too nice. The test case example is defined in the code, but the list can of course also be entered directly.

To use it, start R and then source the code file and use the function like this:

> source("opt.r")
Loading required package: lpSolve
> opt(3,f)
[[1]]
[1] 1 2 3 4

[[2]]
[1] 6 8

[[3]]
[1] 5 7 9

> 

So [[1,2,3,4],[6,8],[5,7,9]] is an optimal partition for the test case.
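As a sanity check, plugging this partition into the meeting_cost sketch from the Output section above (not part of the original answer):

> meeting_cost(f, list(c(1,2,3,4),c(6,8),c(5,7,9)))
[1] 134

i.e. \$r = 134\$ for the test case, assuming I have applied the cost definition correctly.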

Christian Sievers

Posted 2018-10-10T13:44:58.113

Reputation: 6 366

Wow thanks, I didn't think about ILP at all! What is the complexity of that approach? How long did it run? I can only test it later but accepted it already as the bounty is running out. – Konrad Höffner – 2018-12-12T09:29:23.333

I think you can and should handle the bounty and acceptance separately. The preparation step is exponential in the number of topics, linear in the number of persons and independent of m. The example test case takes less than a second. – Christian Sievers – 2018-12-12T11:37:56.260