Random subset generator

2

Randomly choose one out of k-length, ordered subset of characters in a string, while only storing a limited number of characters. The subset must be chosen with equal probability and may contain repeated characters. Do this without generating all possible permutations and assume k is at most the length of the string. For example, the string daddy has 7 subsets of length two: da, dd, dy, ad, ay, yd, ya. The function should return any one of them with the probability of 1/7.

ash

Posted 2017-03-11T19:58:54.547

Reputation: 37

Question was closed 2017-03-11T20:50:13.337

1You probably don't mean substring. te isn't a substring for example. – Maltysen – 2017-03-11T20:03:12.630

1It would be nice to have a bit of a stronger specification. An example is not really a satisfactory specification. – Post Rock Garf Hunter – 2017-03-11T20:03:40.053

3The usual term for non-contiguous substrings is "subsequence". But since you also seem to allow arbitrary orders, you probably want "subsets" or "sub-multisets", since you've included ee. – Martin Ender – 2017-03-11T20:06:53.937

updated to subsets – ash – 2017-03-13T01:46:12.153

1I think this question is asking, "choose 1 out of a set of k-lengthed, ordered subsets of the characters in a word, where the same character is equivalent (e.g. 'te' with the first e = 'te' with the second), with equal prabability" – MildlyMilquetoast – 2017-03-13T01:50:00.283

If I were to change this challenge, I would add more test cases, separate from the explanation of what to do (down below, probably). Also, I would remove the restriction on how the program/function finds an ordered subset, I don't see a clear reason of why it's there. Also, I might remove the "uniformly random" part to make it just "random", but that is up to you. You might want to be more clear on what exactly is the input and output for the program, and what it should do if k is greater than the string length, or if k <= 0. Look at some other questions (with decent scores) for guidance. – MildlyMilquetoast – 2017-03-13T02:06:55.503

Answers

1

Pyth - 5 bytes

While we're arguing about the nomenclature, I'm just guessing at the rules based on the example to just mean pick one from all unique permutations of length n of the string.

O{.PF

Try it online here.

Maltysen

Posted 2017-03-11T19:58:54.547

Reputation: 25 023

permutation works, but what if k is large? – ash – 2017-03-13T03:39:32.627

1@ash then it's really slow – Maltysen – 2017-03-13T03:46:25.880

0

Perl 6,  75  71 bytes

{set(flat $^a.comb.combinations($^b).map: *.permutations.map: *.join).pick}

Try it

{set($^a.comb.combinations($^b).map: |*.permutations.map: *.join).pick}

Try it

Expanded:

{
  set(                         # get the unique values from: (equal probability)

    $^a.comb.combinations($^b) # get the combinations
      .map: |*.permutations    # get the permutations of the combinations
        .map: *.join           # join each permutation into a string

  ).pick                       # pick one
}

The following could almost work except that it has a different probability if there are any repeats of characters.

{$^a.comb.pick($^b).join}

Brad Gilbert b2gills

Posted 2017-03-11T19:58:54.547

Reputation: 12 713