Enumerate rhyme schemes

26

3

A "rhyme scheme" is a string of letters a to z, such that the first occurrences of the characters are in ascending order (without gaps), starting from a. For example (with first occurrences marked):

abccdbebdcfa
^^^ ^ ^   ^

The number of rhyme schemes of length N is given by the Bell numbers B(N). (OEIS A000110)

The Challenge

Your task is to implement an enumeration of these rhyme schemes, i.e. a bijective mapping from integers to rhyme schemes. You're given a positive integer N <= 26, as well as a non-negative integer 0 <= i < B(N). Alternatively, you can use the range 1 <= i <= B(N). You should output a rhyme scheme of length N, such that every i yields a different string.

You may write a program or function, taking input via STDIN (or closest alternative), command-line argument or function argument and outputting the result via STDOUT (or closest alternative), function return value or function (out) parameter.

You may use either lower or upper case letters (consistently).

Your code must be able to handle any valid input in reasonable amount of time (e.g. not more than a few hours for N = 26, worst case i). This should allow solutions that scale exponentially with N (for small bases), even in slow languages but prohibit solutions that scale linearly with i (i.e. B(N)). In particular, that means you cannot just iterate through all valid rhyme schemes of length N until you've discard i schemes.

Standard rules apply.

Examples

The exact assignment of the i to schemes (i.e. the order of the schemes for a given N) is up to you. But say you chose lexicographical ordering, your solution should correspond to the following table (with - denoting invalid input):

N\i 1    2    3    4    5    6    7    8    9    10   11   12   13   14   15
1   a    -    -    -    -    -    -    -    -    -    -    -    -    -    -
2   aa   ab   -    -    -    -    -    -    -    -    -    -    -    -    -
3   aaa  aab  aba  abb  abc  -    -    -    -    -    -    -    -    -    -
4   aaaa aaab aaba aabb aabc abaa abab abac abba abbb abbc abca abcb abcc abcd

Here is a short CJam script which generates all valid rhyme schemes for any given length (but don't try more than 10 or you'll wait a while).

Related Challenges

Martin Ender

Posted 2016-02-03T18:42:12.867

Reputation: 184 808

5I might put a bounty on a (well-golfed) polynomial-time solution (in N), provided that doesn't turn out to be fairly trivial and I was just too stupid to find it. – Martin Ender – 2016-02-03T18:49:19.050

Although the bounty is up for polynomial time solutions, I'd still like to see exponential-time solutions that meet the time limit. (My own Mathematica reference implementation would currently still win the challenge.) – Martin Ender – 2016-02-08T13:06:44.823

B(26) is the smallest Bell number that doesn’t fit in a 64-bit integer. Meanie. :-( – Anders Kaseorg – 2016-02-10T03:24:20.867

Answers

3

CJam, 68 66 bytes

r~:W)1a*{__(;\);_,,.*.+}W(*r~{X@\=_2$\/:CX<!{X:C):X;}&C*-C'a+o}W*;

Try it online.

This is my first CJam program. It started life as a port of my Perl solution and was initially over 130 bytes long. Further golfing suggestions are welcome.

As with my Perl program, it is in two parts.

Part 1:
r~:W                                         | Read the first input (n) and store it in W
    )1a*                                     | Create an array of n+1 1s
        {              }W(*                  | Repeat n-1 times:
         __                                  | Duplicate array twice
           (;\);                             | Remove first element of 1st array. Swap
                                             | arrays. Remove last element of 2nd array
                _,,                          | Duplicate array. Count items. Create range
                   .*.+                      | Multiply arrays. Add 1st array to result

Part 2:
r~                                           | Read the second input (i)
   {                                  }W*    | Repeat n times:
    X@                                       | Push y (initially 1). Bring item 2 (last array) to top
     \=                                      | Swap top two items. Pop array[y] (v)
       _2$                                   | Duplicate v. Copy item 2 (i) to top
          \/:CX                              | Swap i & v. i/v. Store in C (c). Push y
               <!{       }&                  | If !(i/v < c):
                  X:C):X;                    | c = y. ++y (store in X)
                           C*-C'a+o          | i -= c * v. Push y. Push "a". Add c. Print
                                         ;   | Discard top item (integer 0)

To debug the arrays created by Part 1 add ]_`o~ between Parts 1 & 2. If n is 5, the arrays will look like this: [[1 1 1 1 1 1] [1 2 3 4 5] [2 5 10 17] [5 15 37] [15 52]]. The 0 indices of each array aren't used, they just make it easier by not having to calculate offsets. The arrays are calculated like this:

[2 5 10 17] [2 5 10 17] [2 5 10 17]        | Duplicate twice
[2 5 10 17] [2 5 10 17] [5 10 17]          | Discard first item of array
[2 5 10 17] [5 10 17] [2 5 10 17]          | Swap last two arrays
[2 5 10 17] [5 10 17] [2 5 10]             | Discard last item of array
[2 5 10 17] [5 10 17] [2 5 10] [2 5 10]    | Duplicate array
[2 5 10 17] [5 10 17] [2 5 10] 3           | Count items in array
[2 5 10 17] [5 10 17] [2 5 10] [0 1 2]     | Integer to range 0 - n-1
[2 5 10 17] [5 10 17] [0 5 20]             | Multiply arrays [2*0 5*1 10*2]
[2 5 10 17] [5 15 37]                      | Add arrays [5+0 10+5 17+20]

It keeps a copy of the old array while calculating the next one. The arrays are read and discarded in reverse order by Part 2.

CJ Dennis

Posted 2016-02-03T18:42:12.867

Reputation: 4 104

13

Python 2, 153

u=[1]*999;i=60;exec"u[i]=i%30*u[i-30]+u[i-29];i+=1;"*900
def x(l,n,a=0):m=u[30*l+a];c=n>=a*m;return'.'*l and chr(65+min(n/m,a))+x(l-1,[n%m,n-m*a][c],a+c)

It uses alphabetical order and 0-based indexing.

Let l denote the length of a suffix of the letters and a denote the number of distinct letters that were used in the preceding part. Then a function p(l,a) that calculates the number of ways to select the remaining letters could be 40 bytes:

p=lambda l,a:l<1or a*p(l-1,a)+p(l-1,a+1)

However, this is too slow for the challenge, so instead the necessary values are precalculated and stored in the u array. At each stage of the calculation, if the next letter is the one of the a already used, the n = k * p(l - 1, a) + n' where k is the 0-indexed letter of the alphabet and n' is the value of n for the next function call, which contains the information about the remaining letters. If a new letter is used, then n = a * p(l - 1, a) + n'.

feersum

Posted 2016-02-03T18:42:12.867

Reputation: 29 566

1How long does it take for worst case input? – Michael Klein – 2016-02-05T01:01:50.680

1@MichaelKlein A negligible amount of time. – feersum – 2016-02-05T01:37:49.780

This is exactly what I was planning to do (except I would have done it with JS). Nice job! +1 – ETHproductions – 2016-02-06T14:34:08.740

11

Haskell (GHC 7.10), 150 bytes

s=(1,\_->[]):s
k!((y,b):l@((x,a):_))|let h i|i<x=k:a i|(p,q)<-divMod(i-x)y=p:b q=(x+k*y,h):(k+1)!l
n#i=(['a'..]!!).fromEnum<$>snd(iterate(0!)s!!n!!0)i

The operator n # i computes the ith (zero-indexed) rhyme scheme of length n. It runs in O(n²) (big-integer) operations, taking advantage of Haskell’s lazy infinite lists for automatic memoization. Sample runs:

*Main> 26 # 0
"abcdefghijklmnopqrstuvwxyz"
*Main> 26 # 1
"abcdefghijklmnopqrstuvwxya"
*Main> 26 # 2
"abcdefghijklmnopqrstuvwxyb"
*Main> 26 # 49631246523618756271
"aaaaaaaaaaaaaaaaaaaaaaaabb"
*Main> 26 # 49631246523618756272
"aaaaaaaaaaaaaaaaaaaaaaaaab"
*Main> 26 # 49631246523618756273
"aaaaaaaaaaaaaaaaaaaaaaaaaa"
*Main> [1 # i | i <- [0..0]]
["a"]
*Main> [2 # i | i <- [0..1]]
["ab","aa"]
*Main> [3 # i | i <- [0..4]]
["abc","aba","abb","aab","aaa"]
*Main> [4 # i | i <- [0..14]]
["abcd","abca","abcb","abcc","abac","abaa","abab","abbc","abba","abbb","aabc","aaba","aabb","aaab","aaaa"]

(If the maximum N were 25 instead of 26, the .fromEnum could be removed, because B(25) fits in a 64-bit Int.)

Anders Kaseorg

Posted 2016-02-03T18:42:12.867

Reputation: 29 242

1Looks great. Would you mind adding a less golfed version for easier grokking? – Michael Klein – 2016-02-10T08:53:07.793

4

Perl 257 + 1 (-p flag) = 258

Perl 182 + 10 (-pMbignum flags) = 192

($n,$i)=split;@m=[@a=(1)x($n+1)];while($a[2]){push@m,[@a=map{$a[$_]*$_+$a[$_+1]}0..$#a-1]}$_='';$y=1;while($w=pop@m){$c=int($i/($v=$$w[$y]));$c=$y++if($c>=$y);$i-=$c*$v;$_.=chr$c+65}

Thanks to dev-nul for saving many bytes! I've now rewritten it based on what I learnt from doing the CJam version.

Calculates the rhyme in ascending alphabetical order, 0 indexed.

Two parts: Part 1 is 128 90 bytes and calculates a matrix for Part 2. Part 2 is 129 92 bytes and does some simple maths to calculate each letter. If I could get rid of the matrix and replace it with two simple numbers, I could calculate a single path through the matrix for each number and save a lot of bytes! Apparently, that idea doesn't work!

Unfortunately, it doesn't output the right rhymes for values of i higher than 9007199254740992, but it works beautifully for low values! I've added the Bignum library at a cost of 11 bytes. It's run from the command line with perl -pMbignum bell-rhyme.pl. -pMbignum  = 10 bytes. It's also very fast for any input value.

CJ Dennis

Posted 2016-02-03T18:42:12.867

Reputation: 4 104

2

Oracle SQL 11.2, 412 284 283 bytes

WITH a AS(SELECT CHR(96+LEVEL)d,LEVEL b FROM DUAL CONNECT BY LEVEL<=:i),v(s,c,n)AS(SELECT d,1,1 FROM a WHERE b=1 UNION ALL SELECT s||d,b,LENGTH(REGEXP_REPLACE(s||d,'([a-z])\1+','\1'))FROM v,a WHERE(b<=n OR b=c+1)AND LENGTH(s)<:n)SELECT s FROM v WHERE:n=LENGTH(s)AND:i<=:n ORDER BY 1;

Unfortunately it only runs up to a length of 8. Any greater value results in : ORA-01489: result of string concatenation is too long

Un-golfed

WITH a AS(SELECT CHR(96+LEVEL)d,LEVEL b FROM DUAL CONNECT BY LEVEL<=:i),
v(s,c,n) AS
(
  SELECT d,1,1 FROM a WHERE b=1
  UNION ALL
  SELECT s||d,b,LENGTH(REGEXP_REPLACE(s||d,'([a-z])\1+','\1')) 
  FROM v,a 
  WHERE (b<=n OR b=c+1) AND LENGTH(s)<:n
)
SELECT s FROM v WHERE LENGTH(s)=:n AND :i<=:n ORDER BY 1;

The a view generate the :i letters in column a and their value in b.

The recursive view v takes the string being constructed as parameter v, the value of the last letter used in c and the value of the greatest letter used in n. The n parameter is equal to the length of the string without any duplicate letter, thats what the regex is for.

A letter is valid if its value is <= the value of the greatest letter already used or it is the next letter to be used.

Somehow the query needs the LENGTH(s)<:n part to run, I must be missing something in how the query works.

The main SELECT takes care of filtering out the invalid inputs and the shorter strings built before the length targeted is reached.

412 bytes version

WITH a AS(SELECT * FROM(SELECT d,b,ROW_NUMBER()OVER(PARTITION BY b ORDER BY d)l FROM(SELECT CHR(64+DECODE(MOD(LEVEL,:i),0,:i,MOD(LEVEL,:i)))d,CEIL(LEVEL/:i)b FROM DUAL CONNECT BY LEVEL<=:i*:n))WHERE l<=b),v(s,c,p)AS(SELECT d,1,l FROM a WHERE b=1 UNION ALL SELECT s||d,c+1,l FROM v,a WHERE c+1=b AND(l<=LENGTH(REGEXP_REPLACE(s,'([A-Z])\1+','\1'))OR l=p+1))SELECT s FROM v WHERE LENGTH(s)=:n AND :i<=:n ORDER BY 1;

Do not try the 412 bytes query with 26. It puts the database in restricted mode, at least on my xe version running in a docker container on a macbook. I could try on the exadata at work, but sadly I still need to work for a living.

Jeto

Posted 2016-02-03T18:42:12.867

Reputation: 1 601

0

Mathematica, 136 bytes

(For[j=2^#-1;t=#2,c=1;m=0;x=t;r=If[#>0,++m,c*=m;d=x~Mod~m+1;x=⌊x/m⌋;d]&/@j~IntegerDigits~2;;c<=t,t-=c;--j];FromCharacterCode[r+64])&

For completeness, here is my golfed reference implementation. As opposed to the existing answers this does not run in polynomial time (it's exponential in N with base 2) but does meet the time constraint (the worst case would still run in under half an hour).

The idea is this:

  • For each rhyme scheme we can identify the positions where the maximum character so far increases:

    ABCDEFGHDIJDEKBBIJEIKHDFII
    ^^^^^^^^ ^^  ^
    

    We can treat those markings as a binary number which makes it easy to iterate over all such structures. We need to iterate from 2n-1 to 2n (or the other way round) which is where the exponential time complexity comes from.

  • For each such structure it's easy to determine how many such strings there are: only the gaps between the markings can be freely chosen, and the maximum in front of the gap tells us how many different characters are valid in each position. This is a simple product. If this number is smaller than i, we subtract it from i. Otherwise, we've found the structure of the requested rhyme scheme.
  • To enumerate the schemes in the given structure, we simply represent i (or what remains of it) as a mixed-base number, where the weights of the digits are determined by the number of allowed characters in the remaining positions.

I wonder if this would allow for a shorter solution in some of the other languages that were submitted since it doesn't require any memoisation or precomputation.

Martin Ender

Posted 2016-02-03T18:42:12.867

Reputation: 184 808