Quickly find length of n-th term of the look-and-say sequence

7

5

The look and say sequence is an example of a run length encoding sequence. To get the next term of the sequence one groups the sequence into runs of the same number, each group in the next term then becomes two digits, first the number of terms in the group, followed by the value in the group. The first term is 1.

Here is how we would calculate the first couple of terms by hand:

We start with 1, grouping that gives us [1] a single group of 1 1. Thus our next term is 11. We group this and get [11] a single group of 2 1s, so our next term is 21. Grouping again we get [2],[1], we translate each of these groups into 2 digits each, 12,11 and concatenate, 1211. Group again [1],[2],[11], then convert each group, 11,12,21 and concatenate, 111221.

The first couple terms of the sequence are:

1
11
21
1211
111221
312211
13112221
1113213211

Write a program that when given n as input returns the length of the nth term of this sequence as output. Programs should have a runtime of O(n log n).

Answers will be scored in bytes with less bytes being better.

Soham Chowdhury

Posted 2012-09-25T08:40:16.600

Reputation: 1 449

This is not like this.

– Soham Chowdhury – 2012-09-25T08:40:37.847

2I don't understand. – flodel – 2012-09-25T09:49:01.330

@flodel "One 1. Two 1(s). One 2, one 1. One 1, one 2, two 1(s)..." – DavidC – 2012-09-25T15:33:22.103

2@DavidCarraher But what does Numbers allowed mean? – Gareth – 2012-09-25T15:41:20.297

1Yes, I know what the look and say sequence is. I don't understand without calculating the terms (or am puzzled how it would be possible without it) and numbers allowed. – flodel – 2012-09-25T15:53:45.227

The sequence starting from 1 splits into the Conway elements Hf (11132) and Sn (13211) after seven iterations. From than point onwards, it should be possible to calculate the length of the sequence by iterating the element decay matrix. The biggest difficulty in golfing this is going to be compressing the matrix. – Ilmari Karonen – 2012-09-25T16:04:07.567

@flodel I'm guessing that the poser meant that you can make numerical computations but cannot generate the LookAndSay output itself (nor use it internally). As for myself, I don't think it can be done. But I could easily be wrong. – DavidC – 2012-09-25T16:44:59.187

"Numbers allowed" is presumably because the "this" of "not like this" is a problem which explicitly disallows the use of any numerical character in the program. – Peter Taylor – 2012-09-25T21:43:46.427

1http://oeis.org/A005341 mentions an order-71 recurrence. It's linear and its characteristic polynomial is given at http://en.wikipedia.org/wiki/Look-and-say_sequence , so there is a closed form in terms of 71 constants. – Peter Taylor – 2012-09-25T22:04:19.267

To avoid ambiguity about what "without calculating the terms" means, may I suggest requiring that valid entries must run in at most O(n log n) time with respect to the input n? This should rule out any naive brute-force solution (which need exponential time and space), whereas I believe both r.e.s.'s solution and mine should pass. – Ilmari Karonen – 2012-09-26T17:39:31.640

I agree, will do. – Soham Chowdhury – 2012-09-26T17:43:17.780

Answers

11

GolfScript, 300 chars

~8-:m;'!5?HZi " A 9 "$??HLTZ^`eik 5Z`e <^ $k H ,Sds ?LT /5@MUav I " i ? "5i G  !#+.4;>@K]_cdhjy{| *HCRJD@:64122/42.,,*860-40-0.+9?B<862//?D<7=@:420-/5:60.-20.+./384/162.-HCRJD@:64122/42.,,*)! !""$&&('{32-}/]1,/~:&;:L;:T;]:S;0m>{&m=}{L,,{.49=\71=+}%:V{,(,{).V=\T?S={(V=+}/}%0+:V}m*,,0\{.V=\L=8-*+}/}if

Here's a slightly shorter entry. Reads a number m from stdin, writes the length of the m-th term in the look-and-say sequence to stdout.

For example, input 50 produces output 894810, while input 1000 produces (after a few seconds):

21465050086246039983937316427688400486337379416867195943208935654503780844828013651860410256502890125611856727316762

I'll post a more detailed explanation of the code later, but basically this also works by repeatedly multiplying a vector with a sparse matrix, encoded as an ASCII string. Some clever compression tricks, such as leaving out the all-ones superdiagonal of the matrix, are employed.

Ps. Here's a link to an online demo. The server has been pretty slow lately, though, so if you see an error message about a timeout, try again (or just download the interpreter and run it yourself).


Addendum: Here's the promised detailed explanation. First, let me repost the code with whitespace and comments added:

# Parse input, subtract 8, save in variable m:
~ 8- :m;

# This long string encodes all the constants used by the program (I'll describe it below):
'!5?HZi " A 9 "$??HLTZ^`eik 5Z`e <^ $k H ,Sds ?LT /5@MUav I " i ? "5i G  !#+.4;>@K]_cdhjy{| *HCRJD@:64122/42.,,*860-40-0.+9?B<862//?D<7=@:420-/5:60.-20.+./384/162.-HCRJD@:64122/42.,,*)! !""$&&('

# Parse the data string:
# The data consists of several arrays of positive integers, encoded as printable ASCII
# characters and separated by spaces.  (Note that this means that the range of possible
# data values starts from 1, which requires some adjustment below.  Also, the values in
# L are actually shifted up by 8 to avoid having single quotes in the data string.)

{32-}/   # Subtract 32 (ASCII code of space) from each char and dump result on stack.
]        # Collect numbers from stack into an array.
1,/ ~    # Split array on zeros (formerly spaces) and dump pieces on stack.
:&;      # Save topmost piece (canned responses for inputs 1-7) in var &.
:L;      # Save second piece (element lengths) in var L.
:T;      # Save third piece (nonstandard decay targets) in var T.
]:S;     # Collect remaining pieces (nonstd decay sources) in array and save it in S.

# If m < 0, return a precomputed response from &, else run the indented code below:
0 m > { & m = } {
  # Set V to be the initial element abundance vector (1 Hf + 1 Sn):
  # (Note: both L and V have an extra sentinel 0 at the end to avoid overruns below.) 
  L,, { . 49= \ 71= + } % :V
  # Update the element abundances m times:
  {
    # Run the following block for all numbers i from 0 to length(V)-2,
    # collecting the results in an array:
    ,(, {
      # Start with the abundance of element i+1 in V (all elements except H decay
      # downwards; the extra 0 at the end of V is needed to avoid a crash here):
      ).V=\
      # If this element is one of the decay targets listed in T, find the corresponding
      # list of source elements in S and add their abundances from V to the total:
      # (If this element is not in T, we just loop over an empty array at the end of S.)
      T? S= { ( V= + } /
    } %
    # Append a new sentinel 0 to the end of the array and save it as the new V:
    0+:V
  } m *
  # Multiply the element abundances from V with their lengths from L and sum them:
  ,,0\{.V=\L=8-*+}/
} if

The key fact that allows this code to work is that, as the look-and-say transformation is repeatedly applied to any string, it eventually splits into substrings which evolve independently, in the sense that we can obtain all the following elements of the look-and-say sequence starting from that string by taking each of the substrings, applying the look-and-say transformation to each of them independently n times and concatenating the results.

Furthermore, in his 1986 article "The Weird and Wonderful Chemistry of Audioactive Decay" (which basically introduced the look-and-say sequence to the world), John Conway proved a remarkable (and, for this challenge, crucial) result which he called the "cosmological theorem":

Cosmological theorem (Conway): There is a set of 92 finite strings of numbers 1–3 (which Conway named after the chemical elements; see table here), and two infinite families of strings each consisting of a fixed prefix followed by some number higher than 3 (these can only appear if the initial string contained number higher than 3 or sequences of more than 3 identical numbers), such that, after at most 24 applications of the look-and-say transformation, any string splits into a chain of such elements, each of which thereafter evolves independently (and splits into further elements).

Thus, to determine the length of a string sufficiently far in the look-and-say sequence, it's sufficient to know how many times each element appears in it. And, since each element evolves independently, knowing these "elemental abundances" is also sufficient for calculating the corresponding abundances of the next string in the sequence, and so on.

As it happens, the look-and-say sequence starting from the string 1 splits into two Conway elements (Hf72 = 11132 and Sn50 = 13211) after just seven iterations. Thus, thereafter, we only need to keep track of the abundances of the 92 "common" elements (as the two families of "transuranic" elements cannot arise from common elements).

Mathematically, we can treat the elemental abundances as a vector of length 92. Calculating the abundances in the next step of the look-and-say sequence is then just equivalent to multiplying this vector with a 92 × 92 matrix, which is what r.e.s.'s solution does literally.

In fact, r.e.s.'s solution actually tracks the abundance of each element multiplied with its length (i.e. the "total mass" of that element in the string), adjusting the matrix accordingly, so that the total length of the string is given simply by adding the numbers in the abundance vector together. However, I did not find that a particularly convenient representation for code golf. (It was convenient for Nathaniel Johnston's derivation of Conway's polynomial, since it encodes the lengths into the matrix and thus gives the right eigenvalue for the asymptotic growth rate, but we don't care about that here.)

Instead, I just track the raw abundances of the elements, and only multiply them with the lengths at the end. This has, combined with the convenient fact that none of the decay products of any element contain more than one copy of any element, means that the entries of the transition matrix are just zeros and ones. Also, by far most of them are, in fact, zeros (in math jargon, the matrix is sparse), so I can save a lot of space by only storing the indices of those entries which are ones.

Also, in his article, Conway ordered the elements such that (except for H1 = 22, which is a fixed point of the look-and-say transformation), each element decays to the next lower element in the sequence (and possibly some others). In fact, most elements decay only to the next lower elements. Thus, by using Conway's ordering of the elements, I can hardcode this "standard decay" as part of the iteration, so that I only need to store the matrix entries corresponding to the other, "nonstandard" decays, saving even more space.

(In fact, what I actually do, for reasons mostly to do with how GolfScript's array operations work, is store one array listing the matrix rows containing non-zero entries (i.e. the elements which are produced in nonstandard decays) and another array of arrays listing, for each of these rows, the columns where the non-zero entries occur (i.e. the elements which produce these elements in such decays). See the code and comments above for how these arrays are actually used to calculate the updated element abundances.)

The final space-saving trick I used was to encode all these arrays as strings of ASCII characters. Conveniently, there are 95 printable ASCII characters from space (32) to ~ (126), and the largest value that appears in any of the arrays is 92, so they fit very well. One extra tweak I made was to shift the values in the element length array (which only go up to 42 anyway) up by 8 so that I could avoid the character ' (39) and thus store my data in a single quoted string without having to add any backslash escape characters.

There are some other minor optimizations too, but those are just standard GolfScript coding tricks. I should note that the actual code is not all that carefully optimized — it's quite possible that it could be golfed down further.

In any case, despite the inherent slowness of a double-interpreted language like GolfScript, the code runs quite fast and scales well: computing the length of the 10,000th string in the look-and-say sequence takes about 1.5 minutes on my computer.

Conveniently, GolfScript stores all numbers as arbitrary-length integers (bignums), so I didn't have to do anything special to make it produce exact results even for large inputs. The expected time complexity of this program is O(n log n) (as the element abundances grow linearly with n, and bignum addition takes O(log n) time), and the actual runtime seems to fit that prediction pretty well, as the graph below shows:

Graph of execution time divided by input value, showing logarithmic growth.

(Time is in seconds; for what it's worth, the green curve is given by y = (a x log(b + x) + c) / x, where a = 0.00106462026133545, b = 2720.25843721031 and c = -0.0401630789530016. The spike on the left mainly comes from the roughly constant runtimes for n < 8, which become relatively large when divided by small n. I did not include it in the curve fit.)

Ilmari Karonen

Posted 2012-09-25T08:40:16.600

Reputation: 19 513

Nice. Did you use the same logic as @r.e.s.? – Soham Chowdhury – 2012-09-26T17:52:14.100

The details are a bit different, but the basic idea of len = L * M^(n-8) * V is the same. – Ilmari Karonen – 2012-09-26T18:00:02.330

+1 for the excellent, brilliant explanation. I wish I could mark both of you right. – Soham Chowdhury – 2012-09-27T01:54:46.797

Couldn't you make it smaller by removing the canned responses for n = 1 to n = 7? – Soham Chowdhury – 2012-09-27T01:57:36.657

1@Soham: I could, but I sort of assumed that I was required to provide the correct answer for any input, not just for inputs larger than 7. – Ilmari Karonen – 2012-09-27T11:20:07.527

2

Sage (18664 2021 1977)

This is a version that uses Nathaniel Johhnston's matrix formulation. (The matrix could probably be golfed further using eval.)

def f(n):
 z=[0];a=z*10;b=z*15;c=z*18;d=z*25;e=z*31;g=z*68;T=Matrix([z*28+[2]+z*63,e+[7/23]+z*60,z*29+[4/3]+z*62,z*30+[4/3]+z*61,z*32+[2]+z*59,z*53+[1/2]+z*38,z*36+[3/2]+z*55,z*37+[2]+z*54,z*38+[8/5]+z*53,z*39+[5/3]+z*52,z*43+[5/3]+z*48,z*45+[7/4]+z*46,z*46+[12/7]+z*45,z*47+[7/4]+z*44,z*48+[3/2]+z*43,z*50+[21/17]+z*41,z*51+[21/17]+z*40,z*49+[13/10]+z*42,z*44+[7/5]+z*47,z*52+[7/5]+z*39,z*40+[7/5]+z*51,z*41+[4/3]+z*50,z*42+[4/3]+z*49,z*34+[5/32,5/32]+z*56,z*56+[7/11,7/13,1/3,7/17]+z*32,z*54+[10/7]+z*37,z*55+[10/7]+z*36,z*33+[4/3]+z*58,b+[1/21,1/21,0,1/7]+z*12+[2/23,0,0,1/16,1/16]+z*17+[1/5,0,0,2/11,2/13,2/21,4/17]+z*21+[1/8]+z*3+[1/3]+z*6,z*17+[9/26]+g+[9/10]+z*5,z*87+[9/10]+z*4,z*19+[23/28]+z*72,z*34+[1/16,1/16]+d+[2]+z*19+[1/8]+a,z*88+[2]+z*3,z*90+[32/27,0],z*89+[32/27,0,0],z*91+[8/5],z*66+[3/7]+z*7+[1/3]+a+[1/2,3/10,3/10]+z*4,z*66+[5/7]+d,z*62+[3/2]+z*29,z*63+[10/7]+z*28,z*64+[9/7]+z*27,z*65+[9/7]+z*26,z*67+[3/2]+z*24,z*82+[5/3]+z*9,z*83+[8/5]+z*8,z*74+[7/9,7/12,7/12,7/16,7/18,7/24,7/23,7/16]+a,g+[4/3]+z*23,z*70+[6/5]+z*21,z*71+[5/4]+z*20,z*72+[17/14]+z*19,z*73+[17/14]+c,z*84+[4/3]+z*7,z*69+[5/4]+z*22,z*6+[7/12]+g+[7/12]+z*16,z*76+[7/12]+b,z*77+[11/16]+z*14,z*78+[13/18]+z*13,z*79+[7/8]+z*12,z*80+[17/23]+z*11,e+[2/23,0,0,1/16,1/16]+z*17+[1/5]+z*5+[2/17,1]+z*20+[1/8]+a,[0,1/7]+z*90,[1]+z*91,[0,1]+z*90,[0,0,7/6]+z*89,z*3+[7/6]+z*88,z*57+[7/13]+z*34,z*4+[1]+z*54+[4/17]+z*32,z*5+[6/5]+z*86,z*7+[4/3]+z*84,z*8+[5/4]+z*83,z*20+[8/7]+z*71,z*21+[7/6]+z*70,z*22+[7/6]+z*69,c+[9/14,9/28]+z*72,z*9+[6/5]+z*82,a+[6/5]+z*81,z*12+[4/3]+z*79,z*13+[9/7]+z*78,z*14+[4/3]+z*77,b+[23/42,23/42,23/26]+z*74,z*11+[8/7]+z*80,z*23+[6/5]+g,z*6+[5/12]+z*85,e+[15/23]+z*26+[5/7]+z*33,z*24+[6/7]+z*67,d+[1]+z*66,z*26+[1]+z*65,z*27+[3/8]+e+[3/17]+d+[1/2]+z*6,z*16+[9/14]+c+[27/32]+z*56,b+[9/14]+c+[27/32]+z*57,c+[5/14]+z*8+[5/8]+d+[1/2,0,0,5/11]+z*24+[5/16]+a]);v=vector(z*23+[5]+z*14+[5]+z*53);L=[1,2,2,4,6,6,8,10]
 if n<9:return L[n-1]
 Q=T
 for i in[1..n-9]:Q=Q*T
 return sum(Q*v)

E.g., [f(n) for n in [1..50]] produces

[1, 2, 2, 4, 6, 6, 8, 10, 14, 20, 26, 34, 46, 62, 78, 102, 134, 176, 226, 302, 408, 528, 678, 904, 1182, 1540, 2012, 2606, 3410, 4462, 5808, 7586, 9898, 12884, 16774, 21890, 28528, 37158, 48410, 63138, 82350, 107312, 139984, 182376, 237746, 310036, 403966, 526646, 686646, 894810]

r.e.s.

Posted 2012-09-25T08:40:16.600

Reputation: 2 872

1Wow, that's probably the longest answer I've ever seen to a code golf challenge. :) – Ilmari Karonen – 2012-09-26T16:10:57.960

I agree. I'll test it out. – Soham Chowdhury – 2012-09-26T16:36:23.787

I thought at first that this was Golf. Python, hm.. nice answer. This matches the requirements. – Soham Chowdhury – 2012-09-26T16:46:39.790

Nice. And oh, this isn't Python? – Soham Chowdhury – 2012-09-26T17:13:02.650

@SohamChowdhury -- Sage has a "Python- based interface".

– r.e.s. – 2012-09-26T17:15:11.590

0

Mathematica 113

The following violates the constraint, "without calculating the terms". It also is based directly on an example from OEIS.org (search "look and say") which uses code almost identical to that at Mathworld.

So it can be considered hors concours (community wiki).

r@j_ := Through[{First, Length}[#]] & /@ Split[j];
n_~s~p_ := NestList[Flatten[Reverse /@ r@#] &, {p}, n - 1];
Length@s[k, 1][[k]]

I am including it here because it may help someone figure out how to solve the challenge without calculating the terms. My intuition tells me it cannot be solved without calculating the terms.


Usage

Find the length of the 20th term.

k=10;
Length@s[k, 1][[k]]

302

DavidC

Posted 2012-09-25T08:40:16.600

Reputation: 24 524

0

R - 57

s=1;for(i in 2:k){z=rle(s);s=c(rbind(z$l,z$v))};length(s)

Same as @DavidCarraher, I am violating the without calculating the terms requirement.

Example usage:

k = 20
s=1;for(i in 2:k){z=rle(s);s=c(rbind(z$l,z$v))};length(s)
# [1] 302

I'm not sure this is acceptable (please comment), so in case I need to write a function, my solution would be:

L=function(s,k){for(i in 2:k){z=rle(s);s=c(rbind(z$l,z$v))};length(s)}

which is a little bit longer (70 chars) but also allows for the starting value to be provided. The usage:

L(s=1, k=20)
# [1] 302

flodel

Posted 2012-09-25T08:40:16.600

Reputation: 2 345

2No, this does not meet the requirements. :( – Soham Chowdhury – 2012-09-26T17:47:27.993