Find the shortest pangrams from a word list

10

1

A pangram is a string that contains every letter a-z of the English alphabet, case-insensitive. (It's OK if the pangram contains more than one copy of a letter, or if it contains non-letter characters in addition to the letters.)

Write a program or function whose input is a list of strings, and which outputs one or more strings which have the following properties:

  • Each output string must be a pangram.
  • Each output string must be formed by concatenating one or more strings from the input list, separated by spaces.
  • Each output string must be the shortest, or tied for the shortest, among all strings with these properties.

Many programs will choose to output only one string; you'd only want to output more than one string if you'd otherwise have to write extra code to limit the output.

You may assume that the input contains no unprintable characters or spaces, and that no word in it is more than (26 times the natural logarithm of the length of the list) characters long. (You may not assume, however, that the input contains nothing but letters, or only lowercase letters; punctuation marks and uppercase letters are entirely possible.)

Input and output can be given in any reasonable format. For testing your program, I recommend using two test cases: a dictionary of English words (most computers have one), and the following case (for which a perfect (26-letter) pangram is impossible, so you'd have to find one containing duplicate letters):

abcdefghi
defghijkl
ijklmnop
lmnopqrs
opqrstuvw
rstuvwxyz

You should include a sample of your program's output along with your submission. (This may well be different for different people as a result of using different word lists.)

Victory condition

This is a challenge. The winner is the shortest program (in bytes) that runs in polynomial time. (A summary for people who don't know what that means: if you double the size of the word list, the program should become slower by no more than a constant factor. However, the constant factor in question can be as large as you like. For example, it's valid for it to become four times slower, or eight times slower, but not for it to become smaller by a factor of the length of the word list; the factor via which it becomes slower must be bounded.)

user62131

Posted 2016-12-06T19:12:23.367

Reputation:

When determining the complexity, can we use the fact that each word is at most 26 letters long? That the alphabet size is a constant of 26? – xnor – 2016-12-06T19:49:08.013

Yes. I put that restriction on the input there partly to make the complexity easier to define/calculate. – None – 2016-12-06T19:49:54.050

I think this runs into a technicality. If you ignore repeated input words, there are at most 27^26 possible input words, and so at most 2^(27^26) possible subsets of them as possible inputs. This is huge but a constant. So, any program on this finite set is constant-time, with the constant being the maximum number of steps taken over all possible inputs. – xnor – 2016-12-06T19:53:47.350

I didn't say there are no duplicated words in the input. I guess you could run the program in a "technical" O(n) time by filtering out punctuation marks and deduplicating the input first, though (or more likely O(n log n), which would use a lot less memory than a radix deduplicate would). Then you'd have to go back from the filtered version to the original word list. You can't claim the polynomial time in question unless you actually go through all those steps, though! – None – 2016-12-06T19:59:55.633

I'd forgotten about non-letters. Can we assume these are ASCII, or otherwise within some finite set? If so, I think any algorithm that starts by deduplicating can claim to be polynomial-time. – xnor – 2016-12-06T20:06:38.263

"You may assume that the input contains no unprintable characters or spaces", which implies a finite set. I guess I'll change the limitation on the length of the input to scale with the length of the list, in order to get rid of the technical argument. – None – 2016-12-06T20:08:29.100

Answers

3

Ruby 159 (iterative)

Ruby 227 220 229 227 221 (recursive)

New iterative solution (based on the algorithm described by @Niel):

c={('A'..'Z').to_a=>""}
while l=gets
d=c.clone
c.map{|k,v|j=k-l.upcase.chars
w=v+" "+l.strip
d[j]=w if !c[j]||c[j].size<w.size}
c=d
end
x=c[[]]
p x[1..-1] if x

Old recursive solution:

W=[]
while l=gets
W<<l.strip
end
I=W.join(" ")+"!!"
C={[]=>""}
def o(r)if C[r]
C[r]
else
b=I
W.map{|x|s=r-x.upcase.chars
if s!=r
c=x+" "+o(s)
b=c if c.size<b.size
end}
C[r]=b
end
end
r=o ('A'..'Z').to_a
p r[0..-2] if r!=I

The byte measurement is based on leaving off the final newline in the file, which does not matter to ruby 2.3.1p112. The byte count went back up after fixing a small bug (adding .downcase .upcase for case-insensitivity as required by the problem statement).

Here is an earlier version from before shortening identifiers and such:

#!/usr/bin/env ruby

$words = [];

while (line=gets)
  $words << line[0..-2];
end

$impossible = $words.join(" ")+"!!";

$cache = {};

def optimize(remaining)
  return $cache[remaining] if ($cache[remaining]);
  return "" if (remaining == []);

  best = $impossible;

  $words.each{|word|
    remaining2 = remaining - word.chars;
    if (remaining2 != remaining)
      curr = word + " " + optimize(remaining2);
      best = curr if (curr.length < best.length);
    end
  };

  $stderr.puts("optimize(#{remaining.inspect})=#{best.inspect}");

  return $cache[remaining] = best;
end

result = optimize(('a'..'z').to_a);

puts(result[0..-1]);

How does it work? It basically maintains a set of characters still to cover and only recurses on a word if it would reduce the uncovered set. Additionally, the results of recursion are memoized. Each subset of 2^26 corresponds to a memoization table entry. Each such entry is calculated in time proportional to the size of the input file. So the whole thing is O(N) (where N is the size of the input file), albeit with a huge constant.

DepressedDaniel

Posted 2016-12-06T19:12:23.367

Reputation: 311

1

JavaScript (ES6), 249 248 bytes, possibly competing

a=>a.map(w=>w.replace(/[a-z]/gi,c=>b|=1<<parseInt(c,36)-9,b=0,l=w.length)&&(m.get(b)||[])[0]<l||m.set(b,[l,w]),m=new Map)&&[...m].map(([b,[l,w]])=>m.forEach(([t,s],e)=>(m.get(e|=b)||[])[0]<=t+l||m.set(e,[t+l+1,s+' '+w])))&&(m.get(-2^-1<<27)||[])[1]

Explanation: Transforms the array by converting the letters into a bitmask, saving only the shortest word for each bitmask in a map. Then iterating over a copy of the map, augment the map by adding each combined bitmask if the resulting string would be shorter. Finally return the string saved for the bitmap corresponding to a pangram. (Returns undefined if no such string exists.)

Neil

Posted 2016-12-06T19:12:23.367

Reputation: 95 035

Interesting. Could you expatiate more on how it works and, if available, post the ungolfed code? – DepressedDaniel – 2016-12-08T02:09:00.790

1This should be a valid/competing entry. I think this actually runs in O(n log n), in fact! (The map has a hard limit of 2²⁶ entries, and thus doesn't show up in the complexity; thus the only time spent is the time reading the input.) – None – 2016-12-08T02:41:47.267

I just re-read the description and I get how it works now. Neat. +1 ... Hmm, when does it decide to stop trying to augment the map by considering pairs though? It should keep going until no relaxations are possible. – DepressedDaniel – 2016-12-08T03:03:57.913

@DepressedDaniel For each bitmask extracted from the original word list, it checks all the partial pangrams that it has found so far, and whether adding the word creates a pangram that's shorter than the one it currently knows for the combined bitmask. – Neil – 2016-12-08T13:00:06.603

@ais523 For large inputs (>1000 words), most of the time seems to be spent swapping. I tried switching from a Map to an Array and it got even slower! – Neil – 2016-12-08T20:47:41.923

-1

Python 3, 98, 94, 92 bytes

print([s for s in input().split()if sum([1 for c in range(65,91)if chr(c)in s.upper()])>25])

Iterates through the ASCII representation of the alphabet and adds a 1 to a list if the letter is found in the string. If the sum of the list is greater than 25 it contains all letters of the alphabet and will be printed.

Erich

Posted 2016-12-06T19:12:23.367

Reputation: 1

I think you can remove a space between (' ') and if. You can also change ord(i) in range(65,91) to 91>x>=65. Also, what's the complexity? – NoOneIsHere – 2016-12-08T20:15:52.710

1What is the complexity of this solution? It is *required* for the answer to be in polynomial complexity, otherwise it is non-competing. – NoOneIsHere – 2016-12-08T21:15:17.807

Sorry, I think it is O(n), because the input list can vary in length but – Erich – 2016-12-08T22:30:22.297

Sorry, I think it is O(n), because the input list can vary in length but the second loop always goes from 65 to 90. But I haven't tested it. – Erich – 2016-12-08T22:31:17.320

Not sure this satisfies "Each output string must be the shortest, or tied for the shortest, among all strings with these properties." – DepressedDaniel – 2016-12-08T22:49:10.427