Morse Decode Golf

24

3

I've become alarmed with the growing hatred of spaces and this answer has inspired me to make sure Morse code is safe from this insidious removal of whitespace.

So, your task will be to create a program that can successfully translate Morse code with all of the spaces removed.

Morse code

Rules:

  1. Input will be a string consisting only of dashes and dots (ASCII 2D and 2E). Output is undefined for input containing any other characters. Feel free to use any method convenient to your language of choice to receive the input (stdin, text file, prompt user, whatever). You can assume that the Morse code input only consists of the letters A-Z, and matching numbers or punctuation is not required.

  2. Output should include only words contained in this dictionary file (again, feel free to use any convenient method to access the dictionary file). All valid decodings should be output to stdout, and all dots and dashes in the input must be used. Each matched word in the output should be separated by a space, and each possible decoding should separated by a newline. You can use upper case, lower case, or mixed case output as convenient.

  3. All restrictions on standard loopholes apply with one exception as noted above, you may access the dictionary file referenced in requirement 2 via an internet connection if you really want to. URL shortening is acceptable, I believe that goo.gl/46I35Z is likely the shortest.

  4. This is code golf, shortest code wins.

Note: Posting the dictionary file on Pastebin changed all of the line endings to Windows style 0A 0E sequences. Your program can assume line endings with 0A only, 0E only or 0A 0E.

Test Cases:

Input:

......-...-..---.-----.-..-..-..

Output must contain:

hello world

Input:

.--..-.-----..-..-----..-.--..--...---..--...-.......--.-..-.-.----...--.---.-....-.

Output must contain:

programming puzzles and code golf

Input:

-.....--.-..-..-.-.-.--....-.---.---...-.----..-.---..---.--....---...-..-.-......-...---..-.---..-----.

Output must contain:

the quick brown fox jumps over the lazy dog

Comintern

Posted 2014-08-24T17:02:38.540

Reputation: 3 632

@seequ eg is not in the list, but ate and we are. ;) eisner a go mae a ere sounds nice. – Titus – 2017-02-04T09:39:43.333

What do You want with all the irish family names in the word list? What are we supposed to do with the '? My code will take it as end of word. – Titus – 2017-02-04T10:57:51.510

3How can you tell between AN (.- -.) and EG (. --.) ? – seequ – 2014-08-24T18:06:53.610

@Sieg You look up possible combinations in the dictionary – Beta Decay – 2014-08-24T18:09:30.947

@Sieg - That is the crux of the challenge. EG isn't in the word file - AN is. If there are multiple valid decodings, you output them all. Although I would be extremely impressed if somebody checks for valid grammar as well. – Comintern – 2014-08-24T18:09:52.950

@Comintern I bet there are words where similar problem arises. – seequ – 2014-08-24T19:25:32.277

2@Sieg - The output would then need to include both valid decodings. – Comintern – 2014-08-24T19:31:07.773

Can I strip the carriage returns from the dictionary> – Dennis – 2014-08-24T19:34:07.123

@Dennis - To be fair, I'd have to say not before inputing it. Otherwise, I can envision several ways in which preprocessing the file could give an undo advantage. – Comintern – 2014-08-24T19:50:16.440

I could too. That's why I asked specifically about the line endings. Oh well, I'll deal with the pesky Windows format... :P – Dennis – 2014-08-24T19:52:03.563

1@Dennis - Ahhh... I'll bet that either Pastebin or my browser did that. My source file doesn't have them. You can change the line delimiter to a system appropriate one, no other changes. I'll edit into the question when I'm not on my phone. – Comintern – 2014-08-24T19:58:17.217

1

I'm afraid this challenge does not work as expected: For the "hello world" morse code example my Python prototype yields hundreds or maybe thousands of possible translations: http://pastebin.com/baveM1FH, all of which seem to be valid. I mean, technically that might be a valid solution...

– Falko – 2014-08-24T20:47:23.773

2@Falko that's correct behavior. Note that the problem says your output must include "hello world", not that it's limited to that. It should print all valid decodings. – hobbs – 2014-08-25T02:55:36.543

@Falko By my count, the "hello world" example has 403,856 lines of output, eventually degenerating to stuff like "i i it i et it met to et it it i" – hobbs – 2014-08-25T13:31:49.017

2(almost poetic, really) – hobbs – 2014-08-25T13:32:32.880

1For the first input, there are 403856 answers. For the second input, there are 2889424682038128 valid answers. For the third output, there are 4986181473975221635 answers... – Ray – 2014-08-25T22:13:54.163

Answers

5

Ruby, 210

(1..(g=gets).size).map{|x|puts IO.read(?d).split.repeated_permutation(x).select{|p|p.join.gsub(/./,Hash[(?a..?z).zip"(;=/%513':07*)29@-+&,4.<>?".bytes.map{|b|('%b'%(b-35))[1,7].tr'01','.-'}])==g}.map{|r|r*' '}}

If there exists such a practice as "over-golfing", I suspect I have partaken this time around. This solution generates an array of arrays of repeated permutations of all of the dictionary words, from length 1 up to the length of the input. Given that "a" is the shortest word in the dictionary file and its code is two characters long, it would have been sufficient to generate permutations of length up to half the size of the input, but adding /2 is tantamount to verbosity in this domain, so I refrained.

Once the permutation array has been generated (NB: it is of length 45404104 in the case of the pangrammatic example input), each permutation array is concatenated, and its alphabetic characters are replaced with their Morse code equivalents via the rather convenient (Regexp, Hash) variant of the #gsub method; we've found a valid decoding if this string is equal to the input.

The dictionary is read (several times) from a file named "d", and the input must not contain a newline.

Example run (with a dictionary that'll give the program a fighting chance at ending before the heat death of the universe):

$ cat d
puzzles
and
code
dummy
golf
programming
$ echo -n .--..-.-----..-..-----..-.--..--...---..--...-.......--.-..-.-.----...--.---.-....-. | ruby morse.rb
programming puzzles and code golf
^C

Three If By Whiskey

Posted 2014-08-24T17:02:38.540

Reputation: 468

5

Haskell, 296 characters

  • Dictionary file: must be a text file named "d"
  • Input: stdin, may have a trailing newline but no internal whitespace
main=do f<-readFile"d";getLine>>=mapM(putStrLn.unwords).(words f&)
i!""=[i]
(i:j)!(p:q)|i==p=j!q
_!_=[]
_&""=[[]]
d&i=do
w<-d
j<-i!(w>>=((replicate 97"X"++words".- -... -.-. -.. . ..-. --. .... .. .--- -.- .-.. -- -. --- .--. --.- .-. ... - ..- ...- .-- -..- -.-- --..")!!).fromEnum)
n<-d&j
[w:n]

Explanation of elements:

  • main reads the dictionary, reads stdin, executes &, and formats the output of & with appropriate whitespace.
  • (replicate 97"X"++words".- -... -.-. -.. . ..-. --. .... .. .--- -.- .-.. -- -. --- .--. --.- .-. ... - ..- ...- .-- -..- -.-- --..")!!) (an expression inside of the definition of &) is a list whose indices are character codes (97 is the code of 'a') and values are Morse sequences.
  • ! (a function named as an infix operator) matches a string against a prefix; if the prefix is present it returns the remainder in a one-element list (success in the list monad), otherwise the empty list (failure in the list monad)
  • & uses the list monad for “nondeterministic” execution; it

    1. picks an entry of d (a dictionary word),
    2. uses ! to match the Morse form of that word (w>>=((…)!!).fromEnum, which is equivalent to concatMap (((…)!!) . fromEnum) w) against the input string i,
    3. calls itself (d&j) to match the rest of the string, and
    4. returns the possible result as a list of words w:n, in the list monad [w:n] (which is the shorter, concrete equivalent to return (w:n)).

    Note that every line after line 6 is part of the do expression started on line 6; this takes exactly the same number of characters as using semicolons on a single line, but is more readable, though you can only do it once in a program.

This program is extremely slow. It can be made faster (and slightly longer) easily by storing the morsified words next to the originals in a list rather than recomputing them at each pattern match. The next thing to do would be to store the words in a binary tree keyed by Morse symbols (a 2-ary trie) so as to avoid trying unnecessary branches.

It could be made slightly shorter if the dictionary file did not contain unused symbols such as "-", allowing removal of replicate 97"X"++ in favor of doing .(-97+) before the !!.

Kevin Reid

Posted 2014-08-24T17:02:38.540

Reputation: 1 693

Damn son, that's clever. +1 to you. – Alexander Craggs – 2014-08-24T21:10:09.187

1Did you know that (+(-97)) can be rewritten as (-97+)? – proud haskeller – 2014-08-24T22:02:13.357

You should remove the third definition of h and instead add |0<1=[] to the second definition – proud haskeller – 2014-08-24T22:07:19.737

But if it was me, i would try to define h using isprefixof. I think this is worth a try. – proud haskeller – 2014-08-24T22:15:14.710

@proudhaskeller |0<1=[] doesn't handle the case where i is empty and p isn't. isPrefixOf requires an import and doesn't give us the remainder of the string to continue matching. – Kevin Reid – 2014-08-24T22:19:22.203

How about import List;h i j|isPrefixOf j i=[drop(lengh j)i]|0<1=[]. Almost shorter. – proud haskeller – 2014-08-24T22:28:37.440

@KevinReid I think the next thing you should do is make h, g infix. – proud haskeller – 2014-08-24T22:29:29.643

@proudhaskeller Done, thanks. (I knew about that technique but for some reason thought it wouldn't help here; I was WRONG.) – Kevin Reid – 2014-08-25T00:19:43.067

2Use interact and win 12 characters. interact$unlines.map unwords.(words f&) – gxtaillon – 2014-08-25T17:20:07.347

1You should be able to replace concatMap with >>= – proud haskeller – 2014-08-25T17:58:53.273

You should be able to replace concatMap with >>=, or maybe =<< if it's precedence saves a few characters. – proud haskeller – 2014-08-25T18:01:41.337

3

Python - 363 345

Code:

D,P='-.';U,N='-.-','.-.'
def s(b,i):
 if i=='':print b
 for w in open('d').read().split():
  C=''.join([dict(zip('abcdefghijklmnopqrstuvwxyz-\'23',[P+D,D+3*P,U+P,'-..',P,D+N,'--.',4*P,2*P,P+3*D,U,N+P,2*D,D+P,D*3,'.--.',D+U,N,P*3,D,'..-',3*P+D,'.--','-..-',U+D,'--..']+['']*4))[c]for c in w]);L=len(C)
  if i[:L]==C:s(b+' '+w,i[L:])
s('',input())

Explanation:

The dictionary must be stored as a plain text file named "d".

D, P, U and N are just some helper variables for a shorter definition of the morse lookup table.

s(i) is a recursive function that prints the previously translated message part p and each valid translation of the remaining code part i: If i is empty, we reached the end of the code an b contains the whole translation, thus we simply print it. Otherwise we check each word w in the dictionary d, translate it into morse code C and, if the remaining code i starts with C, we add the word w to the translated beginning b and call the function s recursively on the remainder.

Note on efficiency:

This is a pretty slow but golfed version. Especially loading the dictionary and constructing the morse lookup table (dict(zip(...))) in each iteration (to avoid more variables) costs a lot. And it would be more efficient to translate all words in the dictionary file once in advance and not in each recursion on demand. These ideas lead to the following version with 40 more characters but significant speed-up:

d=open('d').read().split()
D,P='-.';U,N='-.-','.-.'
M=dict(zip('abcdefghijklmnopqrstuvwxyz-\'23',[P+D,D+3*P,U+P,'-..',P,D+N,'--.',4*P,2*P,P+3*D,U,N+P,2*D,D+P,D*3,'.--.',D+U,N,P*3,D,'..-',3*P+D,'.--','-..-',U+D,'--..']+['']*4))
T=[''.join([M[c]for c in w])for w in d]
def s(b,i):
 if i=='':print b
 for j,w in enumerate(d):
  C=T[j];L=len(C)
  if i[:L]==C:s(b+' '+w,i[L:])
s('',input())

Falko

Posted 2014-08-24T17:02:38.540

Reputation: 5 307

You can save 2 characters by replacing .startswith(C) with [:len(C)]==C. – Greg Hewgill – 2014-08-24T21:36:58.913

Wow, thanks! It's getting pretty weird, since loading the whole dictionary in each recursion saves characters - and slows down the algorithm once more. – Falko – 2014-08-24T21:48:17.143

@GregHewgill: Yeah, that's what I did originally. I just edited my answer to address both versions. – Falko – 2014-08-24T21:56:13.477

1You can produce more interesting results (longer words) more quickly by sorting the dictionary by descending length of words. d=sorted(open('d').read().split(),key=len,reverse=1) Or, do that externally by pre-sorting your dictionary that way. – Greg Hewgill – 2014-08-24T21:56:15.630

Heck, if you can reformat the dictionary file, format it as a precalculated Python dictionary and M=eval(open('d').read()) :) – Greg Hewgill – 2014-08-24T22:02:17.387

>

  • Good idea. But sorting does not help w.r.t. the challenge, since each solution needs to get printed anyway. 2) As far as I understand the rules this is not allowed. (Then I'd include the translation into morse code as well! ;) )
  • < – Falko – 2014-08-24T22:06:01.720

    3

    Perl (5.10+), 293 characters

    Dictionary file should be saved as "d" (and should be in unix format if you don't want CRs between words), morse input on stdin, with no trailing newline (use echo -n).

    open D,d;chomp(@w=<D>);@m{a..z}=map{substr(sprintf("%b",-61+ord),1)=~y/01/.-/r}
    'BUWI?OKMATJQDCLSZGE@FNHVXY'=~/./g;%w=map{$_,qr#^\Q@{[s/./$m{$&}/reg]}#}@w;
    @a=[$i=<>];while(@a){say join$",@{$$_[1]}for grep!$$_[0],@a;
    @a=map{$x=$_;map{$$x[0]=~$w{$_}?[substr($$x[0],$+[0]),[@{$$x[1]},$_]]:()}@w}@a}
    

    (linebreaks only for formatting).

    Ungolfed code:

    # Read the word list
    open my $dictionary, '<', 'd';
    chomp(my @words = <$dictionary>);
    
    # Define morse characters
    my %morse;
    @morse{'a' .. 'z'} = map {
      $n = ord($_) - 61;
      $bits = sprintf "%b", $n;
      $bits =~ tr/01/.-/;
      substr $bits, 1;
    } split //, 'BUWI?OKMATJQDCLSZGE@FNHVXY';
    
    # Make a hash of words to regexes that match their morse representation
    my %morse_words = map {
      my $morse_word = s/./$morse{$_}/reg;
      ($_ => qr/^\Q$morse_word/)
    } @words;
    
    # Read the input
    my $input = <>;
    
    # Initialize the state
    my @candidates = ({ remaining => $input, words => [] });
    
    while (@candidates) {
      # Print matches
      for my $solution (grep { $_->{remaining} eq '' } @candidates) {
        say join " ", @{ $solution->{words} }; 
      } 
      # Generate new solutions
      @candidates = map {
        my $candidate = $_;
        map {
          $candidate->{remaining} =~ $morse_words{$_}
            ? {
              remaining => substr( $candidate->{remaining}, $+[0] ),
              words => [ @{ $candidate->{words} }, $_ ],
            }
            : ()
        } @words
      } @candidates;
    }
    

    Modus Operandi:

    Morse code symbols are stored by changing "." and "-" into binary digits 0 and 1, prepending a "1" (so that leading dots aren't gobbled up), converting the binary number into decimal, and then encoding the character with the value 61 higher (which gets me all printable chars and nothing that needs backslashing).

    I thought of this as a kind of partitioning problem, and built a solution based on that. For each word in the dictionary, it constructs a regex object that will match (and capture) that word's spaceless morse representation at the beginning of a string. Then it begins a breadth-first search by creating a state that has matched no words and has the entire input as "remaining input". Then it expands each state by looking for words that match at the beginning of the remaining input, and creating new states that add the word to the matched words and remove the morse from the remaining input. States that have no remaining input are successful and have their list of words printed. States that can't match any words (including successful ones) generate no child states.

    Note that in the ungolfed version the states are hashes for readability; in the golfed version they're arrays (for shorter code and less memory consumption); slot [0] is the remaining input and slot [1] is the matched words.

    Commentary

    This is ungodly slow. I'm wondering if there's a solution that isn't. I tried to build one using Marpa (an Earley parser with the ability to give multiple parses for a single input string) but ran out of memory just constructing the grammar. Maybe if I used a lower-level API instead of the BNF input...

    hobbs

    Posted 2014-08-24T17:02:38.540

    Reputation: 2 403

    If I add the same requirement as Kevin Reid (no newline in the input) I can save 7 characters by removing chomp(). Should I? – hobbs – 2014-08-25T04:53:58.307

    "Feel free to use any method convenient". – Comintern – 2014-08-25T05:23:16.530

    Shave 2 bytes with ord instead of ord$_. Shave 1 byte saying join$" instead of join" " – Zaid – 2014-08-26T06:40:12.210

    2

    PHP, 234 226 bytes

    function f($s,$r=""){$s?:print"$r
    ";foreach(file(d)as$w){for($i=+$m="";$p=@strpos(__etianmsurwdkgohvf_l_pjbxcyzq,$w[$i++]);)$m.=strtr(substr(decbin($p),1),10,"-.");0!==strpos($s,$m)?:g(substr($s,strlen($m)),$r.trim($w)." ");}}
    

    recursive function, takes dictionary from a file named d.
    Fails for every word in the dictionary containing a non-letter.

    You can use any file name if you define ("d","<filename>"); before calling the function.

    Add 2 or 3 bytes for faster execution:
    Remove $s?:print"$r\n";, insert $s!=$m? before 0!== and :print$r.$w before ;}}.

    breakdown

    function f($s,$r="")
    {
        $s?:print"$r\n";            // if input is empty, print result
        foreach(file(d)as$w)        // loop through words
        {
            // translate to morse:
            for($i=+$m="";              // init morse to empty string, $i to 0
                                            // loop while character is in the string
                $p=@strpos(__etianmsurwdkgohvf_l_pjbxcyzq,$w[$i++])
            ;)
                $m.=                        // 4. append to word morse code
                    strtr(
                        substr(
                            decbin($p)      // 1: convert position to base 2
                        ,1)                 // 2: substr: remove leading `1`
                    ,10,"-.")               // 3. strtr: dot for 0, dash for 1
                ;
            0!==strpos($s,$m)           // if $s starts with $m
                ?:f(                        // recurse
                    substr($s,strlen($m)),  // with rest of $s as input
                    $r.trim($w)." "         // and $r + word + space as result
                )
            ;
        }
    }
    

    Titus

    Posted 2014-08-24T17:02:38.540

    Reputation: 13 814

    2

    Haskell - 418

    This decoing problem can be solved efficiently by dynamic programming. I know this is a codegolf, but I love fast code.

    Say we have the input string s, then we build an array dp, dp[i] is the list of all valid decoding results of substring s[:i]. For each word w in the dictionary, first we encoding it to mw, then we can compute part of dp[i] from dp[i - length(mw)] if s[i - length(mw):i] == mw. The time complexity of building dp is O({count of words} {length of s} {max word length}). Finally, dp[length(s)], the last element, is what we need.

    In fact, we don't need to store the whole decoding as the element of each dp[i]. What we need is the last decoded word. This make the implementation a lot faster. It cost less than 2 seconds to finished the "hello world" case on my i3 laptop. For other cases posted in the question, the program will not finish pratically since there are too many to output.

    Using the dynamic programming technique, we can compute the number of valid decodings. You can find the code here. Results:

    input: ......-...-..---.-----.-..-..-..
    count: 403856
    
    input: .--..-.-----..-..-----..-.--..--...---..--...-.......--.-..-.-.----...--.---.-....-.
    count: 2889424682038128
    
    input: -.....--.-..-..-.-.-.--....-.---.---...-.----..-.---..---.--....---...-..-.-......-...---..-.---..-----.
    count: 4986181473975221635
    

    Ungolfed

    import Control.Monad
    
    morseTable :: [(Char, String)]
    morseTable = zip ['a'..'z'] $ words ".- -... -.-. -.. . ..-. --. .... .. .--- -.- .-.. -- -. --- .--. --.- .-. ... - ..- ...- .-- -..- -.-- --.."
    
    wordToMorse :: String -> Maybe String
    wordToMorse xs = return . concat =<< mapM (`lookup` morseTable) xs
    
    slice :: (Int, Int) -> [a] -> [a]
    slice (start, end) = take (end - start) . drop start
    
    decode :: String -> String -> IO ()
    decode dict s = trace (length s) [] where
      dict' = [(w, maybe "x" id . wordToMorse $ w) | w <- lines dict]
      dp = flip map [0..length s] $ \i -> [(j, w) |
            (w, mw) <- dict', let j = i - length mw, j >= 0 && mw == slice (j, i) s]
    
      trace :: Int -> [String] -> IO ()
      trace 0 prefix = putStrLn . unwords $ prefix
      trace i prefix = sequence_ [trace j (w:prefix) | (j, w) <- dp !! i]
    
    main :: IO ()
    main = do
      ws <- readFile "wordlist.txt"
      decode ws =<< getLine
    

    Golfed

    import Control.Monad
    l=length
    t=zip['a'..]$words".- -... -.-. -.. . ..-. --. .... .. .--- -.- .-.. -- -. --- .--. --.- .-. ... - ..- ...- .-- -..- -.-- --.."
    h s=return.concat=<<mapM(`lookup`t)s
    f d s=g(l s)[]where g 0 p=putStrLn.unwords$p;g i p=sequence_[g j(w:p)|(j,w)<-map(\i->[(j,w)|(w,m)<-[(w,maybe"x"id.h$w)|w<-lines d],let j=i-l m,j>=0&&m==(take(i-j).drop j$s)])[0..l s]!!i]
    main=do d<-readFile"d";f d=<<getLine
    

    Ray

    Posted 2014-08-24T17:02:38.540

    Reputation: 1 946

    Glad to see a reasonably efficient solution. Now I just have to understand it :) – hobbs – 2014-08-27T00:12:11.160

    1

    Groovy 377 337

    r=[:];t={x='',p=''->r[s[0]]=p+x;s=s.substring(1);if(p.size()<3){t('.',p+x);t('-',p+x)}};s='-eishvuf-arl-wpjtndbxkcymgzqo--';t()
    m=('d'as File).readLines().groupBy{it.collect{r.get(it,0)}.join()}
    f={it,h=[]->it.size().times{i->k=it[0..i]
    if(k in m){m[k].each{j->f(it.substring(i+1),h+[j])}}}
    if(it.empty){println h.join(' ')}}
    f(args[0])
    

    Notes

    The dict must be a file named d. The morse string is passed by command line. e.g.:

    % groovy morse.groovy ......-...-..---.-----.-..-..-.. | grep 'hello world'
    hello world
    

    For "morse code compression" I am using a binary tree

    cfrick

    Posted 2014-08-24T17:02:38.540

    Reputation: 313