.-...--..---.-...--..... the red-nosed reindeer

28

3

Can you recover a sequence of letters from a sequence of dots and dashes? In Morse code, x is represented by -..-, but without the help of spaces, -..---.-... could represent xmas, or dole, or teettteteee, etc.

I took the lyrics from "Rudolph, the Red-Nosed Reindeer" to make the following sequence of letters:

youknowdasheranddancerandprancerandvixencometandcupidanddonnerandblitzenbutdoyourecallthemostfamousreindeerofallrudolphtherednosedreindeerhadaveryshinynoseandifyoueversawityouwouldevensayitglowsalloftheotherreindeerusedtolaughandcallhimnamestheyneverletpoorrudolphjoininanyreindeergamesthenonefoggychristmasevesantacametosayrudolphwithyournosesobrightwontyouguidemysleightonightthenhowthereindeerlovedhimastheyshoutedoutwithgleerudolphtherednosedreindeeryoullgodowninhistory

and then I converted each letter to its Morse code representation to get

-.-----..--.--.---.---...-.........-..--.-..-...--.-.-...-..--.-...--..-..--.-.-...-..--.-.....-..-..-.-.-.-.-----.-.--.-..-.-...-.--...-...--.-..-..----.-...-..--.-..-....-....---...-.-.....---..----.-----..-.-..-.-..-.-...-..-.....-----...-..-..------..-....-....-.-.....-.---..-..-.-...-...-...--..---.-...--.....-......-..-..-.---....-...-....-.-.....-......--...-...-..-.-.--.........-.-.---.---.....--.-......-.-.-----..-....-..-.....-.--..--.-----..-.-----..-.-..-......-.-.....--.--..---..-..---.--....-.-...-..---..-.-.....----......-..-....-.-.....-...-....-..----.-...-..---......--.-..-.-..-.-...-........---..---....-.....-.---.....-..-..-...-.--.------.-..-...--..---.-...--......------..-...-..--.-.--.-....-.-.....-.--..---....-.....-.----....-.-----.--.-.---.-......-......---.-.......-.....--.-.--.-..---.----....--.--.-...--..---.-...--......--..-....-.-----..-.-.-.---.......----....-...--.....-.------.--.-----..---...-..-...---.--....-.....--.....-----...--.....--.....-.....---.---......-....-.-.....-..-..---...-.-........--.-...-.....-.--.......---..--.-..---..--.--..-....--..-.....-...--..---.-...--.....-......-..-..-.---....-...-....-.-.....-.-.-----..-.-...-..--.----..---.---...-..........----.-.-.--

For reference, here are a few Morse code libraries:

a .- b -... c -.-. d -.. e . f ..-. g --. h .... i .. j .--- k -.- l .-.. m -- n -. o --- p .--. q --.- r .-. s ... t - u ..- v ...- w .-- x -..- y -.-- z --..

.- a -... b -.-. c -.. d . e ..-. f --. g .... h .. i .--- j -.- k .-.. l -- m -. n --- o .--. p --.- q .-. r ... s - t ..- u ...- v .-- w -..- x -.-- y --.. z

.- -... -.-. -.. . ..-. --. .... .. .--- -.- .-.. -- -. --- .--. --.- .-. ... - ..- ...- .-- -..- -.-- --..

Write the shortest code that takes as input the sequence of dots and dashes above and one of the Morse code libraries above, and then outputs the sequence of letters above.

Edit 1: Since we are ignoring the letter spaces and word spaces that are typically used in Morse code, this challenge is nontrivial and significantly different from previous challenges involving Morse code. We are implicitly hunting for the relative Kolmogorov complexity of a fixed sequence of letters given the corresponding sequence of dots and dashes. This means your code only needs to run for the fixed inputs provided above, and produce the desired output provided above.

Edit 2: By popular request, I am relaxing the constraint on the Morse code library by providing two alternative libraries. If necessary, I am willing to add reasonable libraries in the future.

Dustin G. Mixon

Posted 2019-12-25T18:55:23.580

Reputation: 1 833

6A common misconception: that morse code only has two primitives. It has at least four. You forgot inter-letter gap and inter-word gap. (Though you've deliberately ignored the later.) – David G. – 2019-12-25T19:22:01.533

9@DavidG. - Indeed, we are ignoring both gaps for this challenge. This is what makes the challenge interesting (at least to me). – Dustin G. Mixon – 2019-12-25T19:45:40.307

Is the library format fixed? – Adám – 2019-12-25T19:59:57.973

@Adám - Yes, the library format is fixed. – Dustin G. Mixon – 2019-12-25T20:05:30.627

May we use uppercase? – Adám – 2019-12-25T20:23:33.827

@Adám - No, we will use lower case in this challenge. – Dustin G. Mixon – 2019-12-25T20:24:29.183

2If the input and output are fixed, why bother having input? – Xcali – 2019-12-25T23:14:38.047

4@Xcali - You may find that the input provides substantial information about the desired output. The existing answer exploits this fact. – Dustin G. Mixon – 2019-12-25T23:15:55.580

1Side note: The seventh reindeer was originally Donder, but many people confuse it with the German for thunder, since Blitzen is the German for lightning. – Neil – 2019-12-26T00:21:19.240

It's really a shame about the strict input format for the library. Converting it into a usable format is really a separate codegolf unto itself that distracts from what would otherwise be a really good challenge. Note that neither the Charcoal answer (which assumes input in a convenient format) nor the APL answer (which uses a built-in morse I think) make direct use of your 'library'. – Chas Brown – 2019-12-26T02:54:25.203

@ChasBrown I've now updated my answer to take the input in the literal format required (but I've left the old answer for posterity). – Neil – 2019-12-26T11:41:08.860

@ChasBrown - I don't think the library format is the biggest bottleneck right now. So far, everyone is appealing to the sequence of 474 integers between 1 and 4 that indicate how many dots/dashes are used for each letter. In the naive coding, this takes up 118.5 bytes, and no one has compressed it yet. – Dustin G. Mixon – 2019-12-26T13:07:42.353

@DustinG.Mixon there’s not a lot of room to compress it. Using Python 3’s gzip, bz2 or lzma libraries all increase the storage required. – Nick Kennedy – 2019-12-26T14:52:45.003

@NickKennedy - I bet some computer languages have access to English language models that could help with compression. – Dustin G. Mixon – 2019-12-26T15:00:24.593

@DustinG.Mixon ignoring the Morse and using Jelly’s built-in dictionary is also more bytes. – Nick Kennedy – 2019-12-26T15:03:03.867

11library is a funny word for a dictionary or look-up table. – Kaz – 2019-12-26T17:26:35.683

2

I wanted to see how this problem would behave with an English dictionary thrown at it. Using Jelly's dictionary reduced the data to 87 bytes: 6 bits per word, with 116 "words". I cheated a bit, since most but not every word is in the dictionary – faking the positions of 13 word gaps precluded having to add any words to the dictionary. Preview on repl.it – might be possible to golf this small enough in asm, or more likely, a golfing language (maybe even Jelly).

– Deadcode – 2019-12-31T10:15:21.393

1To be clear, that is with using Jelly's dictionary, and the Morse code representation, and the Morse code library, and encoding the index of the actual word out of the set of possible words, for each word. – Deadcode – 2019-12-31T10:37:54.133

1@Deadcode: Great work! Does Jelly have access to Jelly's dictionary? I mean, I know that you can write compressed string constants that make use of the dictionary, but can one just load up a list of all of the words and then work with them? – A. Rex – 2019-12-31T14:16:04.400

@A.Rex Thanks! And it does not, but it does allow evaluating Python code. So I think you could compress dictionary, short, long, import, etc. using Jelly's own dictionary to access itself. Oh, and what I was thinking! repl.it isn't required to Try it online!

– Deadcode – 2020-01-01T07:39:40.600

Here's a version that would probably make for smaller Jelly code: Try it online! The data could also be smaller, stored in Base 60 instead of just 64 or 63, meaning it could fit in 86 bytes instead of 87, even in Jelly's Base 250 format (because the last number is small). The data would also be much more amenable to Huffman encoding (although the code to do that might not break even).

– Deadcode – 2020-01-03T22:35:28.030

@A.Rex Accessing Jelly's dictionary in Jelly: “ṬḟɲÇÐÇcØj2ıẉỊ°§⁸ġḢṀ[/c<¢»ŒV“d”ŒVŒl“rudolph”e – So it takes 36 bytes to access the lowercase version of it. That might be just small enough...

– Deadcode – 2020-01-04T22:10:55.593

Answers

15

Python 3, 356, 302, 299 bytes

import sys
S,L,N="",input().split(),int("3EED3TD7OWRVC3DD9L9NKQY0NEADDU7704EYGXWLFF2PGCR6H6ET3DRODQ4O22XAAFV5KOSWET2D2CQZOMXRQWEO7I2US8FRU5Z2CU8D73B10GEG08YUJQPZND1978359JZOEUGKHT2HXO70Z6PK59T8RMY0BBRJ1D78AZEEYBWU4N7IW1TLUN57",36)
while N:S,N=S+chr(97+L.index(sys.stdin.read((N&3)+1))),N>>2
print(S)

Try it online!

Takes the library followed by a newline followed by the morsecode from the standard input.

The library used is .- -... -.-. -.. . ..-. --. .... .. .--- -.- .-.. -- -. --- .--. --.- .-. ... - ..- ...- .-- -..- -.-- --..

Explanation:

The majority of the code is a precalculated number of morse widths. A morse caracter in the library is between 1 and 4 dihs and dahs long. In the number i store this as two bits of information. To recall this number i & it with 3 and shift the result over by two bits.

I then read the number of dihs and dahs from the standard input and lookup in the library what the offset of that caracter is from "a". Then repeat and print.

I think the best way to shave of bytes will be to store the number more compactly.


Edit: removed 54 bytes by changing the library format, and storing the number as base36 following the suggestions of A Rex and Chas Brown


Edit: -3 bytes for removing redundant parameter from str.split() - Chas Brown

steviestickman

Posted 2019-12-25T18:55:23.580

Reputation: 251

Wow, nice first answer! – connectyourcharger – 2019-12-26T15:34:40.703

The question has been updated to allow different formats for the library. I believe this would let you skip the [1::2] in your answer. – A. Rex – 2019-12-26T16:03:36.150

1308 by encoding your Magic Number as base 36 and putting the while loop on one line. (And, Welcome to codegolf!) – Chas Brown – 2019-12-26T21:27:03.610

You can save 3 more by replacing .strip(" ") with .strip(). – Chas Brown – 2019-12-27T05:47:19.383

9

APL (Dyalog Extended), 189 188 bytesSBCS

Full program. Prompts for the Morse code. Doesn't need the library. Requires ⎕IO←0

⌊⌂morse⍞⊂⍨∊1↑⍨¨1+4⊤'⍳÷+≠í⍀÷<í∧○Ñ>å/∧∨êî∘ß∧äñ⍀○≠è[*⌷∨[î≠∇è⊂Øî⊃⍀[⌷Éäï\∘∧↑æ=íå⌿∧∨ëäÖ⌿ê/≤⌊∘[Å↓Öï○+Ü÷≥↓≤≥>⌷∘á⊃↑Ü⍳⊣[⌿-Ø↑ä/∧∧≠≠Ñíí+ßÜ+-○é↓↓≥≥×∧↑**⍨↑}∧⍨≥ç⍀∧}×⍀Àëç⍀Ø⍨∧∘[×/∘Å∧~≥○ëâ⊂å∨≥Àäè?'⍳⍨125↓⎕AV

Try it online!

⎕AV the Atomic Vector (the character set)

125↓ drop the first 125

''⍳⍨ the indices in that of these characters

4⊤ To-base four

1+ increment

This gives us the length of each Morse letter.

¨ for each of those lengths:

1↑⍨ take that many elements from 1, padding with zeros when needed

ϵnlist (flatten)

⊂⍨ use that as a mask indicating partition-beginnings to cut the following:

 prompt for Morse code

⌂morse convert from Morse code to uppercase text

 lowercase

Adám

Posted 2019-12-25T18:55:23.580

Reputation: 37 779

7

Jelly, 138 135 bytes

ḲiⱮ“Q®ḃƁ⁼ṇÑ1¿Ọɠ\ƥȥ⁼ṙ⁴EŀƥŒEæʋØ>þ>⁴þʋ[⁾Ƒ]eÞḍ8ß®ⱮȮø⁹ç÷R?g®=,¥ėµ@ØU~"/Xçq®ÆƬ°ɠsṭṭ?dC#Ċ?gѶPÇH5p@ʂEœ⁹½G6ḌƘṪµɦ8€(S¦Ḥ⁸ṫ"ḟḢlṾ¦ƤḃƭẊ@’ḃ4$ÄṬk⁹¤ịØa

Try it online!

A dyadic link taking the library as the left argument and the Morse code input as the right argument. Returns a Jelly string of the lyrics. The long integer that takes the majority of the code has the lengths of each Morse code character encoded within it.

Explanation

Ḳ                      | Split at spaces
 iⱮ              ¤     | Find index of these in each of the following:
   “Q...@’             | - Base-250-encoded integer 742925387377430937422864458121941352705403867269973650879937105837755604647313120541399315429452997316260711263757137193164236124383957174543961671910365268304120327243209050996915401898167278257845227193061059012165352924148307417330452740236835399667569278751322913192633409885614815
          ḃ4$          | - Bijevtive base 4
             Ä         | - Cumulative sum
              Ṭ        | - Convert to logical list with 1s at the appropriate indices
               k⁹      | - Split right argument at those positions
                  ịØa  | Finally, index into lowercase letters

Nick Kennedy

Posted 2019-12-25T18:55:23.580

Reputation: 11 829

7

APL (Dyalog) + dfns + Jelly's dictionary, 181 175 (172) bytesSBCS

Full program. Requires ⎕IO←0 and ⎕AV redefinition. Doesn't use the challenge's Morse code library.

M←∊∘morse¨D⋄C←∊(7⍴51)⊤246⊥5 17⍴¯7+⎕AV⍳'k%aÏ⍵FþfÅ⍵ìÆ⌶á≥å3¶cÒ⍺\ujü─⍳%S⊣ì⍙ÒÛÙ3Ùp`Ø,hÇñ⊂┴Ö⍪l∪⍒i1⍵Í:-îà⌽ì≥3€ã(⍝S]ÁGXræ∇O7Uæî'
C↓⍨←≢i↓⍨←≢M⊃⍨w←(⊃C)⌷∊(⍸M⍷⍨⊂)¨⌽,\24↑i
⍞←819⌶w⊃D
→2

Try it online! - 175 byte version, running at decent speed

M←∊∘morse¨D⋄C←∊(7⍴51)⊤246⊥5 17⍴¯7+⎕AV⍳'k%aÏ⍵FþfÅ⍵ìÆ⌶á≥å3¶cÒ⍺\ujü─⍳%S⊣ì⍙ÒÛ─3Ùp`Ø,hÇñ⊂┴Ö⍪l∪çi1⍵Û:-îà⌽ì≥3€ã(⍟S]ÁôXræ∇O7Uæî'
C↓⍨←≢i↓⍨←≢M⊃⍨w←(⊃C)⌷∊(⍸M⍷⍨⊂)¨⌽,\i
⍞←819⌶w⊃D
→2

Try it online! - 172 byte version, taking about 20 times as long to finish (especially slow at beginning)

-3 bytes by packing base 51 into base 246 instead of base 64 into base 256 and losing the apostrophe -3 bytes by ending with an error message instead of cleanly
-3 bytes by running about 20 times more slowly

When I saw this question, I thought that the ultimate solution, the one the question was made for, would involve combining both the Morse code and a dictionary. I was surprised when the only submitted solutions turned out to use a table of letter boundaries. My interest was sparked; I'd take a crack at it myself.

The first step was to look for a suitable dictionary (I tried a few). Then I implemented the first version of the algorithm in C++. This enumerated all the possible words that could be at a given point, numbering them sequentially, until finding the one matching its corresponding value in an array made from running the same algorithm with the words of the song as a reference. This initial version enumerated the words in alphabetical order.

I was rather astonished to see just how ambiguous Morse code is without the inter-letter gaps. I expected it to be somewhat ambiguous, but not to the extent that it actually turns out to be! Some of the alternatives are pretty funny, too. Every time reindeer comes up, for example, it could also be leech. And the entire song can be mapped to completely different words, all the way from start to finish (gibberish, but present in the dictionary).

The dictionaries I tried turned out to be not so well-suited to the task. Either they were missing too many words, requiring them to be added to the dictionary for the algorithm to work, or they were too comprehensive, and the resulting array of "word choices" had elements exceeding 63, meaning it couldn't be packed into 6 bits per word.

Then it occurred to me to try Jelly's dictionary. It turned out to be perfect for this. Not every word is in the dictionary, including simple words like a, names, games, and glows, and the contractions won't (wont isn't in it either) and you'll, but I took advantage of the fact that there are no spaces in the output for this challenge, and cheated:

  • Blitzenblitz en
  • had aHa DA
  • glows allglow sall
  • names theyNam EST hey
  • games thengamest hen
  • won't youwon Ty ou
  • him as theyHima ST hey
  • you'llyou LL

It was a spookily perfect fit for this challenge, even containing all of the proper nouns other than Blitzen. Another cheat was to use compound words to shrink the size of the array of word choices: outwith and godown. As result of all this, the word choice array had 116 elements in it, and with none exceeding 63, the whole thing could be packed into 87 bytes. (This, despite the compromise of encoding then as th en so as not to exceed the value of 63.) Contrast this with the array of Morse codeword boundaries, which takes 119 bytes (118.5 rounded up).

But that leads to the next problem: Although the data size is considerably smaller, the code complexity is ballooned in comparison. I didn't know any golfing languages, and most of them have very limited documentation that's quite hard to use to learn from scratch.

So I worked on refining the algorithm. The next optimization was to iterate through letters from longest Morse codeword length to shortest, still in C++. This actually decreased the maximum index value to 59, meaning that with arithmetic rather than bitwise packing it could use base 60 to fit in 86 bytes. (For illustrative purposes I also ported this one to Java, which was interesting... I learned a little more Java, and was surprised to see it run even faster than the C++ version, using >125% of a CPU core, even though the algorithm is inherently single-threaded. My best guess is that the hash function is multithreaded even for individual look-ups.) This data set would respond better to Huffman encoding than the previous one, though not enough to even break even.

I tried optimizing the C++ code for smaller compiled object code, evaluating if it was worthwhile to attempt hand-coding it in x86-32 assembly. It stubbornly remained way too large; the best I could do was to squeeze it in at just under 256 bytes, not including the data. So it wasn't worth hand-coding in asm (the letter-boundary version beats it by a huge margin).

So it seemed that the only way this algorithm would have justice done to it would be for me to learn a golfing language. Quite daunting, given that I've never used anything like these languages.

And then Adám offered to teach me APL. I jumped at the chance; while APL doesn't usually win challenges where there's also a Jelly or 05AB1E contender, it tends to come pretty close, has a deep and rich past, and formed the basis of or influenced many newer languages. Heck, I knew of it as a child, and was a bit curious about it even then, but never took a crack at learning it until now.

Many thanks to Adám for extensive help in teaching me various aspects of Dyalog APL and golfing down this program – my first APL program.

I initially tried porting the algorithm as it stood to Dyalog APL. It worked, but was very slow, even when compiled; in 60 seconds it only gets up to donner or so. Adám conferred with Marshall at Dyalog, and there wasn't really a way to speed it up any further. But it was suggested that encoding the entire dictionary in Morse code could result in a speedup. And I realized that this strategy could make for much better golf, too.

So the next stage in adapting the algorithm was to translate the entire dictionary to Morse code, and at each point in the encoded song, iterate through possible words in reverse order of their length in Morse dots+dashes (with the longest being 24). Not only is this approach far, far faster in APL (and probably many other languages), and far more concise, but even the data is more lightweight – far more of the "word choice" array elements are 0 or 1, the largest value is just 50, and there are only 115 elements (the then no longer needed or benefited from breaking up), meaning that in languages with arbitrary-precision base conversion, it could be packed in base 51 and take up only 82 bytes. In APL though, the overhead of implementing arbitrary-precision base conversion would override the benefit, and I stuck with packing groups of 7 elements, in base 51, into groups of 5 bytes in base 246 (as log246 517 ≈ 4.999295), taking up 85 bytes. This data set responds even better to Huffman encoding than the previous, but still wouldn't break even: 4 1 2 21 1 13 0 23 0 3 26 1 23 0 26 0 27 15 8 6 0 34 10 0 1 1 7 0 43 23 0 4 37 50 1 0 3 23(24) 22 6 15 6 16 3 2 17 10 12 5 0 9 31 1 3 9 21 0 32 16 9 20 32 8 12 31 5 0 4 27 14 1 7 13 16 0 0(1) 27 4 4 22 10 0 16 5 7 15 3 12 18 4 2 13 1 0 50 4 49 2 1 4 38 20 0 0 19 0 43 23 0 1 6 12 0 38 0

So as it turns out, in APL it still doesn't golf well enough to beat the existing letter-boundary solutions, even if the overhead for unpacking the "word choice" array (24 bytes) and for using Dyalog Unicode instead of Extended (7 bytes) are subtracted (yielding 144 (141) bytes code+data). I still think that it might be possible to make it shorter than 129 bytes in one of the languages more specialized towards golfing, and will likely try that eventually.

Of course, with bigger source material than the song this challenge uses, this dictionary-based approach would win hands-down. Some interesting extensions to it would be: Look 2 or more words ahead, and perhaps use an N-gram frequency table; Look all the way ahead, using a model of most-likely N-grams (probably would be incredibly slow).


Explanation of the code

Dyalog Unicode is used, even though Dyalog Extended could save 7 bytes (4+3, as described below), because the former allows compiling for speed, whereas the latter is very slow, to the point that it would be impractical for demonstration.

On my system, this version takes about 45 (or 901) seconds to finish (the first 7 seconds of this are taken converting the dictionary into Morse code). On TIO it doesn't quite finish within 60 seconds.

Header

⎕IO←0 - set Index Origin to 0 (for 0-based array indexing)

⎕AVU[62 63 60 93 11 94]←9016 9060 9056 8838 9018 9080 - modify the default Atomic Vector to match the actual Single-Byte Character Set we're using (£¢¥ýɫ· overwritten with ⌸⍤⍠⊆⌺⍸, at each character's respective index within the Atomic Vector)

'morse'⎕CY'dfns' - import the Dyalog APL Morse code dfn – I decided to keep this in the header because Dyalog Extended has it by default

D←1(819⌶)'^\w+ = ''{3}|''{3}\.split\(\)$'⎕R''⊃⎕NGET'/usr/local/lib/python3.7/site-packages/jelly/dictionary.py'1

Load Jelly's dictionary, and convert it to lowercase:

⊃⎕NGET'/usr/local/lib/python3.7/site-packages/jelly/dictionary.py'1 - read the file as a set of newline-delimited lines into an array (with empty lines dropped, of which there is one)

'^\w+ = ''{3}|''{3}\.split\(\)$'⎕R'' - using a regex (PCRE) Replace, remove the Python code that surrounds the dictionary's two word lists (short and long), resulting in a merged list that keeps everything from short before everything from long (but is otherwise in case-insensitive alphabetical order)

1(819⌶) - convert to uppercase, so that the morse dfn will work on it (since it only takes/gives uppercase input/output)

D← - assign to the variable D

∇D R i;C;M;w

Define the tradfn R, taking left argument D (dictionary word list) and right argument i (the input, i.e. the song in undelimited Morse code), and having local variables C (word choice array), M (dictionary translated to Morse code), and w (array listing the indices into the dictionary of what the current word could be). The arguments are passed and the local variables are declared to allow the R to be compiled properly (see footer).

Code

M←∊∘morse¨D⋄C←∊(7⍴51)⊤246⊥5 17⍴¯7+⎕AV⍳'k%aÏ⍵FþfÅ⍵ìÆ⌶á≥å3¶cÒ⍺\ujü─⍳%S⊣ì⍙ÒÛÙ3Ùp`Ø,hÇñ⊂┴Ö⍪l∪⍒i1⍵Í:-îà⌽ì≥3€ã(⍝S]ÁGXræ∇O7Uæî'

∊∘morse¨D - compose the two functions (Enlist - flatten / join) and morse, and apply the composed function to the dictionary D. Without the composed in, morse returns an array of Morse codewords when given an alphabetical string as input, but we need the Morse-encoded dictionary to leave out inter-letter gaps, thus the flattening.

M← - assign the result to M, the Morse-encoded dictionary.

- statement separator

'k%aÏ⍵FþfÅ⍵ìÆ⌶á≥å3¶cÒ⍺\ujü─⍳%S⊣ì⍙ÒÛÙ3Ùp`Ø,hÇñ⊂┴Ö⍪l∪⍒i1⍵Í:-îà⌽ì≥3€ã(⍝S]ÁGXræ∇O7Uæî' - word choice array, packed into a string using Dyalog APL's updated SBCS. For this to work, the encoding must not contain newlines, i.e. ⎕UCS 10, 12, or 13 (⎕AV[2 3 5]), and any apostrophes must be doubled (but there are none).

⎕AV⍳ - map characters to SBCS indices (in the range 0-255) using the modified Atomic Vector.

¯7+ - add −7 to all SBCS indices. The corresponding addition of 7 was done during the encoding, to avoid occurrences of any of the three types of newlines. Due to use of base 246, this addend is quite flexible, and could range from 6 to 10 and be guaranteed to stay within range regardless of the data; thus it can be adjusted to minimize the number of apostrophes (which would each cost 1 byte, as they need to be doubled in a quoted string). As it happened, it was possible to have zero apostrophes.

5 17⍴ - arrange into a matrix for base conversion; each of the 17 columns will be converted as a unit. (In Dyalog Extended, this could be done as 5 ¯1⍴.)

246⊥ - base-convert the 5 values in each column that range 0-245 (base 246) into a single number ranging 0-900897818975, the top row being most significant, yielding a flat array. (But due to the nature of the underlying data, it will actually range 0-897410677850.)

(7⍴51)⊤ - base-convert each array element back into a column, with 7 rows, each number in the range 0-50 (base 51). (In Dyalog Extended, this could be done as 51⊤. The number of rows would be autodetected, so if by a coincidence all values were less than 516, this shorthand syntax wouldn't work properly. In this case though, in would work, and would save 4 bytes.)

- flatten the matrix into an array, row by row.

C← - assign to the variable C, the word choice array.

C↓⍨←≢i↓⍨←≢M⊃⍨w←(⊃C)⌷∊(⍸M⍷⍨⊂)¨⌽,\24↑i

This is Line 2 (as referred to by the :GoTo at the end of the loop):

i - take the current value of the input (with the parts of it that have already been processed deleted)

\24↑ - truncate to 24 characters (dots+dashes). That is the length of the longest Morse encodings (rudolph and christmas). This step is omitted in the version, which results in almost identical operation, but far slower; the word choice array only needs to be modified in two places (increasing the choice value by 1 at the and in andifyoueversawit and at christmas), meaning that at all other words, no matches for >24 dots+dashes are found in the dictionary.

,\ - make a list of Morse subsequences starting at the beginning of this, ranging from 1 dot/dash long to 24 dots/dashes long (or maximum length, in the version), from shortest to longest (e.g., at the very beginning, would be '-' '-.' '-.-' '-.--' '-.---' '-.----' '-.-----' '-.-----.'...'-.-----..--.--.---.---..')

- reverse that list, so that it goes from longest to shortest (this way of encoding results in smaller element values for the word choice array on average)

(⍸M⍷⍨⊂)¨ - to each item in this list, apply the function ⍸M⍷⍨⊂: =enclose one level of nesting deeper, to match the nesting of the Morse-encoded dictionary; M⍷⍨=return an array the same size as the dictionary, where values of 0 mean "not found" and 1 means "found", =convert the list of 0s and 1s into a list of the indices of the 1s. The result is a matrix with 24 rows (ranging from longest to shortest Morse, top to bottom), each of which contains a variable number of columns (minimum zero), listing the indices of all the words in the dictionary that match the Morse sequence from the input of the corresponding length (in alphabetical order, but with Jelly's short dictionary entries preceding its long ones).

- flatten this. The end result is a list of the indices of all the words in the dictionary that match any item in the list of Morse sequences from the input that range from 24 to 1 dots+dashes long, retaining the sorting of longest to shortest (and within each length, entries from Jelly's short come before those from its long, and within that, they're alphabetically ordered).

(⊃C)⌷ - from the resulting list, take the element at the index of the current word choice (=first element of) from the word choice array C. Equivalent to wrapping everything to the right in ( ) and putting [⊃C] to the right of it, but saves 2 bytes.

w← - assign this (the index into the dictionary of the correct current word in the song) to the variable w, returning it as a result

M⊃⍨ - return the wth element of M (the Morse code of the current word), unwrapped. Equivalent ⊃M followed by wrapping everything to the right in [ ], but saves 1 byte.

- return the length of the Morse code of this word (in characters, i.e. dots+dashes). If it had not been unwrapped by , the result would always be 1.

i↓⍨← - delete that many characters (the length of the word in Morse code) from the beginning of the input i, so that on the next iteration of the loop, the next word will be handled, and return the number of characters deleted as the result

- return the length of the number of characters; since that is a scalar, it will always be 1

C↓⍨← - delete that many (1) characters from the beginning of C, the word choice array, so that on the next iteration of the loop, the next word choice will be ready. This saves 1 byte relative to what a separate statement (C↓⍨←1) would take.

⍞←819⌶w⊃D

w⊃D - Take the wth element of D (the current alphabetical word) and unwrap it, so that when printed, there are no spaces to its left or right. Equivalent to ⊃D[w].

819⌶ - Convert to lowercase. Equivalent to 0(819⌶). (In Dyalog Extended, this could be done as , saving 3 bytes.)

⍞← - Print the result without a newline. In TIO, this prints to stderr.

→2

:GoTo Line 2 (the beginning of the loop), unconditionally. A line number is used to save the 2 bytes that defining a label (e.g. L:) would cost. The lack of a looping condition results in the program ending with * Command Execution Failed: INDEX ERROR when it tries to access the first element of an empty array (after it has successfully printed the entire song).

If running this program locally, note that the error prevents ]runtime from working. To work around this, the program may be called with :Trap 3...:Else ⋄ :EndTrap wrapping it.

The clean version, avoiding this error, would be →2⊣¨i, meaning :GoTo Line 2 while i is not empty. This would work because 2⊣¨i returns an array of 2s matching the length of i, and the :GoTo goes to the line number of the first value in the array, if one exists.

Footer

- end the tradfn R

2(400⌶)'R' 'morse' - compile the user tradfn R and the built-in dfn morse. Without this, the program would run far slower.

⍎'D R⍞⋄⍬' - Call tradfn R with dictionary D as its left argument and the input () as its right argument. This passing of arguments is done only for speed, so that the compile can work properly (it doesn't like global variables, or directly using input). The and constitute a workaround Adám wrote for me for the way the APLs work currently on TIO. They expect code in the input, but I wanted a clean separation between code and input.

Deadcode

Posted 2019-12-25T18:55:23.580

Reputation: 3 022

5

Python, 487 bytes

lambda a,b:'youknowdasheranddancerandprancerandvixencometandcupidanddonnerandblitzenbutdoyourecallthemostfamousreindeerofallrudolphtherednosedreindeerhadaveryshinynoseandifyoueversawityouwouldevensayitglowsalloftheotherreindeerusedtolaughandcallhimnamestheyneverletpoorrudolphjoininanyreindeergamesthenonefoggychristmasevesantacametosayrudolphwithyournosesobrightwontyouguidemysleightonightthenhowthereindeerlovedhimastheyshoutedoutwithgleerudolphtherednosedreindeeryoullgodowninhistory'

Takes morse code and any of the three libraries as input, returns the text.

ScoopGracie

Posted 2019-12-25T18:55:23.580

Reputation: 51

Heh; I wonder if any language could save anything by packing the string from 8-bit characters down to 5 or 6-bit characters (all lower case letters so 32 code points is sufficient). That would leave ~180 bytes for decompression code with bit-shifts (back to ASCII / UTF-8) for this to break even. Maybe good for assembly language / machine code. I'm assuming we can't justify using a 5 or 6-bit character set for the actual output without also counting the size in 5-bit bytes. – Peter Cordes – 2019-12-29T03:14:55.957

I saw what you did there :-) – Kjetil S. – 2019-12-29T04:11:06.217

1

@PeterCordes I had a similar idea and took a crack at it. Feel free to suggest improvements!

– Seb – 2020-01-06T00:27:40.910

3

Charcoal, 145 141 bytes

≔⪪⮌θ¹θF”)“1§μX↶)⟧¶»-↘3ω◧i⌊⊞⁸Q9m;T›|LDμ…›O⦄hTYIO5x÷↶׋,⟲"yMLUCgq.BNE2⁴⎇c⬤χN±⸿↙&7⟧χ⁰‹Ys´ΣC⁼{9ω�Pi&◧êKσPDfY9wi⟲f[Φτ³J‹‴V[^WREαF[-T^ς”§β⌕⪪η ⭆⊕ι⊟θ

Try it online! Link is to verbose version of code. Edit: Saved 3 4 bytes by using a less inconvenient library format. (Thanks to @Deadcode for pointing out that I'd pasted in the wrong version.) Explanation:

≔⪪⮌θ¹θ

Reverse the sequence and split it into individual characters.

F...

Loop over the characters of a compressed string which are the digits 0..3 (which compress better than 1..4 for some reason).

§β⌕⪪η ⭆⊕ι⊟θ

For each digit, pop one more than that many characters from the sequence, concatenate them, look them up in the library split in spaces and output the desired character (using the predefined lowercase alphabet).

137 bytes with an even more convenient library format:

≔⪪⮌θ¹θF”)“1§μX↶)⟧¶»-↘3ω◧i⌊⊞⁸Q9m;T›|LDμ…›O⦄hTYIO5x÷↶׋,⟲"yMLUCgq.BNE2⁴⎇c⬤χN±⸿↙&7⟧χ⁰‹Ys´ΣC⁼{9ω�Pi&◧êKσPDfY9wi⟲f[Φτ³J‹‴V[^WREαF[-T^ς”§η⭆⊕ι⊟θ

Try it online! Link is to verbose version of code. Explanation:

≔⪪⮌θ¹θ

Reverse the sequence and split it into individual characters.

F...

Loop over the characters of a compressed string which are the digits 0..3 (which compress better than 1..4 for some reason).

§η⭆⊕ι⊟θ

For each digit, pop one more than that many characters from the sequence, concatenate them, look them up in the library and output the desired character.

Neil

Posted 2019-12-25T18:55:23.580

Reputation: 95 035

Shame. I at least managed to golf a byte off the code to read the inconvenient output... – Neil – 2019-12-26T11:37:11.763

It appears the question has been relaxed to allow some other library formats. How much does that help you? – A. Rex – 2019-12-26T16:21:27.067

1@A.Rex I originally thought it would only save me a byte but I managed to golf a couple of further bytes off. – Neil – 2019-12-26T18:40:58.177

There's a typo in your current 142 byte version. When fixed, it's actually 141 bytes: Try it online!

– Deadcode – 2020-01-01T07:56:58.893

@Deadcode Bah, I remember fixing that, I don't know how I managed to paste in the wrong version... – Neil – 2020-01-01T16:47:43.287

3

Perl 5, 283 244 bytes

$s=<>;$s=~s/..{$_}//+print$s=~/(.) \Q$& / for map{vec unpack('u',q{MJZDYEM98;HWE31N4>Y9:6'Y,BZYRSZ1<JI2@]ZH_(Z8H)7@V[G5*V2NC&:NZM3':XVCLCHY2@(IY>WE\5,L<XK*K^6U5+"5K(9*S^)B5CT07:JK_Q:HINCL:J=23LYEL.T,E*".U[)[`HJQ@ZJ/R.F*"6XOJK5B0X`}),$_,2}0..473

Try it online!

This corresponds to:

$L = <<'';                            # length bits in base64
MJZDYEM98;HWE31N4>Y9:6'Y,BZYRSZ1<JI2@]ZH_(Z8H)7@V[G5*V2NC&:NZM3':XVCLCHY2@(IY>WE\5,L<XK*K^6U5+"5K(9*S^)B5CT07:JK_Q:HINCL:J=23LYEL.T,E*".U[)[`HJQ@ZJ/R.F*"6XOJK5B0X`

$s = <>;                              # read inputs
$s =~ s/..{$_}//                      # take current length morse code from start
  + print $s=~/(.) \Q$& /             # ...and print corresponding letter
  for map vec(unpack('u',$L), $_, 2), # find morse code lengths of the letters
      0..473                          # 474 letters

This could perhaps be further shortened by encoding in base128 or directly in "base256". Huffman encoding the lengths could shorten the length data, but most probably enlarge the rest of the code by too much.

Kjetil S.

Posted 2019-12-25T18:55:23.580

Reputation: 1 049

3

MATLAB R2016a, 290 285 277 266 245 223 219 bytes (99+119+1)

function r(m,d);[a,b]=ismember(mat2cell(m,1,fread(fopen('j'),474,'bit2')+3),strsplit(d));char(b+96)

The inputs are the Morse code sequence m and the dictionary d in the third format. The function uses a 119-byte file j available here that contains the lengths of the 474 Morse codewords. According to the rules of scoring multiple files, this adds an additional byte to the score.

Dustin G. Mixon

Posted 2019-12-25T18:55:23.580

Reputation: 1 833

3

05AB1E, 149 131 129 bytes

•Nα0Ω¤èGë™jƒć›
Ù¾₂¶É¨=Yàf§ì[ðEÎλ`÷¡2Ä|~āΔí¹–4₂≠āó3Tv₁‰[Öò‚S»‰CþEηFA‡γ÷¯¿Þ₃âÖ.³εûV^ÕÜ₄Ú¹þ‘ΘVÅÍÅΘΛ9m|\Ó8g8ì&ÓiUΩÎQ:ǺZÎgöæ•4в>£I#A‡

-18 bytes thanks to @Deadcode.
-2 bytes thanks to @Grimmy.

Uses the complete morse-string as first input, and the lookup format as second input.

Try it online. (Outputs as a list of characters, so the J in the footer is to join it together to a single string to pretty-print.)

Explanation:

•Nα0...göæ• # Push compressed integer 2178625864988239449532491166807818445418276694861550892023336765049187401437036816274840827319861935617150192366925629812414499033245212320687403711053517272798896860734190937884895522347710320033920004354990136205425279052925026955638024859428057287730519725429748232597858786841294379
   4в       # Converted to base-4 as list: [3,2,2,2,1,2,2,2,1,2,3,0,2,1,1,2,2,1,1,3,0,2,1,1,2,3,2,1,1,3,0,2,1,1,2,3,1,3,0,1,3,2,1,0,0,1,1,2,3,2,3,1,2,1,1,2,2,2,1,1,0,2,1,1,2,3,3,1,0,3,0,1,3,2,0,2,2,3,2,2,2,0,3,1,3,3,0,3,0,1,2,2,0,3,1,1,2,2,2,2,0,1,1,2,0,0,2,2,3,1,3,3,2,2,2,2,3,3,3,0,3,0,2,0,2,1,2,2,0,2,2,0,1,1,2,0,0,2,3,1,2,1,3,0,2,3,2,3,1,1,3,1,2,2,0,1,1,2,1,3,3,2,2,0,3,0,2,2,1,2,1,0,3,2,2,2,2,2,3,2,0,3,0,1,2,1,3,1,0,2,3,2,2,2,1,3,3,2,3,0,3,0,2,0,3,0,2,2,0,1,1,2,0,0,2,2,2,0,2,0,2,3,1,2,2,3,1,1,2,3,1,3,3,3,1,1,1,1,1,0,2,0,3,0,3,1,0,3,0,2,3,0,0,3,2,2,2,2,2,2,2,3,3,3,3,2,1,1,1,1,1,1,3,2,0,1,1,2,0,0,2,2,1,1,0,2,0,3,0,1,2,1,0,3,2,2,2,3,3,3,2,1,2,0,1,1,2,0,3,0,2,1,1,0,1,3,1,1,0,0,2,2,1,3,2,2,2,2,3,3,3,2,1,0,3,3,2,2,2,1,2,2,0,2,2,3,2,1,2,3,0,2,2,1,0,3,2,2,2,2,1,2,0,1,3,2,3,0,1,2,3,0,2,1,1,2,3,0,0,3,0,1,3,2,2,0,3,0,2,0,1,1,2,0,0,2,3,2,3,0,2,3,1,1,1,2,0,3,0,3,2,3,2,2,0,0,2,2,2,0,2,1,0,3,2,3,0,0,2,2,2,2,3,3,3,0,3,0,2,0,2,1,2,2,0,2,2,0,1,1,2,0,0,2,3,2,2,3,3,2,2,2,2,2,1,1,1,3,1,2,0,2,2,3]
     >      # Increase each value by 1
      £     # Split the first (implicit) input-string into parts of that size
       I#   # Push the second input-string spit by spaces
         A  # Push the lowercase alphabet: "abcdefghijklmnopqrstuvwxyz"
          ‡ # Transliterate all morse-strings in the list by the characters in the alphabet
            # (after which the resulting character-list is output implicitly)

See this 05AB1E tip of mine (sections How to compress large integers? and How to compress integer lists?) to understand how the compression works.


Also, since I was curious what the program length would be if we wouldn't take any input and just use the 05AB1E dictionary to output the string, I created a program for that as well, which is 266 bytes:

’4ƒ€Þßer0–›r0prancer0vixen†„t0ï›id0¥Úner0bÙî؈€³€·4¾Î3‚¢£–2€‚€Ÿ13†¾5d2‚†a‚Òshiny5€ƒ€¬4ˆë–怕4€ÞƒÎ…耕é»s€Ÿ€‚3€¶2‚›€„ÇÇ0„Ò„šŒî€»†™†³—‹1…¯€†€Æ2ƒ‚‚½€µÚƒgyŒÎ»ˆš´‹é€„…è1€Ž€ž5€Ê®Á¢²t4„²€¯slŸ¯o†æ‚½€ß32®œ„š€œ€»sh€Äed€Ä€Žgï³13†¾5d24ll‚œ„‹€†„ª’5Ý’€ƒ rudolph reinÆÓ €€ €î ÄÃ’#:

Try it online.

Explanation:

’4ƒ€Þßer0–›r0prancer0vixen†„t0ï›id0¥Úner0bÙî؈€³€·4¾Î3‚¢£–2€‚€Ÿ13†¾5d2‚†a‚Òshiny5€ƒ€¬4ˆë–怕4€ÞƒÎ…耕é»s€Ÿ€‚3€¶2‚›€„ÇÇ0„Ò„šŒî€»†™†³—‹1…¯€†€Æ2ƒ‚‚½€µÚƒgyŒÎ»ˆš´‹é€„…è1€Ž€ž5€Ê®Á¢²t4„²€¯slŸ¯o†æ‚½€ß32®œ„š€œ€»sh€Äed€Ä€Žgï³13†¾5d24ll‚œ„‹€†„ª’
      # Push dictionary string "4knowdasher0dancer0prancer0vixencomet0cupid0donner0blitzenbutdo4recall3mostfamous2ofall13red5d2hadaveryshiny5andif4eversawit4wouldevensayitglowsallof3other2usedtolaugh0callhimnamestheyneverletpoor1joininany2gamesthenonefoggychristmasevesantacametosay1withyour5sobrightwont4guidemysleightonightthenhow32lovedhimastheyshoutedoutwithglee13red5d24llgodowninhistory"
 5Ý   # Push a list in the range [0,5]: [0,1,2,3,4,5]
   ’€ƒ rudolph reinÆÓ €€ €î ÄÃ’
      # Push dictionary string "and rudolph reindeer the you nose"
  #   # Split it on spaces: ["and","rudolph","reindeer","the","you","nose"]
   :  # Replace all [0,1,2,3,4,5] with ["and","rudolph","reindeer","the","you","nose"] in the string
      # (after which the result is output implicitly)

See the same 05AB1E tip as linked above (but section How to use the dictionary? this time), to understand how the dictionary strings work.

Kevin Cruijssen

Posted 2019-12-25T18:55:23.580

Reputation: 67 575

2Why not store the list in base 4 instead of base 5, and add 1 to every element after unpacking it? Surely that wouldn't add 19 bytes to the code, as is wasted by the data by using base 5? – Deadcode – 2020-01-04T00:10:11.313

1@Deadcode Thanks, that indeed saves quite a few bytes. – Kevin Cruijssen – 2020-01-04T12:53:38.540

Awesome that it only took one byte to do the incrementation by 1! I honestly didn't know, but hoped it would be one byte. – Deadcode – 2020-01-04T19:02:56.620

@Deadcode Yeah, 05AB1E has single-byte builtins for +1, -1, +2, and -2. And it automatically vectorizes on the inner most values of (nested) list. :) – Kevin Cruijssen – 2020-01-04T19:26:27.663

1I think you need to add a ,3 to the end of your list. history is being truncated to histor. – Deadcode – 2020-01-04T20:32:09.733

Do you think it would be possible to implement an algorithm like this in 05AB1E, with Jelly's dictionary (in list form) passed to it like a library? This Morse code problem sparked my interest and I'm thinking of learning 05AB1E or Jelly to take a crack at doing it in fewer bytes than the split-by-letter-length method can do. Would it run in better than geological time in 05AB1E? I ask because I have a feeling it might naturally be implemented in a rather breadth-first fashion, which would take far more resources than the depth-first way.

– Deadcode – 2020-01-04T20:43:33.697

@Deadcode Ah oops, thanks for noticing, fixed. As for your second comment, I'm not sure. I guess it would be possible to implement that algorithm in 05AB1E, but I'm not too good at C++ to modify it. Also, it seems to use Jelly's dictionary, but you could probably also access 05AB1E's library by using the Elixir compiler library, as apposed to the python3.7 it currently uses in that C++ program. As for your second question, I'm again not sure. The new 05AB1E's performance isn't too great to begin with. The legacy version which was written in Python was ok though. – Kevin Cruijssen – 2020-01-04T21:05:52.607

Ah, I was expecting 05AB1E to be faster than Jelly because Python is quite slow in my experience... is Elixir even slower? In any case, Jelly's 248,298 word dictionary is the perfect choice because it maps this entire song without any words needing to be added, whereas as you found, 05AB1E's 10,000 word dictionary is missing words like "prancer", "vixen", shiny", "rudolph", and many more. Here is my algorithm ported to Java.

– Deadcode – 2020-01-05T02:47:19.040

I think if you wanted to encode the message w/o input you could get ~166. Basically, split the morse code up into 7-bit blocks and prepend a 1. Then convert to base-255, and that's ~154 bytes. – Magic Octopus Urn – 2020-01-09T20:35:21.410

@MagicOctopusUrn I'm not sure I follow tbh. How is the morse code as 7-bit blocks, with 1s prepended to each, and then converted to base-255, useful to get the intended output with lowercase letters? As explained in the challenge description, -..---.-... could be xmas, dole, teettteteee, etc. So without any input, how would the compressed morse code alone be helpful in getting the letters output? Also, how did you get that 154/166 bytes? I'm getting around 174 bytes, but I'm probably doing something wrong. Could you show me your code as clarification? – Kevin Cruijssen – 2020-01-09T20:54:11.500

2k followed by è is just . 129 with •constant•4в>£I#A‡ and swapped inputs (TIO link too long for a comment). – Grimmy – 2020-01-17T17:12:32.063

@Grimmy Ah, I remember you gave me that tip before.. Thanks! – Kevin Cruijssen – 2020-01-17T19:32:33.760

2

JavaScript (Node.js),  270  269 bytes

Takes input as (string)(lookup), where lookup is in the 3rd format.

s=>t=>(B=Buffer)("SNB6>B5>V]@a5U)McAQ>@a/;3bRD7;2?R:1P_Sb7K@22MHO5V?O:AWZPATRV4A/VBW[0K:1PJ`Q?Fg=-ZD46TRRgC=]:1P-ZLYRgN<Z@9?H^RfCdRQPC6BTR1U;6>64UZ0MHc`=ZTSH2.cHRg4JQP<*VfRB]1b").map(n=>[0,2,4].map(i=>o+=B([97+t.split` `.indexOf(s.slice(p++,p+=n-40>>i&3))])),o=p='')&&o

Try it online!

How?

For each character in the data string, we take the ASCII code minus \$40\$ to get a 6-bit value holding the lengths of the next 3 morse codes.

Example:

"S" --> ASCII code 83 --> 43 --> 101011 --> 10 10 11
                                             |  |  |
                                             |  |  +--> 3+1 = 4 -> -.-- (y)
                                             |  +-----> 2+1 = 3 -> ---  (o)
                                             +--------> 2+1 = 3 -> ..-  (u)

Arnauld

Posted 2019-12-25T18:55:23.580

Reputation: 111 334

2

Python 3, 265 261 259 bytes

Uses the same base-256-in-a-string data-encoding method as my other Python answer, but actually uses the Morse code input. Like many of the other answers, encodes the lengths of the 474 Morse codewords into 119 bytes using base 4. The Morse stage of the decoding is based on steviestickman's answer, with some further golfing.

Takes the Morse library and representation as two lines from stdin and outputs to stdout:

#coding=L1
i=int.from_bytes('Õª¾¸%(¦#?ªÆ*\nìÉ^;R2´Ã9;IªÆnjñ¿ªÚÑc%&þ¬dÈZ    KU[þª¬8Ç2_Þ^" £#;Ú¸vLº«£+ÙJuî6x%(¦#?ª÷ ª\¤Ïr®L~XZ{MånXÖ9©«'.encode('L1'),'big')
I=open(0);m=I.readline().split()
while i:print(end=chr(97+m.index(I.read(i%4+1))));i//=4

(contains C1 control characters, forbidden by HTML5, thus the above can't be copied to clipboard and is for illustration purposes only)
Try it online! (can't use the actual coding header, since TIO only accepts UTF-8)
Try it on repl.it (demonstrates the actual 259 byte source code file)

-4 bytes by using the alias 437 for cp437
-2 bytes by using L1 instead of 437

Here's a Python program (TIO) to generate the full program with choice of string encoding.

Deadcode

Posted 2019-12-25T18:55:23.580

Reputation: 3 022

1

Python 3, 452 bytes

lambda a,b:''.join([chr(96+int(''.join([f'{ord(c):07b}'for c in'''ezVswP,A8BpeH	A82dDY
po4ZqVHDw,A8AMa+!ezd(a1J+/NCkuN"Rq'LJRdQd)>2I)8BK3!2ssIY>Rl,StiKu]zX!6iCJ4Go,1s(A/Qe)8BK3J`5\0\ a1DkA4YhA98[R_Uv BO%d\YHT\!%HpZ,t Wq&<sr&:eXYBuK#s*#lAS">Y|eMq$IhRw]&/TzR!-f6
IhQw:Q\CwQd*.RdcvDh3QsVBH}4];*H{"D$9yJ$E%b
,Y>V;d>wr	N'''])[i:i+5],2))for i in range(0,2346,5)])+'tory'

Try it online!

Same morse-ignoring, hardcoding approach as ScoopGracie's answer, but with each character's 5 LSB concatenated and re-encoded in printable ASCII (i.e. 7 free bits out of each byte).

This shortens the string from 474 to 339 bytes, not counting delimiters (which need to be multiline for the encoded string) and an escape sequence in place of a NUL byte. I also cut the string off at the last clean byte boundary (to not deal with partial bytes) so I had to hardcode the final four characters.

Then there's not-insignificant overhead for the decoding process, so it's not much shorter this way. Improvement suggestions welcome!

Using the standard library instead to compress the lyrics with base64.b64encode(gzip.compress(...)) actually saves 2 bytes:

Python 3, 450 bytes

lambda a,b:decompress(b64decode('H4sIAAaPGF4C/3WQ0XbsIAhFv5UJRG2VkyU41n79xdzJY19cBE62WxbGt2IyWZZOykx63MXVn+pdfkQPNPH4OMZVeOegek9ftfiv6Gs4Y2F0OahWz9JgflLDsC5FWaTjjEkfjHrlCHRhhQk/40xMb+nLctG1JwEvZyAlukazeNQTo3I01GgVTxXTAoozeNjMBzYC7Kg0Ut7SkcmlKTWxSC3dyCp+Af0j9IWiRUnXQ0ifsELlRErryL2YN7L42Uidjkg4QuSDCMO8N7DdDa9eUvYJ3dppFJa2rMpuQve52Rnz3sT/KyvewuFJt6RlDBeOY4NT3a/6a3dxRa0JjBmPyKGJvv4BRj4fyNoBAAA=')).decode('ascii')
from gzip import*
from base64 import*

Try it online!

Seb

Posted 2019-12-25T18:55:23.580

Reputation: 291

1

Python 3, 381 375 372 bytes

Just like Seb's answer and ScoopGracie's answer, gracefully takes the Morse code representation and any of the three libraries as input, blithely ignores them and gives you the output.

Takes Seb's strategy a step further, encoding the text in base 26 in an integer encoded in base 256 in a string (as opposed to Seb's which encodes in base 32 in base 128 in a string). Both of our solutions are confined by not being able to contain the bytes NUL (0x00) or CR (0x0D), which Python interprets as end-of-file and LF (0x0A), respectively, even when found in a triple-quoted raw string literal. CR+LF is also interpreted as LF. Backslash escape codes are interpreted (unless the raw prefix r is used). Backslash followed by any kind of newline is treated as nothing, stripping both the backslash and the newline. And the string can't end in a backslash. (So it's really sort of a slightly-less-than-base-254 masquerading with the help of luck as base 256.)

To work around this problem I had to complicate the encoding a bit. Neither ASCII-97 nor 122-ASCII, nor using base 27 instead of 26 (and optionally adding 1), avoided containing these bytes or sequences, so I settled upon adding 1 to the encoded integer before each multiplication by 26, which luckily costs only 3 bytes (compare i//=26 to i=i//26-1). With significantly larger text, the luck would have to be incredible, or every NUL, CR, and \ would have to be replaced with \0, \r, and \\ respectively. On the other hand, with better luck, newlines might be absent altogether, meaning single-quoting could be used instead of double-quoting.

The #coding=L1 costs 11 bytes (without this it'd be 361 bytes), but this beats the cost of using base 128 (which would take 319 bytes for the string, whereas base 256 takes 279 bytes, and thus would cost 40 bytes instead of 11, not counting the different Python code that would be needed to decode it). An alternative coding is 437, which would cost a further 2 bytes, since the encoding name is used twice. Note that the disadvantage of using L1 instead of 437 is that ISO 8859-1 doesn't map as cleanly to UTF-8, and there may be difficulty faithfully copying and/or pasting it in some software.

This standalone program, written in the ISO 8859-1 codepage, ignores the Morse code representation and library taken as command-line parameters, and prints the output to stdout:

#coding=L1
i=int.from_bytes(' îã¸ÌÒ^ë9|iéëMþ³\H0Äï;ÿÒ¸FȪΩ«|°M®QcWtgUî[¡U*¤ßºN¨øA¡:¾ç¢Àãþú/æòêåá\IÕi#vv³Ü¡\'jîÐtÄí6.ñÎ餥Yî³d}fLLEXµ*îB§ ç^eleW?Ð|¶S¯±[Iûçä2ý¥ð°,üT÷0èÅEÛçå\LW&uMeÑ@ûÍÿ`Ñ3CÒæ%2¿¬ÌïÆçGÌîXë[MDØmI¥íâý9t±¤]#ÏYî8M·Ck­ò¡"J'.encode('L1'),'big')
while i:print(end=chr(i%26+97));i=i//26-1

(contains C1 control characters, forbidden by HTML5, thus the above can't be copied to clipboard and is for illustration purposes only)
Try it online! (can't use the actual coding header, since TIO only accepts UTF-8)
Try it on repl.it (demonstrates the actual 372 byte source code file)

-6 bytes by using L1 instead of cp437
-3 bytes by using i=i//26-1 instead of i=i//26^4, resulting in no newlines, allowing single-quoting

Here's a Python program (TIO) to generate the full program with choice of string encoding.


If it absolutely has to be a function and not a program, then it's 382 bytes (not counting the #coding=L1):

def R(a,b):
    s='';i=int.from_bytes('…'.encode('L1'),'big')
    while i:s+=chr(i%26+97);i=i//26-1
    return s

Try it online!

If it has to be a lambda, 391 bytes (not counting the #coding=L1):

lambda a,b,i=int.from_bytes('…'.encode('L1'),'big'):exec("global s;s=''\nwhile i:s+=chr(i%26+97);i=i//26-1")or s - Try it online!

If it has to be a lambda that doesn't modify any globals, 395 bytes (not counting the #coding=L1):

lambda a,b,l={'i':int.from_bytes('…'.encode('L1'),'big')}:exec("s=''\nwhile i:s+=chr(i%26+97);i=i//26-1",l)or l['s'] - Try it online!

Deadcode

Posted 2019-12-25T18:55:23.580

Reputation: 3 022

1

Pyth + Jelly's dictionary, 135 134 131 130 129 (126) bytesSBCS

Kcwd=bmsm@KxGkdTVjC"êêVÞøA?cUqg´§¼F¦e~zEºÊú[!Áw¶uQå´çý½Ê*¤jrkIæwÐó*S.(4áëÌöÅ×¹ñp.ÑQÐgBê "51=>zl@bJ@sxRb_._<z24Np@TJ

(contains C1 control characters, forbidden by HTML5, thus the above can't be copied to clipboard and is for illustration purposes only)
Try it online! - 129 byte version, running at decent speed

Kcwd=bmsm@KxGkdTVjC"êêVÞøA?cUqg´§¼F¦e~zEºÊúvMÇìý÷¼]¾I/||ÊÁê,Ømø×i]Z8©b¤0åé'½bIý>ä
lGùUK½"51=>zl@bJ@sxRb_._zNp@TJ

(contains C1 control characters, forbidden by HTML5, thus the above can't be copied to clipboard and is for illustration purposes only)
Try it online! - 126 byte version, taking about 20 times as long to finish (especially slow at beginning)

Same algorithm and data as my APL answer. Like with it, I learned Pyth just for this.

Explanation of the code

Header

 $exec('

Anything in $...$ is evaled as Python code. Open an exec block here (since statements can't be executed in eval). The leading space suppresses imp_print.

def Pprint(a): print(a, end="", flush=True); return a

On my system, this program takes 46.4 (or 921) seconds to finish, but on TIO, it is on the very edge between not finishing in 60 seconds or just barely finishing. If it didn't manage to finish, no output would be shown. If it was manually interrupted before finishing, the incomplete output would be shown – so it appears TIO sends a SIGINT (Ctrl+C) to the process for this, but something more powerful (perhaps even SIGKILL) when reaching the 60 second limit.

By default, Python only flushes its standard output on newlines. The -u command line argument forces stdout (and stderr) to be unbuffered, but TIO does not provide a way to send arguments to the host Python process that spawns Pyth.

This header line redefines Pyth's Pprint() macro to flush its output, so that if the program does not finish within 60 seconds, the incomplete output will still be shown. It's the next best thing to -u.

Funnily enough, forcing each individual print to flush seems to actually make it run faster on TIO; it was able to finish in under 58 seconds with this, whereas in my tests on TIO the 134 byte version never finished without this.

_imp_print = imp_print;
def imp_print(a): globals()["imp_print"] = globals()["_imp_print "]; return a

Work around the newline inserted by TIO between the Header and Code sections, which would trigger an imp_print() on the first expression otherwise. Suppress only the very first imp_print that happens next, and then allow all subsequent ones.

')$

End the exec block.

=TrL0$exec('from jelly import dictionary') or dictionary.short + dictionary.long$

Import Jelly's dictionary and assign the concatenation of its short and long dictionaries, changed to all-lowercase, to the Pyth variable T. Equivalent to:

from jelly import dictionary
T = list(map(lambda arg:arg.lower(), dictionary.short + dictionary.long))

Code

Kcwd

The Morse code version of the song has already been read from input implicitly, as z = input().

Read the Morse code look-up table from input. Equivalent to K = input().split(' ').

=bmsm@KxGkdT

Translate Jelly's dictionary into Morse code, with the inter-letter gaps deleted. Equivalent to:
b = list(map(lambda d:''.join(map(lambda k:K[string.ascii_lowercase.index(k)], d)), T))

(The outer lambda uses d as its argument, whereas the inner lambda automatically advances to using k as its argument.)

jC"êêVÞøA?cUqg´§¼F¦e~zEºÊú[!Áw¶uQå´çý½Ê*¤jrkIæwÐó*S.(4áëÌöÅ×¹ñp.ÑQÐgBê "51
jC"êêVÞøA?cUqg´§¼F¦e~zEºÊúvMÇìý÷¼]¾I/||ÊÁê,Ømø×i]Z8©b¤0åé'½bIý>ä␊lGùUK½"51
(contains C1 control characters, forbidden by HTML5, thus the above can't be copied to clipboard and is for illustration purposes only)
Unpack the word choice array (115 elements), packed in base 51 in an 82 byte base 256 encoded string. Luckily there happened to be no characters that needed escaping.

C"..." - convert the base 256 encoded string into an arbitrary-precision integer
j...51 - convert the base 51 integer into a list of integers


V...=>zl@bJ@sxRb_._<z24Np@TJ
V...=>zl@bJ@sxRb_._zNp@TJ

Main loop.

V... - for N in...: - for each element N of the list decoded above, execute all of the following statements:

_._<z24 - reverse(list(map(lambda arg1:z[:arg1], range(1, 24+1)))) - make a list of Morse subsequences starting at the beginning of input (z), ranging from 24 to 0 dots/dashes long (from longest to shortest)
_._z - reverse(list(map(lambda arg1:z[:arg1], range(1, len(z))))) - make a list of Morse subsequences starting at the beginning of input (z), ranging from maximum length to 0 dots/dashes long (from longest to shortest)

J@sxRb...N - J = sum(map(lambda arg2:index(arg2,b),...), [])[N], where index(arg2,b) finds all occurrences of arg2 in b - enumerate all words that match the Morse code at the current position in the input, up to 24 dots+dashes, and store this list of dictionary indices (which apply to both b and T) in J

=>zl@bJ - z = z[len(b[J]):] - remove the processed Morse code from the beginning of the input

p@TJ - print(T[J]) - look up this word in the dictionary and print it

Deadcode

Posted 2019-12-25T18:55:23.580

Reputation: 3 022

1

Pyth, 142 bytesSBCS

Like many of the other answers, and my Python answer, encodes the lengths of the 474 Morse codewords into 119 bytes using base 4. Shares many similarities with my Pyth + Jelly's dictionary answer.

My primary reason for posting this answer is to show that in Pyth, apparently the 474-codeword-lengths method does not beat the 115-Jelly-dictionary-word-indices method!

Takes the Morse representation and library as two lines from stdin and outputs to stdout:

Kcwdsm@GxK<~>zddhMjC"¦¦Éir[%·AnÙjR[Ó+¨ß1£Z¡`­ú¯Ì¢Ù˵Ú~¦NªãÒê~Ì¡`¨k[UHÍ2ê«þU^
R1«ùu
z¯äú+):¦ÆÉl1èÈX.ËV3º
;
¯Ì¢ëê¥v+"4

(contains C1 control characters, forbidden by HTML5, thus the above can't be copied to clipboard and is for illustration purposes only)
Try it online!

Explanation of the code

Kcwd

The Morse code version of the song has already been read from input implicitly, as z = input().

Read the Morse code look-up table from input. Equivalent to K = input().split(' ').

hMjC"¦¦Éir[%·AnÙjR[Ó+¨ß1£Z¡`­ú¯Ì¢Ù˵Ú~¦NªãÒê~Ì¡`¨k[UHÍ2ê«þU^
R1«ùu
z¯äú+):¦ÆÉl1èÈX.ËV3º
;
¯Ì¢ëê¥v+"4

Unpack the array that stores the Morse code width (in dots+dashes) of each letter of the song (474 elements), packed in base 4 (with 1 subtracted from each element) in an 119 byte base 256 encoded string. Luckily there happened to be no characters that needed escaping.

C"..." - convert the base 256 encoded string into an arbitrary-precision integer
j...4 - convert the base 4 integer into a list of integers
hM - add 1 to all elements

sm@GxK<~>zdd

m - map each element d of the list decoded above through the following lambda:

~>zd - z = z[d:] - remove the first d dots+dashes from the input (z), and return the previous value of z (let's call it ʐ)

@GxK<ʐd - string.ascii_lowercase[K.index(ʐ[:d])] - alphabetical letter decoded from the first d dots+dashes from the input

s - join the resulting list of characters into a string

Deadcode

Posted 2019-12-25T18:55:23.580

Reputation: 3 022