Basic Latin character name to character

23

1

Let's get back to basics!

  • Your code, a complete program or function, must convert the official Unicode name of a printable Basic Latin character into the corresponding character. For example, for the input LOW LINE your code must output _.
  • You only need to take a single character name as input.
  • You cannot use any preexisting function or library, built-in or otherwise, which offers any logic relating specifically to Unicode character names (e.g. Python's unicodedata, Java's Character.getName, and so on.)
  • For input other than one of these names, any behavior is acceptable.

This is code golf: shortest code in bytes wins.

To avoid any ambiguity, this is the full set of official character names we'll be using (borrowed from this question):

     SPACE
!    EXCLAMATION MARK
"    QUOTATION MARK
#    NUMBER SIGN
$    DOLLAR SIGN
%    PERCENT SIGN
&    AMPERSAND
'    APOSTROPHE
(    LEFT PARENTHESIS
)    RIGHT PARENTHESIS
*    ASTERISK
+    PLUS SIGN
,    COMMA
-    HYPHEN-MINUS
.    FULL STOP
/    SOLIDUS
0    DIGIT ZERO
1    DIGIT ONE
2    DIGIT TWO
3    DIGIT THREE
4    DIGIT FOUR
5    DIGIT FIVE
6    DIGIT SIX
7    DIGIT SEVEN
8    DIGIT EIGHT
9    DIGIT NINE
:    COLON
;    SEMICOLON
<    LESS-THAN SIGN
=    EQUALS SIGN
>    GREATER-THAN SIGN
?    QUESTION MARK
@    COMMERCIAL AT
A    LATIN CAPITAL LETTER A
B    LATIN CAPITAL LETTER B
C    LATIN CAPITAL LETTER C
D    LATIN CAPITAL LETTER D
E    LATIN CAPITAL LETTER E
F    LATIN CAPITAL LETTER F
G    LATIN CAPITAL LETTER G
H    LATIN CAPITAL LETTER H
I    LATIN CAPITAL LETTER I
J    LATIN CAPITAL LETTER J
K    LATIN CAPITAL LETTER K
L    LATIN CAPITAL LETTER L
M    LATIN CAPITAL LETTER M
N    LATIN CAPITAL LETTER N
O    LATIN CAPITAL LETTER O
P    LATIN CAPITAL LETTER P
Q    LATIN CAPITAL LETTER Q
R    LATIN CAPITAL LETTER R
S    LATIN CAPITAL LETTER S
T    LATIN CAPITAL LETTER T
U    LATIN CAPITAL LETTER U
V    LATIN CAPITAL LETTER V
W    LATIN CAPITAL LETTER W
X    LATIN CAPITAL LETTER X
Y    LATIN CAPITAL LETTER Y
Z    LATIN CAPITAL LETTER Z
[    LEFT SQUARE BRACKET
\    REVERSE SOLIDUS
]    RIGHT SQUARE BRACKET
^    CIRCUMFLEX ACCENT
_    LOW LINE
`    GRAVE ACCENT
a    LATIN SMALL LETTER A
b    LATIN SMALL LETTER B
c    LATIN SMALL LETTER C
d    LATIN SMALL LETTER D
e    LATIN SMALL LETTER E
f    LATIN SMALL LETTER F
g    LATIN SMALL LETTER G
h    LATIN SMALL LETTER H
i    LATIN SMALL LETTER I
j    LATIN SMALL LETTER J
k    LATIN SMALL LETTER K
l    LATIN SMALL LETTER L
m    LATIN SMALL LETTER M
n    LATIN SMALL LETTER N
o    LATIN SMALL LETTER O
p    LATIN SMALL LETTER P
q    LATIN SMALL LETTER Q
r    LATIN SMALL LETTER R
s    LATIN SMALL LETTER S
t    LATIN SMALL LETTER T
u    LATIN SMALL LETTER U
v    LATIN SMALL LETTER V
w    LATIN SMALL LETTER W
x    LATIN SMALL LETTER X
y    LATIN SMALL LETTER Y
z    LATIN SMALL LETTER Z
{    LEFT CURLY BRACKET
|    VERTICAL LINE
}    RIGHT CURLY BRACKET
~    TILDE

Luke

Posted 2015-12-08T16:07:49.733

Reputation: 5 091

2Does the program only need to handle one character name? For example, should COLON COLON output :: , or undefined behaviour? – Kevin W. – 2015-12-08T16:14:39.000

Edited to clarify. – Luke – 2015-12-08T16:22:54.267

Why is String.fromCharCode forbidden? – SuperJedi224 – 2015-12-08T16:25:34.680

Whoops, I misunderstood what that function does. – Luke – 2015-12-08T16:28:26.653

How must we handle invalid input, like CLON ? – edc65 – 2015-12-08T18:40:52.780

See the fourth bullet point. – Luke – 2015-12-08T18:41:39.933

Invalid Perl 5 (breaks rule #3): perl -lpe '$_=eval"\"\\N{$_}\""' = 20 + 2 = 22 – hobbs – 2016-02-05T06:46:07.513

Answers

25

IA-32 machine code, 161 160 122 bytes

Hexdump of the code:

33 c0 6b c0 59 0f b6 11 03 c2 b2 71 f6 f2 c1 e8
08 41 80 79 01 00 75 ea e8 39 00 00 00 08 2c 5e
4a bd a3 cd c5 90 09 46 04 06 14 40 3e 3d 5b 23
60 5e 3f 2d 31 32 29 25 2e 3c 7e 36 39 34 33 30
21 2f 26 7d 7c 2c 3b 7b 2a 37 5d 22 35 20 3a 28
5c 27 2b 38 5f 24 5a 3c 34 74 17 3c 1a 74 16 33
c9 86 c4 0f a3 0a 14 00 41 fe cc 75 f6 8a 44 02
0e c3 8a 01 c3 8a 01 04 20 c3

This code uses some hashing. By some brute-force search, I found that the following hash function can be applied to the bytes of the input string:

int x = 0;
while (s[1])
{
    x = (x * 89 + *s) % 113;
    ++s;
}

It multiplies x by 89, adds the next byte (ASCII-code), and takes a remainder modulo 113. It does this on all bytes of the input string except the last one, so e.g. LATIN CAPITAL LETTER A and LATIN CAPITAL LETTER X give the same hash code.
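As a rough Python sketch of the hash described above (the name `name_hash` is made up for illustration and is not part of the answer), the loop folds every byte except the last:

```python
def name_hash(name: str) -> int:
    """Fold all bytes except the last one: x = (x * 89 + byte) % 113."""
    x = 0
    for byte in name.encode("ascii")[:-1]:
        x = (x * 89 + byte) % 113
    return x


# The last byte is ignored, so every capital-letter name hashes alike.
same = name_hash("LATIN CAPITAL LETTER A") == name_hash("LATIN CAPITAL LETTER X")
print(same)
```

Because the final byte never enters the hash, it stays available to be returned directly as the letter.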

This hash function has no collisions, and the output is in the range 0...112 (actually, by luck, the range is even narrower: 3...108).

The hash values of all relevant strings don't fill that space completely, so I decided to use this to compress the hash table. I added a "skip" table (112 bits), which contains 0 if the corresponding place in the hash table is empty, and 1 otherwise. This table converts a hash value into a "compressed" index, which can be used to address a dense LUT.
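The skip-table idea is a rank query over a bitmap. Here is a minimal Python sketch with a toy table (the slots 3, 7, 8 and the string `"abc"` are invented for illustration, not taken from the answer):

```python
def dense_index(bitmap: int, slot: int) -> int:
    """Count the occupied slots strictly below `slot` (a rank query)."""
    return bin(bitmap & ((1 << slot) - 1)).count("1")


# Hypothetical sparse table: only slots 3, 7 and 8 are occupied.
occupied = [3, 7, 8]
bitmap = sum(1 << slot for slot in occupied)
dense = "abc"  # one entry per occupied slot, in slot order

print([dense[dense_index(bitmap, slot)] for slot in occupied])
```

The dense table needs only one entry per occupied slot, at the cost of one bit per possible hash value.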

The strings LATIN CAPITAL LETTER and LATIN SMALL LETTER give hash codes 52 and 26; they are handled separately. Here is the C code for that:

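#include <stdint.h>

char find(const char* s)
{
    /* Hash every byte except the last one. */
    int hash = 0;
    while (s[1])
    {
        hash = (hash * 89 + *s) % 113;
        ++s;
    }

    /* s now points at the last byte, which is the letter itself. */
    if (hash == 52)
        return *s;
    if (hash == 26)
        return *s + 32;

    /* Count the occupied slots below `hash` to get the dense LUT index. */
    int result_idx = 0;
    int bit = 0;
    uint32_t skip[] = {0x4a5e2c08, 0xc5cda3bd, 0x04460990, 0x1406};
    do {
        if (skip[bit / 32] & (1u << bit % 32))
            ++result_idx;
        ++bit;
    } while (--hash);

    return "@>=[#`^?-12)%.<~69430!/&}|,;{*7]\"5 :(\\'+8_$"[result_idx];
}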

The corresponding assembly language code (MS Visual Studio inline-assembly syntax):

_declspec(naked) char _fastcall find(char* s)
{
    _asm {
        xor eax, eax;
    mycalc:
        imul eax, eax, 89;
        movzx edx, [ecx];
        add eax, edx;
        mov dl, 113;
        div dl;
        shr eax, 8;
        inc ecx;
        cmp byte ptr [ecx + 1], 0;
        jne mycalc;

        call mycont;
        // skip table
        _asm _emit 0x08 _asm _emit 0x2c _asm _emit 0x5e _asm _emit 0x4a;
        _asm _emit 0xbd _asm _emit 0xa3 _asm _emit 0xcd _asm _emit 0xc5;
        _asm _emit 0x90 _asm _emit 0x09 _asm _emit 0x46 _asm _emit 0x04;
        _asm _emit 0x06 _asm _emit 0x14;
        // char table
        _asm _emit '@' _asm _emit '>' _asm _emit '=' _asm _emit '[';
        _asm _emit '#' _asm _emit '`' _asm _emit '^' _asm _emit '?';
        _asm _emit '-' _asm _emit '1' _asm _emit '2' _asm _emit ')';
        _asm _emit '%' _asm _emit '.' _asm _emit '<' _asm _emit '~';
        _asm _emit '6' _asm _emit '9' _asm _emit '4' _asm _emit '3';
        _asm _emit '0' _asm _emit '!' _asm _emit '/' _asm _emit '&';
        _asm _emit '}' _asm _emit '|' _asm _emit ',' _asm _emit ';';
        _asm _emit '{' _asm _emit '*' _asm _emit '7' _asm _emit ']';
        _asm _emit '"' _asm _emit '5' _asm _emit ' ' _asm _emit ':';
        _asm _emit '(' _asm _emit '\\' _asm _emit '\'' _asm _emit '+';
        _asm _emit '8' _asm _emit '_' _asm _emit '$';

    mycont:
        pop edx;
        cmp al, 52;
        je capital_letter;
        cmp al, 26;
        je small_letter;

        xor ecx, ecx;
        xchg al, ah;
    decode_hash_table:
        bt [edx], ecx;
        adc al, 0;
        inc ecx;
        dec ah;
        jnz decode_hash_table;

        mov al, [edx + eax + 14];
        ret;

    capital_letter:
        mov al, [ecx];
        ret;

    small_letter:
        mov al, [ecx];
        add al, 32;
        ret;
    }
}

Some noteworthy implementation details:

  • It uses a CALL instruction to get a pointer to the code, where the hard-coded table resides. In 64-bit mode, it could use the register rip instead.
  • It uses the BT instruction to access the skip table
  • It manages to do the work using only 3 registers eax, ecx, edx, which can be clobbered - so there is no need to save and restore registers
  • When decoding the hash table, it uses al and ah carefully, so that at the right place ah is decreased to 0, and the whole eax register can be used as a LUT index

anatolyg

Posted 2015-12-08T16:07:49.733

Reputation: 10 719

18

JavaScript ES6, 228 236 247 257 267 274 287

Note: 7 chars saved thx @ev3commander

Note 2: better than Japt after 7 major edits.

n=>n<'L'?"XC!DO$MP&OS'SK*N--FU.ZE0TW2HR3OU4FI5IX6EI8NI9EM;LS=R->IA@MF^AV`MM,NE1EN7LO:".replace(/(..)./g,(c,s)=>~n.search(s)?n=c[2]:0)&&n:'~  / ;  |?"\\ ) }]_+ #% < ( {['[(n<'Q')*13+n.length-(n>'T')-4]||n[21]||n[19].toLowerCase()

Run the snippet to test

F=n=>
  n<'L'?"XC!DO$MP&OS'SK*N--FU.ZE0TW2HR3OU4FI5IX6EI8NI9EM;LS=R->IA@MF^AV`MM,NE1EN7LO:"
  .replace(/(..)./g,(c,s)=>~n.search(s)?n=c[2]:0)&&n:
  '~  / ;  |?"\\ ) }]_+ #% < ( {['[(n<'Q')*13+n.length-(n>'T')-4]
  ||n[21]||n[19].toLowerCase()

//TEST
console.log=x=>O.innerHTML+=x+'\n'
;[
['&','AMPERSAND'],
['\'','APOSTROPHE'],
['*','ASTERISK'],
['^','CIRCUMFLEX ACCENT'],
[':','COLON'],
[',','COMMA'],
['@','COMMERCIAL AT'],
['8','DIGIT EIGHT'],
['5','DIGIT FIVE'],
['4','DIGIT FOUR'],
['9','DIGIT NINE'],
['1','DIGIT ONE'],
['7','DIGIT SEVEN'],
['6','DIGIT SIX'],
['3','DIGIT THREE'],
['2','DIGIT TWO'],
['0','DIGIT ZERO'],
['$','DOLLAR SIGN'],
['=','EQUALS SIGN'],
['!','EXCLAMATION MARK'],
['.','FULL STOP'],
['`','GRAVE ACCENT'],
['>','GREATER-THAN SIGN'],
['-','HYPHEN-MINUS'],
['A','LATIN CAPITAL LETTER A'],
['B','LATIN CAPITAL LETTER B'],
['C','LATIN CAPITAL LETTER C'],
['D','LATIN CAPITAL LETTER D'],
['E','LATIN CAPITAL LETTER E'],
['F','LATIN CAPITAL LETTER F'],
['G','LATIN CAPITAL LETTER G'],
['H','LATIN CAPITAL LETTER H'],
['I','LATIN CAPITAL LETTER I'],
['J','LATIN CAPITAL LETTER J'],
['K','LATIN CAPITAL LETTER K'],
['L','LATIN CAPITAL LETTER L'],
['M','LATIN CAPITAL LETTER M'],
['N','LATIN CAPITAL LETTER N'],
['O','LATIN CAPITAL LETTER O'],
['P','LATIN CAPITAL LETTER P'],
['Q','LATIN CAPITAL LETTER Q'],
['R','LATIN CAPITAL LETTER R'],
['S','LATIN CAPITAL LETTER S'],
['T','LATIN CAPITAL LETTER T'],
['U','LATIN CAPITAL LETTER U'],
['V','LATIN CAPITAL LETTER V'],
['W','LATIN CAPITAL LETTER W'],
['X','LATIN CAPITAL LETTER X'],
['Y','LATIN CAPITAL LETTER Y'],
['Z','LATIN CAPITAL LETTER Z'],
['a','LATIN SMALL LETTER A'],
['b','LATIN SMALL LETTER B'],
['c','LATIN SMALL LETTER C'],
['d','LATIN SMALL LETTER D'],
['e','LATIN SMALL LETTER E'],
['f','LATIN SMALL LETTER F'],
['g','LATIN SMALL LETTER G'],
['h','LATIN SMALL LETTER H'],
['i','LATIN SMALL LETTER I'],
['j','LATIN SMALL LETTER J'],
['k','LATIN SMALL LETTER K'],
['l','LATIN SMALL LETTER L'],
['m','LATIN SMALL LETTER M'],
['n','LATIN SMALL LETTER N'],
['o','LATIN SMALL LETTER O'],
['p','LATIN SMALL LETTER P'],
['q','LATIN SMALL LETTER Q'],
['r','LATIN SMALL LETTER R'],
['s','LATIN SMALL LETTER S'],
['t','LATIN SMALL LETTER T'],
['u','LATIN SMALL LETTER U'],
['v','LATIN SMALL LETTER V'],
['w','LATIN SMALL LETTER W'],
['x','LATIN SMALL LETTER X'],
['y','LATIN SMALL LETTER Y'],
['z','LATIN SMALL LETTER Z'],
['{','LEFT CURLY BRACKET'],
['(','LEFT PARENTHESIS'],
['[','LEFT SQUARE BRACKET'],
['<','LESS-THAN SIGN'],
['_','LOW LINE'],
['#','NUMBER SIGN'],
['%','PERCENT SIGN'],
['+','PLUS SIGN'],
['?','QUESTION MARK'],
['"','QUOTATION MARK'],
['\\','REVERSE SOLIDUS'],
['}','RIGHT CURLY BRACKET'],
[')','RIGHT PARENTHESIS'],
[']','RIGHT SQUARE BRACKET'],
[';','SEMICOLON'],
['/','SOLIDUS'],
[' ','SPACE'],
['~','TILDE'],
['|','VERTICAL LINE'],
].forEach(t=>{
  var r=F(t[1]),ok=r==t[0]
  //if (!ok) // uncomment to see just errors
  console.log(r+' ('+t[0]+') '+t[1]+(ok?' OK':' ERROR'))
})
console.log('DONE')
<pre id=O></pre>

edc65

Posted 2015-12-08T16:07:49.733

Reputation: 31 086

5Just... how? Well done. – SuperJedi224 – 2015-12-08T19:57:25.377

Actually, besides the alphabet, there is no char starting with "LA" – ev3commander – 2015-12-08T20:36:40.323

@ev3commander yes, but here I manage LAT, RIG and LEF and 2 chars seems too few, having LEFT and LESS – edc65 – 2015-12-08T20:41:26.080

Ohh. I just skimmed and didn't see the RIG/LEF part. – ev3commander – 2015-12-08T20:45:02.810

@ev3commander on second thought you have a point! I can merge handling LESS and LEFT and save 4 bytes. Thx – edc65 – 2015-12-08T20:51:10.270

10

Japt, 230 bytes

V=U¯2;Ug21 ªU<'R©Ug19 v ªV¥"DI"©`ze¿twâ¿¿¿¿e¿i`u bUs6,8)/2ªUf"GN" ©"<>+=$#%"g`¤grp¤qºnupe`u bV /2 ªUf"T " ©"[]\{}()"g"QSUCAP"bUg6) ªUf" M" ©"!\"?"g"COE"bUg2) ªV¥"CO"©",:@"g"ANE"bUg4) ª" &'*-./\\;~^`_|"g`spaµp¿豢¿Èögrlove`u bV /2

Each ¿ represents an unprintable Unicode char. Try it online!

Ungolfed:

V=Us0,2;Ug21 ||U<'R&&Ug19 v ||V=="DI"&&"zeontwthfofisiseeini"u bUs6,8)/2||Uf"GN" &&"<>+=$#%"g"legrpleqdonupe"u bV /2 ||Uf"T " &&"[]\{}()"g"QSUCAP"bUg6) ||Uf" M" &&"!\"?"g"COE"bUg2) ||V=="CO"&&",:@"g"ANE"bUg4) ||" &'*-./\\;~^`_|"g"spamapashyfusoreseticigrlove"u bV /2

This was really fun. I've split up the character names into several large chunks:

0. Take first two letters

V=Us0,2; sets variable V to the first two letters of U, the input string. This will come in handy later.

1. Capital letters

This is the easiest: the capital letters are the only ones that have a character at position 21, which all happen to be the correct letter and case. Thus, Ug21 is sufficient.

2. Lowercase letters

Another fairly easy one; the only other name that has a character at position 19 is RIGHT SQUARE BRACKET, so we check if the name comes before R with U<'R, then if it does (&&), we take the 19th char with Ug19 and cast it to lowercase with v.
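The two position checks can be sketched in Python roughly as follows (a port for illustration only; `letter_from_position` is a made-up name):

```python
def letter_from_position(name: str) -> str:
    """Return the letter for LATIN CAPITAL/SMALL LETTER names, else ''."""
    if len(name) > 21:
        # Only the capital-letter names are long enough to have index 21.
        return name[21]
    if len(name) > 19 and name < "R":
        # name < "R" excludes RIGHT SQUARE BRACKET, the other 20-char name.
        return name[19].lower()
    return ""


print(letter_from_position("LATIN SMALL LETTER Q"))
```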

3. Digits

These names all start with DI (and fortunately, none of the others do), so if V=="DI", we can turn it into a digit. The first letters of some of the digits' names are the same, but the first two letters are sufficient. Combining these into one string, we get ZEONTWTHFOFISISEEINI. Now we can just take the index b of the first two chars in the digit's name with Us6,8) and divide by two.
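In Python, the same pair-index trick might look like this (a sketch, with `DIGIT_KEYS` and `digit_from_name` as illustrative names):

```python
DIGIT_KEYS = "ZEONTWTHFOFISISEEINI"  # first two letters of ZERO..NINE


def digit_from_name(name: str) -> str:
    """Map 'DIGIT <WORD>' to its digit, or return '' for other names."""
    if not name.startswith("DI"):
        return ""
    # The word starts at index 6; each key is two chars wide, hence // 2.
    return str(DIGIT_KEYS.index(name[6:8]) // 2)


print(digit_from_name("DIGIT SEVEN"))
```

This works because every pair first occurs at an even offset in the key string, so the integer division lands on the right digit.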

4. SIGN

There are seven names that contain SIGN:

<    LESS-THAN SIGN
>    GREATER-THAN SIGN
+    PLUS SIGN
=    EQUALS SIGN
$    DOLLAR SIGN
#    NUMBER SIGN
%    PERCENT SIGN

First we check that the name contains the word SIGN. It turns out GN is sufficient; Uf"GN" returns all instances of GN in the name, which is null if it contains 0 instances, and thus gets skipped.

Now, using the same technique as with the digits, we combine the first two letters into a string LEGRPLEQDONUPE, then take the index and divide by two. This results in a number from 0-6, which we can use to take the corresponding character from the string <>+=$#%.
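A Python sketch of this step, with the substring test acting as a gate before the lookup (names are illustrative):

```python
SIGN_KEYS = "LEGRPLEQDONUPE"  # first two letters of each SIGN name
SIGN_CHARS = "<>+=$#%"


def sign_from_name(name: str) -> str:
    """Return the symbol for a '... SIGN' name, or '' for other names."""
    if "GN" not in name:
        return ""
    return SIGN_CHARS[SIGN_KEYS.index(name[:2]) // 2]


print(sign_from_name("GREATER-THAN SIGN"))
```

The gate matters: GRAVE ACCENT also starts with GR, but it contains no GN and so never reaches the lookup.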

5. MARK

There are three characters that contain MARK:

!    EXCLAMATION MARK
"    QUOTATION MARK
?    QUESTION MARK

Here we use the same technique as with SIGN: the string " M" (a space followed by M) is enough to differentiate these three from the others. To translate to a symbol, this time checking one letter is enough: the character at position 2 is different for all three characters. This means we don't have to divide by two when choosing the correct character.

6. LEFT/RIGHT

This group contains the brackets and parentheses, []{}(). It would be really complicated to capture both LEFT and RIGHT, but fortunately, they all contain the string "T " (a T followed by a space). We check this with the same technique as we did with SIGN. To translate to a symbol, as with MARK, checking one letter is enough; the character at position 6 is unique for all six.
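The single-letter lookup for brackets can be sketched in Python like this, assuming the `"T "` substring test from the ungolfed code (the function name is made up):

```python
def bracket_from_name(name: str) -> str:
    """Return the bracket for a LEFT/RIGHT name, or '' for other names."""
    # "T " also occurs in DIGIT names and PERCENT SIGN; the original handles
    # those earlier in the chain, so this sketch assumes they never get here.
    if "T " not in name:
        return ""
    # The character at index 6 is Q, S, U, C, A or P, one per bracket.
    return "[]{}()"["QSUCAP".index(name[6])]


print(bracket_from_name("LEFT CURLY BRACKET"))
```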

7. CO

The rest of the chars are pretty unique, but not unique enough. Three of them start with CO: COMMA, COLON, and COMMERCIAL AT. We use exactly the same technique as we did with the brackets, choosing the proper symbol based on the character at position 4 (A, N, or E).

8. Everything else

By now, the first two characters are different for every name. We combine them all into one big string SPAMAPASHYFUSORESETICIGRLOVE and map each pair to its corresponding char in  &'*-./\;~^`_|.

9. Final steps

Each of the parts returns an empty string or null if it's not the correct one, so we can link them all from left to right with ||. The || operator returns the left argument if it's truthy, and the right argument otherwise. Japt also has implicit output, so whatever the result, it is automatically sent to the output box.
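The chaining can be sketched in Python, where `or` likewise returns its first truthy operand. This sketch covers only the MARK, CO and "everything else" groups, and all function names are invented for illustration:

```python
REST_KEYS = "SPAMAPASHYFUSORESETICIGRLOVE"
REST_CHARS = " &'*-./\\;~^`_|"


def mark(name: str) -> str:
    """EXCLAMATION/QUOTATION/QUESTION MARK, keyed on the char at index 2."""
    return "!\"?"["COE".index(name[2])] if " M" in name else ""


def co(name: str) -> str:
    """COMMA, COLON, COMMERCIAL AT, keyed on the char at index 4."""
    return ",:@"["ANE".index(name[4])] if name[:2] == "CO" else ""


def rest(name: str) -> str:
    """Remaining names, keyed on their first two letters."""
    return REST_CHARS[REST_KEYS.index(name[:2]) // 2]


def decode(name: str) -> str:
    # Each rule yields "" on a miss, so `or` falls through to the next one.
    return mark(name) or co(name) or rest(name)


print(decode("COMMERCIAL AT"))
```

Note that a result of " " (for SPACE) is a non-empty string and therefore still truthy, so it does not fall through.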

Questions, comments, and suggestions welcome!

ETHproductions

Posted 2015-12-08T16:07:49.733

Reputation: 47 880

Great answer and great explanation. But you forgot to mention the handling of MARK (!?") in the explanation – edc65 – 2015-12-08T21:08:29.040

@edc65 Whoops, thanks! I've added in a section on MARK chars. – ETHproductions – 2015-12-08T21:20:39.630

7spamapashyfusoreseticigrlove = Spam a pashy for so reset icy girl love ... +1 – AdmBorkBork – 2015-12-08T21:27:17.890

No, that's very much still golfed. – Blacklight Shining – 2015-12-08T22:44:40.037

3

Javascript, 501 499 469 465 451 430 bytes

a=prompt();c="5SACEgEARKeQARKbNIGNbDIGNcPIGN9AANDaAPHEgLSIShRSIS8AISK9PIGN5CMMAcHNUS9FTOP7SDUSaDERO9DONE9DTWObDREEaDOURaDIVE9DSIXbDVENbDGHTaDINE5CLON9SLONeLIGNbEIGNhGIGNdQARKdC ATjLKETfRDUSkRKEThCENT8LINEcGENTiLKETdVINEjRKET5TLDE".match(/.{5}/g).indexOf(a.length.toString(36)+a[0]+a.slice(-3));if(c>=33)c+=26;if(c>=65)c+=26;alert(a.length==20&&a[0]=="L"?a.slice(-1).toLowerCase():a.length>21?a.slice(-1):String.fromCharCode(32+c))

Explanation:

That long string is a compressed list. a.length.toString(36)+a[0]+a.slice(-3) determines how, if at all, the string will be represented in the list. Also, special logic for letters. (with strings, a[0] is a builtin shorthand for a.charAt(0), by the way)
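For illustration, here is a rough Python port of the key scheme (the table is copied from the code above; `make_key` and `decode` are made-up names, not part of the answer):

```python
TABLE = ("5SACEgEARKeQARKbNIGNbDIGNcPIGN9AANDaAPHEgLSIShRSIS8AISK9PIGN5CMMA"
         "cHNUS9FTOP7SDUSaDERO9DONE9DTWObDREEaDOURaDIVE9DSIXbDVENbDGHTaDINE"
         "5CLON9SLONeLIGNbEIGNhGIGNdQARKdC ATjLKETfRDUSkRKEThCENT8LINEcGENT"
         "iLKETdVINEjRKET5TLDE")
ENTRIES = [TABLE[i:i + 5] for i in range(0, len(TABLE), 5)]
DIGITS36 = "0123456789abcdefghijklmnopqrstuvwxyz"


def make_key(name: str) -> str:
    """Length as one base-36 digit, then the first char and the last three."""
    return DIGITS36[len(name)] + name[0] + name[-3:]


def decode(name: str) -> str:
    if len(name) == 20 and name[0] == "L":  # LATIN SMALL LETTER x
        return name[-1].lower()
    if len(name) > 21:                      # LATIN CAPITAL LETTER X
        return name[-1]
    c = ENTRIES.index(make_key(name))
    if c >= 33:  # skip over the 26 capital letters
        c += 26
    if c >= 65:  # skip over the 26 small letters
        c += 26
    return chr(32 + c)


print(make_key("EXCLAMATION MARK"), decode("EXCLAMATION MARK"))
```

The list holds only the 43 non-letter names in code-point order, which is why the index has to jump over the two letter ranges.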

SuperJedi224

Posted 2015-12-08T16:07:49.733

Reputation: 11 342

If you replaced _ with +, you could Base64-compress the list. – ETHproductions – 2015-12-08T20:56:55.603

@ETHproductions base64 makes things longer, not shorter. – Blacklight Shining – 2015-12-08T22:45:52.323

@ETHproductions Does Javascript have Base64? – SuperJedi224 – 2015-12-08T23:07:28.190

@SuperJedi224 Yes it does, but Blacklight is correct unless the base 64 is replacing a number that could have been expressed in a lower base, especially binary. – wedstrom – 2015-12-08T23:09:56.477

You can use btoa("abc") to compress text by 25% (as long as it is valid base-64 text, which it would be after replacing _ with -), then atob("compressed stuff") in your actual code. – ETHproductions – 2015-12-08T23:11:38.210

If you have a string containing only base64 symbols, you can use base64 decoding to obtain a string of chars in range 0..255 that will be shorter. It's difficult to manage here at SO because some chars (code 0..31 and 128...159) are not handled by SO editor – edc65 – 2015-12-08T23:12:12.973

3

Python 2, 237 bytes

Take the hash of the input string modulo 535, then convert that number to a Unicode character. The position of that character in a precompiled string of Unicode characters, plus 32, is the code point of the ASCII character to output.

print chr(u"""ǶŀȎdȊÏöǖIhȏƜǓDZǠƣƚdžƩC+ĶÅĠěóƋŎªƱijůŰűŪūŬŭŶŷŸŹŲųŴŵžſƀƁźŻżŽƆƇƈŖÐŗǀǼǿǾǹǸǻǺȅȄȇȆȁȀȃȂǭǬǯǮǩǨǫǪǵǴǷNȌ~B""".index(unichr(hash(raw_input())%535))+32)

Willem

Posted 2015-12-08T16:07:49.733

Reputation: 1 528

1

PowerShell, 603 547 464 bytes

$a=-split$args
$b=switch -W($a[0]){
"LEFT"{switch -w($a[1]){"C*"{"{"}"P*"{"("}"S*"{"["}}}
"RI*"{switch -w($a[1]){"C*"{"}"}"P*"{")"}"S*"{"]"}}}
"LA*"{("$($a[3])".ToLower(),$a[3])[$a[1]-like"C*"]}
"DI*"{@{ONE=1;TWO=2;THREE=3;FOUR=4;FIVE=5;SIX=6;SEVEN=7;EIGHT=8;NINE=9;ZERO="0"}[$a[1]]}
"COMME*"{"@"}
"APO*"{"'"}
}
$c='COM,LES<GRA`GRE>QUE?QUO"COL:REV\LOW_EXC!EQU=DOL$AMP&AST*PER%PLU+SEM;SOL/SPA CIR^HYP-FUL.NUM#TIL~VER|'
($b,$c[$c.IndexOf($a[0][0..2]-join'')+3])[!$b]

(LineFeed counts the same one byte as ;, so I'll leave the breaks in for readability)

Edit 1 - Took many elements out of the switch statement and instead populated a hashtable for lookups.

Edit 2 - Oh yeah ... indexing into a string, that's the way to go ...

Essentially takes the input, splits it on spaces, and does a wildcard switch on the first word to filter out the goofy ones. Sets the result of that to $b. If $b doesn't exist, the string $c gets evaluated on the first three letters of the first word and outputs the character immediately following, otherwise we output $b.

Some tricks include the LATIN CAPITAL LETTER R which indexes into an array based on whether the second word is CAPITAL, and outputs the corresponding uppercase/lowercase letter. The other "trick" is for the DIGITs, by indexing into a hashtable. Note that it's not shorter to do the same index-into-a-string trick here (it's actually longer by one byte).
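The fallback lookup on $c can be sketched in Python as follows (the table is copied from the code above; `fallback_lookup` is an illustrative name, and the switch-based special cases are left out):

```python
TABLE = ("COM,LES<GRA`GRE>QUE?QUO\"COL:REV\\LOW_EXC!EQU=DOL$AMP&AST*"
         "PER%PLU+SEM;SOL/SPA CIR^HYP-FUL.NUM#TIL~VER|")


def fallback_lookup(name: str) -> str:
    """Find the first three letters in TABLE and return the next character."""
    return TABLE[TABLE.index(name[:3]) + 3]


print(fallback_lookup("LOW LINE"))
```

Every three-letter key is followed by a non-letter, so a search for three letters can only match at the start of a key.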

AdmBorkBork

Posted 2015-12-08T16:07:49.733

Reputation: 41 581

I'm beating you again. – SuperJedi224 – 2015-12-09T21:35:13.297

1

Javascript, 416 411 389 bytes

l=(E)=>{return E=E.replace(/LA.*N|BR.*T|SIGN|MARK| |TION/g,"").replace(/(.).*(.{3})/,"$1$2"),E.match("CER")?E[3]:E.match("SER")?E[3].toLowerCase():(a="SACE EAMA!QOTA\"NBER#DLAR$PENT%AAND&APHE'AISK*PLUS+CMMA,HNUS-FTOP.SDUS/CLON:SLON;LHAN<EALS=GHAN>QUES?CLAT@RDUS\\CENT^LINE_GENT`VINE|LSIS(RSIS)LARE[RARE]LRLY{RRLY}TLDE~DERO0DONE1DTWO2DREE3DOUR4DIVE5DSIX6DVEN7DGHT8DINE9",a[a.indexOf(E)+4])}

This is a more readable format (explanation coming later):

function l(k){
    k=k.replace(/LA.*N|BR.*T|SIGN|MARK| |TION/g,'').replace(/(.).*(.{3})/,'$1$2')
    if(k.match('CER')) return k[3];
    if(k.match('SER')) return k[3].toLowerCase();
    a="SACE EAMA!QOTA\"NBER#DLAR$PENT%AAND&APHE'AISK*PLUS+CMMA,HNUS-FTOP.SDUS/CLON:SLON;LHAN<EALS=GHAN>QUES?CLAT@RDUS\\CENT^LINE_GENT`VINE|LSIS(RSIS)LARE[RARE]LRLY{RRLY}TLDE~DERO0DONE1DTWO2DREE3DOUR4DIVE5DSIX6DVEN7DGHT8DINE9"
    return a[a.indexOf(k)+4];
}

Minus 5 bytes from combining key and value strings.

Explanation: The regular expressions on the first line reduce the inputs into unique 4 character keys. Note that uniqueness is only guaranteed for the specific set of names specified in the challenge, and duplicates would be very common for normal English! Even for this challenge, I had to remove common words like bracket and sign to get a unique set.

To return the character, I check whether it is a Latin letter by looking for the strings "CER" and "SER" in the key, and return the last character of the input, lowercased for "SER".

For everything else, I refer to a string that contains all of the 4 character keys, each followed by the correct character. I then use indexOf and character indexing to pull out and return the character.

Edit: Used more wildcards to reduce regex size, replaced substr with character indices and shaved off another twenty chars. Rule sticklers will note that this final update is posted after the challenge has ended; however, I don't think it changed my ranking. This is just practice for a novice.

wedstrom

Posted 2015-12-08T16:07:49.733

Reputation: 111

1

Python 3, 148 bytes

lambda s:chr(83-b'gfhtg\32}urgx_}3qeo|e~cwu~S~q~I,vqG\34jc}d*9~~_L|p~~~~~JJy'[sum(b'  !" *1! "2;D$# ! # !!( '[ord(c)%25]-32for c in s[:-1])]+ord(s[-1]))

For your viewing convenience, I’ve replaced two non-printable bytes with the octal escape codes \32 and \34; undo this to get the 148 byte function.

I computed parts of this hash function with GPerf.

Anders Kaseorg

Posted 2015-12-08T16:07:49.733

Reputation: 29 242

0

Perl 6, 348 242 bytes

{
  /NI/??9!!chr 32+
  '0A40W00SV0M20LR0O20IJ0LH0WH0YS0H20ID0A50P10IH0F70K10HF0I30LL0JX0JF0HX0LU0LE0JF0AJ0IX0RK0M40XF0QR0PD15Z16016116216316416516616716816916A16B16C16D16E16F16G16H16I16J16K16L16M16N16O1140V313F0XS0FU0N712A12B12C12D12E12F12G12H12I12J12K12L12M12N12O12P12Q12R12S12T12U12V12W12X12Y12Z0ZA0PU11L0AA'
  .comb(3).map({:36($_)}).first(:k,[+] .ords)
} # 348

{chr 32+"\x95ǐǠŬšƉĘŗȌȴĎĽ\x96ŖŁöģěĈśŊčĂĹŔĸ¤ĦƱŮȃƿƍʶʷʸʹʺʻʼʽʾʿˀˁ˂˃˄˅ˆˇˈˉˊʠʡʢʣʤɝǚʅǥâĿʇʈʉʊʋʌʍʎʏʐʑʒʓʔʕʖʗʘʙʚʛɱɲɳɴɵțųɃ\x9b".ords.first: :k,[+] .ords.map(*%43)}
{
  chr 32+
  "\x95ǐǠŬšƉĘŗȌȴĎĽ\x96ŖŁöģěĈśŊčĂĹŔĸ¤ĦƱŮȃƿƍʶʷʸʹʺʻʼʽʾʿˀˁ˂˃˄˅ˆˇˈˉˊʠʡʢʣʤɝǚʅǥâĿʇʈʉʊʋʌʍʎʏʐʑʒʓʔʕʖʗʘʙʚʛɱɲɳɴɵțųɃ\x9b"
  .ords.first: :k,[+] .ords.map(*%43)
}

usage:

my &code = {...}

# testing
my $test = [~] (' '..'~')».uniname».&code;
my $comparison = [~] ' '..'~';
say $test eq $comparison; # True

say code 'HYPHEN-MINUS'; # -

Brad Gilbert b2gills

Posted 2015-12-08T16:07:49.733

Reputation: 12 713