Convert English to a number without built-ins or libraries

14

This challenge is similar to this other one; however, I added a restriction (see bold text below) that I think makes it quite different and (I hope) fun as well.

The Challenge

Write a program or a function in any programming language that takes as input the English name of a positive integer n not exceeding 100 and returns n as an integer.

Standard loopholes are forbidden and you cannot use any built-in function, external tool, or library that already does this job.

Shortest source code in bytes wins.

Test

Here are all the input->output cases:

one              -> 1
two              -> 2
three            -> 3
four             -> 4
five             -> 5
six              -> 6
seven            -> 7
eight            -> 8
nine             -> 9
ten              -> 10
eleven           -> 11
twelve           -> 12
thirteen         -> 13
fourteen         -> 14
fifteen          -> 15
sixteen          -> 16
seventeen        -> 17
eighteen         -> 18
nineteen         -> 19
twenty           -> 20
twenty-one       -> 21
twenty-two       -> 22
twenty-three     -> 23
twenty-four      -> 24
twenty-five      -> 25
twenty-six       -> 26
twenty-seven     -> 27
twenty-eight     -> 28
twenty-nine      -> 29
thirty           -> 30
thirty-one       -> 31
thirty-two       -> 32
thirty-three     -> 33
thirty-four      -> 34
thirty-five      -> 35
thirty-six       -> 36
thirty-seven     -> 37
thirty-eight     -> 38
thirty-nine      -> 39
forty            -> 40
forty-one        -> 41
forty-two        -> 42
forty-three      -> 43
forty-four       -> 44
forty-five       -> 45
forty-six        -> 46
forty-seven      -> 47
forty-eight      -> 48
forty-nine       -> 49
fifty            -> 50
fifty-one        -> 51
fifty-two        -> 52
fifty-three      -> 53
fifty-four       -> 54
fifty-five       -> 55
fifty-six        -> 56
fifty-seven      -> 57
fifty-eight      -> 58
fifty-nine       -> 59
sixty            -> 60
sixty-one        -> 61
sixty-two        -> 62
sixty-three      -> 63
sixty-four       -> 64
sixty-five       -> 65
sixty-six        -> 66
sixty-seven      -> 67
sixty-eight      -> 68
sixty-nine       -> 69
seventy          -> 70
seventy-one      -> 71
seventy-two      -> 72
seventy-three    -> 73
seventy-four     -> 74
seventy-five     -> 75
seventy-six      -> 76
seventy-seven    -> 77
seventy-eight    -> 78
seventy-nine     -> 79
eighty           -> 80
eighty-one       -> 81
eighty-two       -> 82
eighty-three     -> 83
eighty-four      -> 84
eighty-five      -> 85
eighty-six       -> 86
eighty-seven     -> 87
eighty-eight     -> 88
eighty-nine      -> 89
ninety           -> 90
ninety-one       -> 91
ninety-two       -> 92
ninety-three     -> 93
ninety-four      -> 94
ninety-five      -> 95
ninety-six       -> 96
ninety-seven     -> 97
ninety-eight     -> 98
ninety-nine      -> 99
one hundred      -> 100

Bob

Posted 2016-02-10T14:32:48.093

Reputation: 957

1 What about a built-in that does half the job, for example finding the Unicode name of a codepoint? – Brad Gilbert b2gills – 2016-02-10T21:02:27.630

@BradGilbertb2gills No, it is not fine. – Bob – 2016-02-10T21:11:57.090

Answers

22

C, 160 bytes

g(char*s){char i=1,r=0,*p="k^[#>Pcx.yI<7CZpVgmH:o]sYK$2";for(;*s^'-'&&*s;r+=*s++|9);r=r%45+77;for(;*p!=r;p++,i++);return((*s^'-')?0:g(s+1))+(i<21?i:10*(i-18));}

Test it

#include <stdio.h>

int main ()
{
    char* w[] = {"", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen", "twenty", "twenty-one", "twenty-two", "twenty-three", "twenty-four", "twenty-five", "twenty-six", "twenty-seven", "twenty-eight", "twenty-nine", "thirty", "thirty-one", "thirty-two", "thirty-three", "thirty-four", "thirty-five", "thirty-six", "thirty-seven", "thirty-eight", "thirty-nine", "forty", "forty-one", "forty-two", "forty-three", "forty-four", "forty-five", "forty-six", "forty-seven", "forty-eight", "forty-nine", "fifty", "fifty-one", "fifty-two", "fifty-three", "fifty-four", "fifty-five", "fifty-six", "fifty-seven", "fifty-eight", "fifty-nine", "sixty", "sixty-one", "sixty-two", "sixty-three", "sixty-four", "sixty-five", "sixty-six", "sixty-seven", "sixty-eight", "sixty-nine", "seventy", "seventy-one", "seventy-two", "seventy-three", "seventy-four", "seventy-five", "seventy-six", "seventy-seven", "seventy-eight", "seventy-nine", "eighty", "eighty-one", "eighty-two", "eighty-three", "eighty-four", "eighty-five", "eighty-six", "eighty-seven", "eighty-eight", "eighty-nine", "ninety", "ninety-one", "ninety-two", "ninety-three", "ninety-four", "ninety-five", "ninety-six", "ninety-seven", "ninety-eight", "ninety-nine", "one hundred"};

    int n;
    for (n = 1; n <= 100; n++)
    {
        printf ("%s -> %d\n", w[n], g(w[n]));
        if (n != g(w[n]))
        {
            printf ("Error at n = %d", n);
            return 1;
        }
    }
    return 0;
}

How it works

After some attempts, I found a function that maps the "exceptional" numbers one, two, three, four, five, six, seven, eight, nine, ten, eleven, twelve, thirteen, fourteen, fifteen, sixteen, seventeen, eighteen, nineteen, twenty, thirty, forty, fifty, sixty, seventy, eighty, ninety, one hundred to the printable ASCII characters k, ^, [, #, >, P, c, x, ., y, I, <, 7, C, Z, p, V, g, m, H, :, o, ], s, Y, K, $, 2, respectively.

This function is:

char hash (char* s)
{
    char r = 0;  /* assumes char is a signed 8-bit type that wraps around */

    while (*s)
    {
        r += *s|9;
        s++;
    }

    return r%45+77;
}

The golfed program computes the hash of the input until it reaches the end of the string or the character -. It then searches for the hash in the string k^[#>Pcx.yI<7CZpVgmH:o]sYK$2 and derives the corresponding number from its position. If the end of the input string was reached, that number is returned; if a - was reached instead, it returns that number plus the result of the golfed program applied to the rest of the input string.
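For readers who want to experiment with the hash outside C, here is a minimal Python sketch of the same idea. It is not part of the answer: the names c_rem, c_hash and hash_to_number are made up for illustration, and it assumes that char is a signed 8-bit type that wraps around and that % truncates toward zero, as in the golfed code on a typical x86 compiler.

```python
# Lookup string taken from the golfed C code; position i (1-based) stands for
# 1..20 directly and for 30, 40, ..., 100 at positions 21..28.
TABLE = "k^[#>Pcx.yI<7CZpVgmH:o]sYK$2"


def c_rem(a, b):
    """C-style remainder: truncates toward zero (Python's % floors)."""
    return abs(a) % b * (1 if a >= 0 else -1)


def c_hash(word):
    """Emulate `r += *s|9` on a signed 8-bit char, then `r % 45 + 77`."""
    r = 0
    for ch in word:
        r = (r + (ord(ch) | 9)) & 0xFF  # keep only the low 8 bits
    if r >= 128:                        # reinterpret as a signed char
        r -= 256
    return chr(c_rem(r, 45) + 77)


def hash_to_number(name):
    """Hash each hyphen-separated part and add up the looked-up values."""
    total = 0
    for part in name.split("-"):
        i = TABLE.index(c_hash(part)) + 1
        total += i if i < 21 else 10 * (i - 18)
    return total
```

Splitting on the hyphen plays the role of the recursive call g(s+1) in the C version.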

Bob

Posted 2016-02-10T14:32:48.093

Reputation: 957

I'm thinking, if there were a golfing version of C, it might actually beat languages like CJam, Pyth, Japt, etc... – busukxuan – 2016-02-12T18:35:38.183

11

JavaScript (ES6), 175 166 163 156 153 147 bytes

Saved 7 bytes thanks to @Neil

a=>+a.replace(/.+te|.*el|y$/,x=>x[1]?'on-'+x:'-d').split(/ |-|dr/).map(x=>"un|d,on|le,w,th,fo,f,x,s,h,i,".split`,`.findIndex(y=>x.match(y))).join``

Verify it here:

f=a=>+a.replace(/.+te|.*el|y$/,x=>x[1]?'on-'+x:'-d').split(/ |-|dr/).map(x=>"un|d,on|le,w,th,fo,f,x,s,h,i,".split`,`.findIndex(y=>x.match(y))).join``;

document.body.innerHTML="<pre>"+["one","two","three","four","five","six","seven","eight","nine","ten","eleven","twelve","thirteen","fourteen","fifteen","sixteen","seventeen","eighteen","nineteen","twenty","twenty-one","twenty-two","twenty-three","twenty-four","twenty-five","twenty-six","twenty-seven","twenty-eight","twenty-nine","thirty","thirty-one","thirty-two","thirty-three","thirty-four","thirty-five","thirty-six","thirty-seven","thirty-eight","thirty-nine","forty","forty-one","forty-two","forty-three","forty-four","forty-five","forty-six","forty-seven","forty-eight","forty-nine","fifty","fifty-one","fifty-two","fifty-three","fifty-four","fifty-five","fifty-six","fifty-seven","fifty-eight","fifty-nine","sixty","sixty-one","sixty-two","sixty-three","sixty-four","sixty-five","sixty-six","sixty-seven","sixty-eight","sixty-nine","seventy","seventy-one","seventy-two","seventy-three","seventy-four","seventy-five","seventy-six","seventy-seven","seventy-eight","seventy-nine","eighty","eighty-one","eighty-two","eighty-three","eighty-four","eighty-five","eighty-six","eighty-seven","eighty-eight","eighty-nine","ninety","ninety-one","ninety-two","ninety-three","ninety-four","ninety-five","ninety-six","ninety-seven","ninety-eight","ninety-nine","one hundred"].map((x,y)=>'f("'+x+'") === '+f(x)+"? "+(f(x)===y+1)).join("<br>")+"</pre>"

How it works

The basic idea is to split each number into its digit-words, then map each word to the corresponding digit. Almost all of the words are set up to be matched properly with a simple regex, but there are a few anomalies:

  • eleven through nineteen: if the word contains an el, or a te in the middle (to avoid ten), we add an on- to the beginning, changing these to on-eleven through on-nineteen.
  • twenty, thirty, etc.: replacing a trailing y with -d changes these to twent-d, thirt-d, etc.

Now we split at hyphens, spaces, and drs. This splits everything from 11 to 99 into its corresponding digit-words, and "one hundred" into [one,hun,ed]. Then we map each of these words through an array of regexes, and keep the index of the one that matches first.

0: /un|d/ - This matches the "hun" and "ed" in 100, as well as the "d" we placed on the end of 20, 30, etc.
1: /on|le/ - Matches "one" and the "on" we placed on the beginning of 11 through 19, along with "eleven".
2: /w/ - Matches "two", "twelve", and "twenty".
3: /th/ - Matches "three" and "thirty".
4: /fo/ - Matches "four" and "forty".
5: /f/ - "five" and "fifty" are the only words by now that contain an "f".
6: /x/ - "six" and "sixty" are the only words that contain an "x".
7: /s/ - "seven" and "seventy" are the only words by now that contain an "s".
8: /h/ - "eight" and "eighty" are the only words by now that contain an "h".
9: /i/ - "nine" and "ninety" are the only words by now that contain an "i".
10: /<empty>/ - "ten" is the only word left, but it still has to be matched.

By now, every input has been turned into an array of the proper digits. All we have to do is join them with join``, convert to a number with unary +, and we're done.
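The steps above can be tried outside a browser with a rough Python port. This is only a sketch of the same regex idea, not the golfed answer itself; PATTERNS and regex_to_number are illustrative names, and count=1 is used to mirror the non-global JavaScript replace.

```python
import re

# The same comma-separated regex list as in the JavaScript answer; the list
# index of the first matching pattern is the digit (or 10 for "ten").
PATTERNS = "un|d,on|le,w,th,fo,f,x,s,h,i,".split(",")


def regex_to_number(name):
    # Prefix 11..19 with "on-" and turn a trailing "y" into "-d".
    name = re.sub(
        r".+te|.*el|y$",
        lambda m: "on-" + m.group() if len(m.group()) > 1 else "-d",
        name,
        count=1,
    )
    digits = []
    for word in re.split(r" |-|dr", name):
        # The final empty pattern always matches, so next() cannot fail.
        digits.append(next(i for i, p in enumerate(PATTERNS) if re.search(p, word)))
    return int("".join(map(str, digits)))
```

For example, "sixty-six" is split into "sixty" and "six", both of which first match /x/, giving the digits 6 and 6.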

ETHproductions

Posted 2016-02-10T14:32:48.093

Reputation: 47 880

Please, explain. – Bob – 2016-02-10T16:03:39.453

@Bob Sure, explanation added. – ETHproductions – 2016-02-10T16:22:15.840

Doesn't .findIndex(y=>x.match(y)) work? – Neil – 2016-02-10T20:24:04.647

@Neil I didn't realize that would, but it does, thanks! – ETHproductions – 2016-02-10T20:45:09.757

I'm pretty sure you can alias replace. – Mama Fun Roll – 2016-02-11T02:05:39.730

@ӍѲꝆΛҐӍΛПҒЦꝆ Exactly the same length :( a=>+a[r="replace"](/(.+)te/,'on-$1')[r](/y$/,'-d')... – ETHproductions – 2016-02-11T15:57:53.407

@ӍѲꝆΛҐӍΛПҒЦꝆ Fortunately, combining the two replaces saved three bytes! – ETHproductions – 2016-02-11T16:04:32.757

6

sh + coreutils, 112 bytes

Can be run on all testcases at once, one per line.

sed -r "`awk '$0="s/"$0"/+"NR"/g"'<<<"on
tw
th
fo
fi
si
se
ei
ni
te|lv
el"`
s/ /y0/
s/y/*10/
s/^\+|[a-z-]//g"|bc

Explanation

The backticked awk evaluates to the sed script

s/on/+1/g       # one, one hundred
s/tw/+2/g       # two, twelve, twenty
s/th/+3/g       # three, thirteen, thirty
s/fo/+4/g       # ...
s/fi/+5/g
s/si/+6/g
s/se/+7/g
s/ei/+8/g
s/ni/+9/g
s/te|lv/+10/g   # ten, -teen, twelve
s/el/+11/g      # eleven

which transforms parts of numbers into their numeric representation.

five            ->    +5ve
ten             ->    +10n
eleven          ->    +11even
twelve          ->    +2e+10e
sixteen         ->    +6x+10en
thirty-seven    ->    +3irty-+7ven
forty-four      ->    +4rty-+4ur
eighty          ->    +8ghty
one hundred     ->    +1e hundred

The additional lines of the sed script

s/ /y0/
s/y/*10/

take care of -tys and one hundred.

+3irty-+7ven    ->    +3irt*10-+7ven
+4rty-+4ur      ->    +4rt*10-+4ur
+8ghty          ->    +8ght*10
+1e hundred     ->    +1ey0hundred      ->    +1e*100hundred

Finally, remove leading +s and everything that is not +, * or a digit.

s/^\+|[a-z-]//g"

Only math expressions remain

five            ->    5
sixteen         ->    6+10
forty-four      ->    4*10+4
eighty          ->    8*10
one hundred     ->    1*100

and can be piped into bc.
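The same pipeline can be followed step by step in Python. The sketch below is an illustration rather than a translation of the shell answer: RULES, to_expression and expr_to_number are made-up names, and a small loop over + and * stands in for bc.

```python
import re

# Two-letter prefixes in the order used by the generated sed script;
# rule number k replaces its pattern with "+k".
RULES = ["on", "tw", "th", "fo", "fi", "si", "se", "ei", "ni", "te|lv", "el"]


def to_expression(name):
    """Turn a number name into an arithmetic expression such as '4*10+4'."""
    for value, pattern in enumerate(RULES, start=1):
        name = re.sub(pattern, f"+{value}", name)
    name = name.replace(" ", "y0", 1)    # s/ /y0/   (only "one hundred")
    name = name.replace("y", "*10", 1)   # s/y/*10/  (first y only)
    return re.sub(r"^\+|[a-z-]", "", name)


def expr_to_number(name):
    """Evaluate the sum of products, the part bc does in the answer."""
    total = 0
    for term in to_expression(name).split("+"):
        product = 1
        for factor in term.split("*"):
            product *= int(factor)
        total += product
    return total
```

Calling to_expression on "sixteen", "forty-four" and "one hundred" reproduces the expressions 6+10, 4*10+4 and 1*100 from the table above.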

Rainer P.

Posted 2016-02-10T14:32:48.093

Reputation: 2 457

4

Pyth, 79 76 75 68 bytes

Thanks to @ETHproductions for 7 bytes.

?}"hu"z100sm*+hxc."ewEСBu­["2<d2?|}"een"d}"lv"dTZ?}"ty"dT1cz\-

It first checks the corner case of 100, then uses an array of the first two letters of the numbers 1 to 11 to determine the meaning of each word, and modifies the value according to the suffix ("-ty" and "-teen"; "lv" in 12 is another corner case). It splits the input into a list of words, maps each one to a value, and sums them up.

In pythonic pseudocode:

                           z = input()    # raw, unevaluated
                           Z = 0
                           T = 10
?}"hu"z                    if "hu" in z:  # checks if input is 100
  100                        print(100)
                           else:
sm                           sum(map( lambda d: # evaluates each word, then sum
  *                            multiply(
   +hxc."ewEСBu­["2<d2           plusOne(chop("ontwth...niteel",2).index(d[:2])) + \
                                 # chops string into ["on","tw",..."el"]
                                 # ."ewEСBu­[" is a packed string
     ?|}"een"d}"lv"dTZ               (T if "een" in d or "lv" in d else Z),
                                     # add 10 for numbers from 12 to 19
   ?}"ty"dT1                     T if "ty" in d else 1),  # times 10 if "-ty"
  cz\-                         z.split("-"))  # splits input into words

Test suite


Python 3, 218 bytes

z=input()
if "hu" in z:print(100);exit()
print(sum(map(lambda d:([0,"on","tw","th","fo","fi","si","se","ei","ni","te","el"].index(d[:2])+(10 if "een" in d or "lv" in d else 0))*(10 if "ty" in d else 1),z.split("-"))))

Basically identical to the Pyth answer.


Off-topic:

I just discovered a meaningful version of the answer to life, the universe and everything: it's tea-thirsty twigs. Wow, twigs that yearn for tea! I'm not sure how many other answers do this, but for my answer if the input is "tea-thirsty-twigs" the output is 42.

busukxuan

Posted 2016-02-10T14:32:48.093

Reputation: 2 728

I believe you can save seven bytes by using a packed string. Copy the output and put it in place of "ontwthfofisiseeiniteel" in this program. – ETHproductions – 2016-02-11T16:07:01.913

@ETHproductions Wow, thanks! The last time I checked, there was still "ze" at the head of the string, and packing couldn't work. I didn't check once more after I golfed it out. Again, thanks xD – busukxuan – 2016-02-11T17:04:12.573

@ETHproductions yes I actually did, it's under the pseudocode. – busukxuan – 2016-02-12T18:24:57.017

2

Haskell, 252 231 bytes

let l=words;k=l"six seven eight nine";w=l"one two three four five"++k++l"ten eleven twelve"++((++"teen")<$>l"thir four fif"++k)++[n++"ty"++s|n<-l"twen thir for fif"++k,s<-"":['-':x|x<-take 9w]]in maybe 100id.flip lookup(zip w[1..])

This creates a list of all English number names from "one" to "ninety-nine" and then looks up the index of the input. If the input isn't found, we're in the edge case "one hundred", so it returns 100; otherwise it returns the index.

Ungolfed

-- k in the golfed variant
common = words "six seven eight nine" 

-- w in the golfed variant
numbers = words "one two three four five" ++ common
       ++ words "ten eleven twelve" ++ [p ++ "teen" | p <- words "thir four fif" ++ common]
       ++ [p ++ "ty" ++ s| p <- words "twen thir for fif" ++ common
                         , s <- "" : map ('-':) (take 9 numbers)]

-- part of the expression in the golfed variant
convert :: String -> Int
convert s = maybe 100 id $ lookup s $ zip numbers [1..]

Zeta

Posted 2016-02-10T14:32:48.093

Reputation: 681

2

Python 3, 365 361 310 303 characters

Golfed

def f(a):
 y=0
 for i in a.split("-"):
  x="one,two,three,four,five,six,seven,eight,nine,ten,eleven,twelve,thir;four;fif;six;seven;eigh;nine;twenty,thirty,forty,fifty,sixty,seventy,eighty,ninety,one hundred".replace(";","teen,").split(",").index(i)
  y+=x+1 if x<20 else range(30,110,10)[x-20]
 return y

Ungolfed

def nameToNumber (numberName):
    names = ["one","two","three","four","five","six","seven","eight","nine","ten","eleven","twelve","thirteen",
             "fourteen","fifteen","sixteen","seventeen","eighteen","nineteen","twenty","thirty","forty","fifty",
             "sixty","seventy","eighty","ninety","one hundred"]
    numbers = range(30, 110, 10)
    number = 0
    for n in numberName.split("-"):
        x = names.index(n)
        number += x + 1 if x < 20 else numbers[x - 20]
    return number

Argenis García

Posted 2016-02-10T14:32:48.093

Reputation: 223

45 characters shorter: n="one,two,three,four,five,six,seven,eight,nine,ten,eleven,twelve,thirteen,fourteen,fifteen,sixteen,seventeen,eighteen,nineteen,twenty,thirty,forty,fifty,sixty,seventy,eighty,ninety,one hundred".split(",") But as I see, should work without assigning it to variable n, just call .index() directly on it. – manatwork – 2016-02-10T16:09:31.023

7 characters shorter: "one,two,three,four,five,six,seven,eight,nine,ten,eleven,twelve,thir;four;fif;six;seven;eigh;nine;twenty,thirty,forty,fifty,sixty,seventy,eighty,ninety,one hundred".replace(";","teen,").split(","). – manatwork – 2016-02-10T16:36:48.540

The StackExchange site engine has an irritating habit: it inserts invisible characters (U200C Zero Width Non-Joiner and U200B Zero Width Space) into the code posted in comments. You copy-pasted them too. I edited your post to remove them. – manatwork – 2016-02-10T17:13:25.933

2

Python 2, 275 characters

def x(n):a='one two three four five six seven eight nine ten eleven twelve'.split();t='twen thir four fif six seven eigh nine'.split();b=[i+'teen'for i in t[1:]];c=[i+'ty'for i in t];return(a+b+[i+j for i in c for j in ['']+['-'+k for k in a[:9]]]+['one hundred']).index(n)+1

It simply builds a list of every number name and finds the index of the input.
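An ungolfed Python 3 sketch of the same list-building idea might look like this. The names build_names, NAMES and list_to_number are illustrative. One deliberate difference: it keeps separate stem lists for the teens and the tens, because "fourteen" keeps the u while "forty" drops it, whereas the one-liner above derives both from a single stem list and so appears to build "fourty".

```python
def build_names():
    """Return the names of 1..100 in order, so that index + 1 is the value."""
    units = "one two three four five six seven eight nine".split()
    teen_stems = "thir four fif six seven eigh nine".split()
    ty_stems = "twen thir for fif six seven eigh nine".split()

    names = units + ["ten", "eleven", "twelve"]
    names += [stem + "teen" for stem in teen_stems]
    for stem in ty_stems:
        tens = stem + "ty"
        names.append(tens)                              # e.g. "forty"
        names += [tens + "-" + unit for unit in units]  # e.g. "forty-two"
    names.append("one hundred")
    return names


NAMES = build_names()


def list_to_number(name):
    return NAMES.index(name) + 1
```

The list is built once and has exactly 100 entries, so any valid input is found by a single index lookup.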

Peter

Posted 2016-02-10T14:32:48.093

Reputation: 225

1

Japt, 82 bytes

+Ur`(.+)¿``¿-$1` r"y$""-d" q$/ |-|dr/$ £`un|Üaiwo|ØÏ¿ifoifix¿iÊ¿¿e¿iv`qi b_XfZ}Ãq

Each ¿ represents an unprintable char. Test it online!

Based on my JS answer. Subtract one byte if the output doesn't need to be an integer, since it would appear exactly the same as a string.

How it works

+Ur`(.+)¿` `¿-$1`  r"y$""-d" q/ |-|dr/ £  `un|Üaiwo|ØÏ¿ifoifix¿iÊ¿¿e¿iv`          qi b_ XfZ}à q
+Ur"(.+)te""on-$1" r"y$""-d" q/ |-|dr/ mX{"un|dioniwo|wenithifoifixisihineiteiniv"qi bZ{XfZ}} q

Ur"(.+)te""on-$1" // Replace "thirteen", "fourteen", etc. with "on-thiren", "on-fouren", etc.
r"y$""-d"         // Replace "twenty", "thirty", etc. with "twent-d", "thirt-d", etc.
q/ |-|dr/         // Split at occurrences of a space, hyphen, or "dr". By now,
                  // "one", "thirteen", "twenty", "sixty-six", "one hundred" will have become:
                  // "one", "on" "thiren", "twent" "d", "sixty" "six", "one" "hun" "ed"
mX         }      // Map each item X in the resulting array to:
"..."qi           //  Take this string, split at "i"s,
b_XfZ}            //  and find the first item Z where X.match(RegExp(Z)) is not null.
                  //  See my JS answer to learn exactly how this works.
                  // Our previous example is now
                  // "1", "1" "3", "2" "0", "6" "6", "1" "0" "0"
+              q  // Join and convert to integer.
                  // 1, 13, 20, 66, 100

ETHproductions

Posted 2016-02-10T14:32:48.093

Reputation: 47 880

1

JavaScript, 214 199 bytes

As always: turns out this is too long to compete, but now that I'm done it'd be a waste not to post this.

Perhaps there's an obvious way to golf this further that I've overlooked?

e=s=>s.slice(-1)=='d'?100:'  ontwthfofisiseeinite'.indexOf(s.slice(0,2))/2;f=s=>([t,u]=s.split('-'),~s.indexOf`le`?11:~s.indexOf`lv`?12:e(t)+(t.slice(-3)=='een')*10+''+(u?e(u):t.slice(-1)=='y'?0:''))

JSFiddle for test cases

vvye

Posted 2016-02-10T14:32:48.093

Reputation: 261

1 How about changing f to f=s=>([t,u]=s.split('-'),~s.indexOf('le')?11:~s.indexOf('lv')?12:e(t)+(t.slice(-3)=='een')*10+''+(u?e(u):t.slice(-1)=='y'?0:''))? Also, a single string argument can be passed to a function like so: s.indexOf`lv` – ETHproductions – 2016-02-10T21:19:20.137

@ETHproductions That's great, thanks! I didn't know JS had a comma operator, and the shorthand for string passing is really useful as well. – vvye – 2016-02-10T22:37:46.573

1

Perl, 158 bytes

@s=split/(\d+)/,'te1ten0l1le1on1tw2th3fo4fi5si6se7ei8ni9d00';foreach(split'-',$n=$ARGV[0]){for($i=0;$i<$#s;$i+=2){m/$s[$i]/&&print$s[$i+1]}}$n=~/ty$/&&print 0

Runs from the command line. one hundred must be entered as "one hundred" to stop it being interpreted as two inputs.
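Since the answer comes without an explanation, here is how the table appears to be used, sketched in Python (PAIRS and table_to_number are illustrative names, not part of the answer). The packed string alternates patterns and digits; for each hyphen-separated part, every pattern that matches emits its digits in table order, and a final 0 is appended when the whole input ends in "ty".

```python
import re

# Unpack "te1ten0l1le1on1..." into (pattern, digits) pairs, in table order.
PAIRS = re.findall(r"([a-z]+)(\d+)", "te1ten0l1le1on1tw2th3fo4fi5si6se7ei8ni9d00")


def table_to_number(name):
    out = ""
    for part in name.split("-"):
        for pattern, digits in PAIRS:
            if re.search(pattern, part):
                out += digits      # the Perl version prints these directly
    if name.endswith("ty"):
        out += "0"                 # twenty, thirty, ... need a trailing zero
    return int(out)
```

For instance, "ten" matches te and ten and yields "1" then "0", "seventeen" matches te and se and yields "1" then "7", and "one hundred" matches on and d and yields "1" then "00".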

CJ Dennis

Posted 2016-02-10T14:32:48.093

Reputation: 4 104