Convert a string of digits from words to an integer

20

1

Convert a string containing digits as words into an integer, ignoring leading zeros.

Examples

  • "four two" -> 42.
  • "zero zero zero one" -> 1.

Assumptions

Submissions can assume that:

  1. The input string consists of space-separated digit words.
  2. All words are valid (in the range "zero".."nine") and lowercase. Behaviour for empty input is undefined.
  3. The input string always represents an unsigned number within the range of int and is never an empty string.

Scoring

Answers will be scored in bytes with fewer bytes being better.

Sparkler

Posted 2019-09-22T21:03:50.647

Reputation: 321

3

Welcome to the site. There are a couple of things that we usually expect from questions that are missing here. The most important would be an objective scoring criterion which all challenges must have.

– Post Rock Garf Hunter – 2019-09-22T21:06:55.950

Aside from that, this question is very sparse on specification. You should specify exactly what is required of submissions without ambiguity. One sentence and an example just isn't up to our clarity standards for challenges. – Post Rock Garf Hunter – 2019-09-22T21:08:35.033

@SriotchilismO'Zaic the code-golf tag is enough? – Sparkler – 2019-09-22T21:11:25.873

3

On top of what has been said already, we have a sandbox where users can post their challenges before posting them to main. That way you will miss less information when making posts. If you look at other recent posts on the site with a reasonably positive reception I think you will see that both your question and solution aren't quite in line with what we do here.

– FryAmTheEggman – 2019-09-22T21:13:19.043

To handle the scoring criterion it would be the absolute minimum. You really should put a sentence in the body of the question though, e.g. "Answers will be scored in bytes with fewer bytes being better". There is technically more than one way to score code-golf, but bytes is overwhelmingly the most popular. – Post Rock Garf Hunter – 2019-09-22T21:13:50.713

What behavior is expected for leading zeros? For example, should zero zero one output 001 or 1, or are both allowed? Also, what should be done with the empty string? Does it count as 0, or is it not a valid input? – Post Rock Garf Hunter – 2019-09-22T21:17:33.383

At the risk of being pedantic, I'd like to point out that the range "zero".."nine" is not fully specified. – Unrelated String – 2019-09-22T21:22:22.167

Annoyingly, the builtin Interpreter@"SemanticNumber" does exactly this in Mathematica, except that it fails on strings starting with zero zero. – Greg Martin – 2019-09-23T05:45:28.277

I have trouble parsing the title. Shouldn't it be written in the usual way, "I ... the source code, you .. the input!"? – Eric Duminil – 2019-09-23T08:09:38.907

What's unclear here? Why the VTCs? The only potential issue I see is the one raised by @UnrelatedString above. – Shaggy – 2019-09-23T11:40:05.633

Can we assume the input is all uppercase, rather than all lowercase? – Grimmy – 2019-09-23T12:49:46.367

2

Possible duplicate of Convert English to a number without built-ins or libraries

– Olivier Grégoire – 2019-09-23T13:06:55.043

@OlivierGrégoire it's a different challenge to that one: this one uses only the digits rather than the cardinal number in English (i.e. "four two" here, "forty two" there), which makes quite a difference. – Jonathan Allan – 2019-09-23T13:51:02.530

Answers

22

PHP, 74 bytes

foreach(explode(' ',$argn)as$w)$n.='793251_8640'[crc32($w)%20%11];echo+$n;

Try it online!

I tried to find a solution that doesn't copy existing answers. For each word I compute its 32-bit cyclic redundancy checksum (crc32), then take it mod 20 and then mod 11, which gives a distinct value from 0 to 10 for each digit word (6 is never produced). That value is an index into the lookup string '793251_8640', and the character at that position is the actual digit.

| Word  | CRC32      | %20 | %11 | Equivalent digit |
|-------|------------|-----|-----|------------------|
| zero  | 2883514770 | 10  | 10  | 0                |
| one   | 2053932785 | 5   | 5   | 1                |
| two   | 298486374  | 14  | 3   | 2                |
| three | 1187371253 | 13  | 2   | 3                |
| four  | 2428593789 | 9   | 9   | 4                |
| five  | 1018350795 | 15  | 4   | 5                |
| six   | 1125590779 | 19  | 8   | 6                |
| seven | 2522131820 | 0   | 0   | 7                |
| eight | 1711947398 | 18  | 7   | 8                |
| nine  | 2065529981 | 1   | 1   | 9                |
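To check the table outside PHP, here is a minimal Python 3 sketch of the same lookup. It assumes that `zlib.crc32` returns the same unsigned CRC32 as PHP's `crc32()` on a 64-bit build, which should hold since both compute the standard CRC-32; `decode_crc` is a hypothetical name, not part of the answer.

```python
import zlib

# Index k holds the digit whose word gives crc32(word) % 20 % 11 == k.
# '_' pads index 6, which no digit word produces.
LOOKUP = "793251_8640"


def decode_crc(s: str) -> int:
    digits = "".join(LOOKUP[zlib.crc32(w.encode()) % 20 % 11] for w in s.split(" "))
    # int() drops leading zeros, like the unary + in PHP's echo+$n
    return int(digits)
```

Under that assumption, `decode_crc("zero zero zero one")` should return 1.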

Another 74-byte CRC32 alternative, using %493%10: Try it online!

Another 74-byte CRC32 alternative, using %2326%11: Try it online!


PHP, 74 bytes

foreach(explode(' ',$argn)as$w)$n.=strpos(d07bfe386c,md5($w)[21]);echo+$n;

Try it online!

Another alternative of the same length: it takes the 22nd character (index 21) of the md5 hex digest of each word, the only position where every word gives a different character, and maps that character to a digit by its position in d07bfe386c.
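A Python 3 sketch of the md5 variant, assuming that PHP's `md5()` and Python's `hashlib.md5(...).hexdigest()` both return the same lowercase 32-character hex digest; `decode_md5` is a hypothetical name.

```python
import hashlib

# The position of md5(word)[21] in this key string is the digit.
KEY = "d07bfe386c"


def decode_md5(s: str) -> int:
    digits = "".join(
        str(KEY.index(hashlib.md5(w.encode()).hexdigest()[21]))
        for w in s.split(" ")
    )
    return int(digits)  # drop leading zeros
```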

Night2

Posted 2019-09-22T21:03:50.647

Reputation: 5 484

This is a cool answer – Juan Sebastian Lozano – 2019-09-23T16:08:47.513

9

Python 2,  71  70 bytes

-1 thanks to ovs (use find in place of index)

lambda s:int(''.join(`'rothuvsein'.find((w*3)[6])`for w in s.split()))

Try it online!

Jonathan Allan

Posted 2019-09-22T21:03:50.647

Reputation: 67 804

7

JavaScript (ES6),  70 67 66  62 bytes

Saved 3 bytes thanks to @ovs

s=>+s.replace(/\w+ ?/g,s=>'2839016547'[parseInt(s,36)%204%13])

Try it online!
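The answer has no write-up, so here is a Python 3 sketch of what the replacement callback computes: each word is read as a base-36 number, reduced mod 204 and then mod 13, and the result indexes '2839016547'. Python's `int(w, 36)` stands in for `parseInt(s,36)`, which stops at the trailing space the regex captures; `decode_b36` is a hypothetical name.

```python
TABLE = "2839016547"


def decode_b36(s: str) -> int:
    # e.g. int("two", 36) == 38760, and 38760 % 204 % 13 == 0, so "two" -> TABLE[0] == "2"
    return int("".join(TABLE[int(w, 36) % 204 % 13] for w in s.split()))
```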

Arnauld

Posted 2019-09-22T21:03:50.647

Reputation: 111 334

'2839016547'[parseInt(s,36)%204%13] is 3 bytes shorter. – ovs – 2019-09-22T22:06:25.407

6

Jelly,  19  17 bytes

Ḳµ7ị“*;nÄƲ]³Ṙ»i)Ḍ

A monadic Link accepting a list of characters which yields an integer.

Try it online!

Pretty much a port of my Python 2 answer.


Previous

ḲŒ¿€i@€“©¥q£½¤MÆÑ‘Ḍ

Try it online!

There is quite possibly a shorter way, but this is the one that first came to mind.

Jonathan Allan

Posted 2019-09-22T21:03:50.647

Reputation: 67 804

Removing zero from the enklact string to avoid decrementing, because not found is zero anyhow... clever! – Unrelated String – 2019-09-22T22:01:46.983

Ah, I see you used the same method, nice. – Jonathan Allan – 2019-09-22T22:16:36.473

5

Python 3, 107, 91, 77, 90 bytes

-16 bytes by Sriotchilism O'Zaic

+13 bytes to remove leading zeroes

lambda s:int(''.join(map(lambda w:str('zeontwthfofisiseeini'.index(w[:2])//2),s.split())))

Try it online!

movatica

Posted 2019-09-22T21:03:50.647

Reputation: 635

It's shorter if you just use the first two characters – Post Rock Garf Hunter – 2019-09-22T21:19:23.317

Nice one! That way, I can even drop the delimiter completely :) – movatica – 2019-09-22T21:23:28.087

With the updates to the challenge this is no longer valid since it includes leading zeros. :( – Post Rock Garf Hunter – 2019-09-22T21:25:44.120

Yep, fix is expensive. – movatica – 2019-09-22T21:30:53.897

Here is a slightly better fix – Post Rock Garf Hunter – 2019-09-22T21:34:51.597

@movatica Your fix is incorrect. The lstrip method strips every leading character that appears in the string given as its argument, so "eight two" becomes "ight two", as "e" gets stripped. Also, "zero zero zero" should print out "0", not give an error. – NemPlayer – 2019-09-22T21:36:11.923

You're both correct. lstrip does not work, I switched to a lambda instead – movatica – 2019-09-22T21:39:22.790

My version was not just to fix lstrip it is also 7 bytes shorter. – Post Rock Garf Hunter – 2019-09-22T21:41:48.390

5

Perl 6, 35 32 bytes

{+uniparse 'SP'~S:g/<</,DIGIT /}

Try it online!

Explanation

{                              }  # Anonymous block
                S:g/<</,DIGIT /   # Insert ",DIGIT " at
                                  # left word boundaries
           'SP'~                  # Prepend 'SP' for space
  uniparse                        # Parse list of Unicode names into string
 +                                # Convert to integer
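The trick relies on the digits' Unicode names being "DIGIT ZERO" through "DIGIT NINE" (U+0030 to U+0039). A rough Python 3 analogue using `unicodedata.lookup` follows; it is not a translation of Raku's `uniparse`, and `decode_names` is a hypothetical name.

```python
import unicodedata


def decode_names(s: str) -> int:
    # "four" -> lookup("DIGIT FOUR") -> "4"
    chars = "".join(unicodedata.lookup("DIGIT " + w.upper()) for w in s.split())
    return int(chars)  # like Raku's prefix +, int() ignores leading zeros
```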

nwellnhof

Posted 2019-09-22T21:03:50.647

Reputation: 10 037

5

C (gcc), 89 bytes

i,n;f(char*w){for(i=n=0;n=*w%32?n^*w:(i+=n-2)&&!printf(L"8 0  72 3  59641"+n%17),*w++;);}

Try it online!

Thanks to @Ceilingcat for the smartest tricks:

- printf instead of putchar.
- !printf instead of printf()&0.
- A wide-character string literal (L"...") for the lookup table.
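The core of this answer is a per-word hash: the XOR of the word's character codes, reduced mod 17, indexes the string "8 0  72 3  59641" (spaces mark unused slots). Below is a Python 3 sketch of just that hash; the function names are hypothetical. In the C code, the `(i+=n-2)&&` guard appears to suppress leading zeros, because "zero" XORs to exactly 2 and so leaves `i` at 0 until the first non-zero digit.

```python
from functools import reduce
from operator import xor

TABLE = "8 0  72 3  59641"


def xor_hash(word: str) -> int:
    # XOR of all character codes, e.g. "zero": 0x7A ^ 0x65 ^ 0x72 ^ 0x6F == 2
    return reduce(xor, (ord(c) for c in word), 0)


def decode_xor(s: str) -> int:
    return int("".join(TABLE[xor_hash(w) % 17] for w in s.split()))
```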

AZTECCO

Posted 2019-09-22T21:03:50.647

Reputation: 2 441

3

Retina 0.8.2, 46 45 bytes

\w+
¶$&$&$&
%7=T`r\ot\huvs\ein`d`.
\D

^0+\B

Try it online! Link includes test cases. Explanation:

\w+
¶$&$&$&

Put each word on its own line and triplicate it.

%7=T`r\ot\huvs\ein`d`.

Transliterate the 7th character of each line using @UnrelatedString's string.

\D

Delete all remaining non-digit characters.

^0+\B

Delete leading zeros (but leave at least one digit).

Previous 46-byte more traditional solution:

T`z\wuxg`E
on
1
th
3
fi
5
se
7
ni
9
\D

^0+\B

Try it online! Link includes test cases. Explanation:

T`z\wuxg`E

The words zero, two, four, six and eight are the only ones containing z, w, u, x and g respectively. Transliterate those letters to the even digits.

on
1
th
3
fi
5
se
7
ni
9

For the odd digits, just match the first two letters of each word individually.

\D

Delete all remaining non-digit characters.

^0+\B

Delete leading zeros (but leave at least one digit).
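For readers who don't know Retina, here is a Python 3 sketch that mirrors the stages of the 46-byte version one by one. `decode_retina` is a hypothetical name, and it returns a string because Retina's output is text.

```python
import re


def decode_retina(s: str) -> str:
    # T`z\wuxg`E: z, w, u, x, g occur only in zero, two, four, six, eight
    s = s.translate(str.maketrans("zwuxg", "02468"))
    # The odd digits are identified by their first two letters
    for prefix, digit in (("on", "1"), ("th", "3"), ("fi", "5"), ("se", "7"), ("ni", "9")):
        s = s.replace(prefix, digit)
    # \D: delete all remaining non-digit characters (letters and spaces)
    s = re.sub(r"\D", "", s)
    # ^0+\B: delete leading zeros but leave at least one digit
    return re.sub(r"^0+\B", "", s)
```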

Neil

Posted 2019-09-22T21:03:50.647

Reputation: 95 035

3

05AB1E, 18 16 bytes

#ε6è}.•ƒ/ÿßÇf•Åβ

Try it online.

Explanation:

#                 # Split the (implicit) input-string on spaces
 ε  }             # Map each string to:
  6è              #  Get the character at 0-based index 6 (with automatic wraparound)
     .•ƒ/ÿßÇf•    # Push compressed string "rothuvsein"
              Åβ  # Convert the characters from custom base-"rothuvsein" to an integer
                  # (after which the top of the stack is output implicitly as result)

See this 05AB1E tip of mine (section How to compress strings not part of the dictionary?) to understand why .•ƒ/ÿßÇf• is "rothuvsein".
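The same idea written out in Python 3 (a sketch; `digit` and `decode_rothuvsein` are hypothetical names). Indexing `word[6 % len(word)]` is the wrap-around lookup; it picks the same character as Jonathan Allan's `(w*3)[6]`, since every digit word has at least three letters. The custom base conversion is then a Horner loop over the positions in "rothuvsein".

```python
KEY = "rothuvsein"


def digit(word: str) -> int:
    # Character at index 6 with wrap-around, e.g. "zero" -> "zero"[2] == "r" -> 0
    return KEY.index(word[6 % len(word)])


def decode_rothuvsein(s: str) -> int:
    n = 0
    for w in s.split():
        n = n * 10 + digit(w)  # base-10 accumulation drops leading zeros
    return n
```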

Kevin Cruijssen

Posted 2019-09-22T21:03:50.647

Reputation: 67 575

16 without rothuvsein – Grimmy – 2019-09-23T12:59:31.233

3

05AB1E, 17 16 bytes

•D±¾©xWÄ0•I#HèTβ

Try it online!

Perfect tie with the other 05AB1E answer, but using a completely different approach.

•D±¾©xWÄ0•               # compressed integer 960027003010580400
          I#             # split the input on spaces
            H            # convert each word from hex (eg "one" => 6526)
             è           # index (with wrap-around) into the digits of the large integer
              Tβ         # convert from base 10 to integer
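A Python 3 sketch of this approach. It assumes, based only on the "one" => 6526 example above, that 05AB1E's `H` reads the word in base 16 while letters keep their base-36 values (o = 24, n = 23, e = 14, so 24*256 + 23*16 + 14 = 6526); `hexish` and `decode_hex` are hypothetical names.

```python
DIGITS = "960027003010580400"


def hexish(word: str) -> int:
    # Base-16 positional value, but each letter contributes its base-36 digit value
    value = 0
    for c in word:
        value = value * 16 + int(c, 36)
    return value


def decode_hex(s: str) -> int:
    # Modular indexing mirrors 05AB1E's wrap-around
    return int("".join(DIGITS[hexish(w) % len(DIGITS)] for w in s.split()))
```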

Grimmy

Posted 2019-09-22T21:03:50.647

Reputation: 12 521

2

C++ (gcc), 478 218 142 bytes

-(a lot) thanks to Jo King

int f(string s){char c[]="N02K8>IE;6";int i=0,n=0;while(s[i]){n=n*10-1;while((s[i]^s[i+1])+47!=c[++n%10]);while(s[i++]!=' '&&s[i]);}return n;}

Try it online!

Sparkler

Posted 2019-09-22T21:03:50.647

Reputation: 321

127 bytes – ceilingcat – 2019-09-23T08:01:03.563

2

Jelly, 20 18 17 bytes

Ḳ7ị“*;nÄƲ]³Ṙ»iƲ€Ḍ

Try it online!

-2 bytes from running "rothuvsein" through user202729's string compressor.

-1 byte from stealing Jonathan Allan's zero-free enklact string, and putting it in a marginally differently structured program.

Ḳ                    Split the input on spaces,
              Ʋ€     for each word
             i       find the 1-based index (defaulting to 0)
   “*;nÄƲ]³Ṙ»        in "othuvsein"
 7ị                  of the element at modular index 7,
                Ḍ    and convert from decimal digits to integer.

Unrelated String

Posted 2019-09-22T21:03:50.647

Reputation: 5 300

2

Japt, 13 bytes

¸mg6 ì`Ψuv 

Try it

Looks like everyone else beat me to the same idea - could've saved myself the hassle of writing a script to brute force the optimal string for compression, only to find that, up to index 1,000,000 (it was early, I hadn't had my caffeine yet!), "rothuvsein" is the only possible string!

¸mg6 ì`...     :Implicit input of string
¸              :Split on spaces
 m             :Map
  g6           :  Character at index 6 (0-based, with wrapping)
     ì         :Convert from digit array in base
      `...     :  Compressed string "rothuvsein"

The compressed string contains the characters at codepoints 206, 168, 117, 118, 160 & 136.

Shaggy

Posted 2019-09-22T21:03:50.647

Reputation: 24 623

...did you really try up to 1000000? The lcm of the lengths of the digit names is 60, so there's no point trying beyond that (60 is equivalent to 0, 61 to 1, etc.). – Grimmy – 2019-09-23T13:08:48.997

@Grimy, it was early, I hadn't had my caffeine yet! Plugging a million into the script I wrote to generate all possibilities was as easy as any other number and saved me doing the maths on the LCM. – Shaggy – 2019-09-23T13:22:27.973
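Grimmy's point can be checked with a small Python 3 search: cyclic indexing into words of length 3, 4 and 5 repeats with period lcm(3, 4, 5) = 60, so only indices 0 to 59 need testing. This is a sketch, not Shaggy's actual script, and `candidate_keys` is a hypothetical name. Index 6 yields "rothuvsein"; Shaggy reports it is the only working string.

```python
from functools import reduce
from math import gcd

WORDS = "zero one two three four five six seven eight nine".split()


def candidate_keys() -> dict:
    # Period of cyclic indexing over all word lengths: lcm(3, 4, 5) == 60
    period = reduce(lambda a, b: a * b // gcd(a, b), (len(w) for w in WORDS))
    keys = {}
    for k in range(period):
        chars = "".join(w[k % len(w)] for w in WORDS)
        if len(set(chars)) == len(WORDS):  # all ten characters distinct
            keys[k] = chars
    return keys
```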

2

Ruby, 63 bytes, 52 bytes, 50 bytes

p $*.map{|d|'rothuvsein'.index (d*3)[6]}.join.to_i

-2 thanks to Value Ink's tip

Harrowed

Posted 2019-09-22T21:03:50.647

Reputation: 21

Welcome to Code Golf! In Ruby, $* is an alias for ARGV, so feel free to use that to save extra bytes. – Value Ink – 2019-09-23T21:11:59.520

2

T-SQL, 110 bytes

SELECT 0+STRING_AGG(CHARINDEX(LEFT(value,2),'_ontwthfofisiseeini')/2,'')
FROM STRING_SPLIT((SELECT*FROM i),' ')

Line break is for readability only.

Input is taken via table i, per our IO rules. I could have saved 14 bytes by pre-populating a string variable, but that's only allowed if the language has no other input methods.

Explanation:

  1. STRING_SPLIT takes the input string and separates it at the spaces
  2. CHARINDEX looks up the first 2 characters of each word and returns their (1-based) position in the string '_ontwthfofisiseeini'. 'ze' for zero is not in the string, so CHARINDEX returns 0 for "not found". The underscore ensures every match starts at an even position.
  3. Divide by 2 to get the final numeral
  4. STRING_AGG smashes the digits back together with no separator
  5. 0+ forces an implicit conversion to INT and drops any leading zeros. 1* would also work.
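The same steps as a Python 3 sketch. `charindex` is a hypothetical helper that imitates SQL Server's CHARINDEX (1-based result, 0 when not found), and `decode_prefix` is likewise hypothetical.

```python
KEY = "_ontwthfofisiseeini"


def charindex(needle: str, haystack: str) -> int:
    # 1-based position of needle, or 0 if absent (str.find returns -1)
    return haystack.find(needle) + 1


def decode_prefix(s: str) -> int:
    # Steps 1-4: split, look up each 2-letter prefix, halve, concatenate
    digits = "".join(str(charindex(w[:2], KEY) // 2) for w in s.split(" "))
    return int(digits)  # step 5: numeric conversion drops leading zeros
```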

BradC

Posted 2019-09-22T21:03:50.647

Reputation: 6 099

2

x86 machine code, 46 bytes

Hexdump:

57 53 33 c0 33 ff f6 01 0f 75 15 6a 0a 5b 99 f7
f3 6b ff 0a 03 fa 33 c0 38 01 75 0f 97 5b 5f c3
69 c0 26 2b aa 6e 32 01 c1 e8 02 41 eb d8

It's a fastcall function - receives a pointer to the string in ecx, and returns the result in eax.

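The hash works byte by byte: it multiplies the running value by the magic number 1856645926 (0x6EAA2B26), XORs in the input byte, and shifts right by 2 bits. At the end of each word the hash is reduced mod 10 to give the digit, and the result is updated as result*10 + digit.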

Saving and restoring noclobber registers (edi and ebx) took 4 bytes, but I didn't find a more efficient way to implement this. Storing the constant 10 in ebx was particularly annoying!

Disassembly with corresponding code bytes:

57                   push        edi  ; edi = result
53                   push        ebx  ; we use ebx to store the constant 10
33 C0                xor         eax,eax  
33 FF                xor         edi,edi  
    myloop:
F6 01 0F             test        byte ptr [ecx],0Fh  ; low nibble 0: space (20h) or NUL, so end of word
75 15                jne         myhash
6A 0A                push        0Ah  
5B                   pop         ebx  
99                   cdq              ; prepare 64-bit dividend in edx:eax
F7 F3                div         eax,ebx  ; find the remainder of division by 10
6B FF 0A             imul        edi,edi,0Ah
03 FA                add         edi,edx  ; update the result
33 C0                xor         eax,eax  ; reset the hash temporary variable
38 01                cmp         byte ptr [ecx],al  ; check for end of input (here al=0)
75 0F                jne         mycontinue
97                   xchg        eax,edi  ; set the return register
5B                   pop         ebx  ; restore registers
5F                   pop         edi  ; restore registers
C3                   ret  
    myhash:
69 C0 26 2B AA 6E    imul        eax,eax,6EAA2B26h  ; hashing...
32 01                xor         al,byte ptr [ecx]  ; hashing...
C1 E8 02             shr         eax,2  ; hashing...
    mycontinue:
41                   inc         ecx  ; next input byte
EB D8                jmp         myloop

Equivalent C code:

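#include <stdint.h>

int doit(const char* s)
{
    int result = 0;
    uint32_t temp = 0;                /* hash of the current word */
    for (;;)
    {
        int c = *s++;
        if ((c & 15) == 0)            /* space (0x20) or NUL: end of word */
        {
            temp %= 10;               /* hash -> digit */
            result = result * 10 + temp;
            temp = 0;
            if (c == 0)               /* NUL: end of input */
                break;
            continue;
        }
        temp *= 1856645926;           /* 0x6EAA2B26, wraps mod 2^32 */
        temp ^= c;
        temp >>= 2;
    }
    return result;
}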

anatolyg

Posted 2019-09-22T21:03:50.647

Reputation: 10 719

How did you find the magic numbers? – Sparkler – 2019-09-25T01:22:01.027

I did a search using my C code - tried all 32-bit numbers and all shifts. There are only a few possibilities - the code found only one in the range up to 2000000000. – anatolyg – 2019-09-25T09:39:57.963

you can use edx instead of edi (push edx before the idiv, pop eax after it, imul with ebx, add eax to edx) to save one byte. – peter ferrie – 2019-09-28T06:42:17.380

1

Clean, 88 bytes

import StdEnv,Text
$s=toInt{#i\\n<-split" "s,c<-:"rothuvsein"&i<-['0'..]|c==(n+n+n).[6]}

Try it online!

Heavily based on Jonathan Allan's answer.
Uses a comprehension for indexing instead of indexOf / elemIndex.

Οurous

Posted 2019-09-22T21:03:50.647

Reputation: 7 916

1

J, 38 bytes

('b\e~mjPxw['i.[:u:70+1#.15|3&u:)&>@;:

Try it online!

Jonah

Posted 2019-09-22T21:03:50.647

Reputation: 8 729

1

sed -re, 78 bytes

s/three/3/g;s/five/5/g;s/\w\w(\w)\w*/\1/g;s/ //g;y/eouxvgnr/12467890/;s/^0*//
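The sed answer has no explanation, so here is a Python 3 sketch that mirrors each command (`decode_sed` is a hypothetical name). three and five are replaced outright because their third letters (r, v) would collide with zero and seven; after that, the third letter of every remaining word is unique (r, e, o, u, x, v, g, n), and the final y command maps those letters to digits. As mirrored here, s/^0*// also strips a lone 0, so an all-zero input such as "zero" comes out as an empty string.

```python
import re


def decode_sed(s: str) -> str:
    s = s.replace("three", "3").replace("five", "5")         # s/three/3/g;s/five/5/g
    s = re.sub(r"\w\w(\w)\w*", r"\1", s)                     # keep each word's 3rd letter
    s = s.replace(" ", "")                                   # s/ //g
    s = s.translate(str.maketrans("eouxvgnr", "12467890"))   # y/eouxvgnr/12467890/
    return re.sub(r"^0*", "", s)                             # s/^0*//
```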

Herzausrufezeichen

Posted 2019-09-22T21:03:50.647

Reputation: 11

1

VBA, 160 bytes

Function e(s)
s = Split(s, " ")
For i = LBound(s) To UBound(s)
s(i) = Int((InStr("ontwthfofisiseeini", Left(s(i), 2)) + 1) / 2)
Next
e = Val(Join(s, ""))
End Function

Looks up the first two characters of each word in a string that omits zero: InStr returns 0 when the prefix is not found, so zero maps to Int(1/2) = 0.

user3819867

Posted 2019-09-22T21:03:50.647

Reputation: 439

(1/3) Great approach here, but it looks like you failed to count the newlines as bytes, and you can get the byte count way down with a few tricks. Namely, there is no need to declare the " " in the Split, as that is the default value; LBound(s) will always be zero; and you can drop (...)+1 if you add a space to the beginning of the InStr data. Aside from that, you are better off using either a string or variant to hold the data coming out of the InStr statement, as then you don't need (i) or Join(s,""). Oh, and Int(a/b) is the same as a\b. – Taylor Scott – 2020-01-03T12:49:59.403

(2/3) Using these and a For Each loop, you can get your function down to 101 bytes as Function G(x):For Each c In Split(x):G=10*G+InStr(" ontwthfofisiseeini",Left(c,2))\2:Next:End Function or 101 bytes as a Sub that prints the output to the debug window as Sub F(x):For Each c In Split(x):v=10*v+InStr(" ontwthfofisiseeini",Left(c,2))\2:Next:Debug.?v:End Sub. If you restrict your answer, you can even get it to 81 bytes as an Excel VBA immediate window function as For Each c In Split([A1]):v=10*v+InStr(" ontwthfofisiseeini",Left(c,2))\2:Next:?v. – Taylor Scott – 2020-01-03T12:50:12.867

1

(3/3) I highly recommend that you go take a look at our Tips for golfing in VBA Page for more info on these approaches. Cheers

– Taylor Scott – 2020-01-03T12:50:27.983

1

Charcoal, 19 bytes

I⍘⭆⪪S §ι⁶rothuvsein

Try it online! Link is to verbose version of code. Port of @KevinCruijssen's 05AB1E answer. Explanation:

    S               Input string
   ⪪                Split on spaces
  ⭆                 Map over words and join
       ι            Current word
      §             Cyclically indexed
        ⁶           Literal `6`
 ⍘       rothuvsein Custom base conversion
I                   Cast to string for implicit print

Neil

Posted 2019-09-22T21:03:50.647

Reputation: 95 035

1

PowerShell, 48 bytes

+-join($args|%{'rothuvsein'.indexof(($_*3)[6])})

Try it online!

Uses the same rothuvsein trick as others, thanks to Jonathan Allan. Expects input arguments via splatting, which on TIO manifests as separate command-line arguments.

AdmBorkBork

Posted 2019-09-22T21:03:50.647

Reputation: 41 581

1

Kotlin, 83 bytes

fun String.d()=split(' ').fold(""){a,b->a+"rothuvsein".indexOf((b+b+b)[6])}.toInt()

+1 byte if you wanna support longs with toLong()

Same rothuvsein trick as the others, saving some precious bytes thanks to Kotlin's nice toInt() and fold(). I just can't shake the feeling that some more bytes can be shaved off though...

Alex Papageorgiou

Posted 2019-09-22T21:03:50.647

Reputation: 41

1

Windows Batch, 169 bytes

@setlocal enabledelayedexpansion
@set z=zeontwthfofisiseeini
:a
@set b=%1
@for /l %%c in (0,2,18)do @if "!b:~0,2!"=="!z:~%%c,2!" set/aa=a*10+%%c/2&shift&goto a
@echo %a%

peter ferrie

Posted 2019-09-22T21:03:50.647

Reputation: 804

0

Perl 6, 45 bytes

+*.words>>.&{+(1...*.uniname.comb(.uc))}.chrs

Try it online!

Jo King

Posted 2019-09-22T21:03:50.647

Reputation: 38 234

0

BaCon, 83 72 bytes

Assuming the string is provided in w$, this code looks up the index in "zeontwthfofisiseeini" using a regular expression based on the unique first 2 characters of each word. Dividing the index by 2 then gives the digit.

FOR x$ IN w$:r=r*10+REGEX("zeontwthfofisiseeini",LEFT$(x$,2))/2:NEXT:?r

Peter

Posted 2019-09-22T21:03:50.647

Reputation: 119