x86-64 machine code, 44 bytes
(The same machine code works in 32-bit mode as well.)
@Daniel Schepler's answer was a starting point for this, but this has at least one new algorithmic idea (not just better golfing of the same idea): The ASCII codes for 'B'
(1000010
) and 'X'
(1011000
) give 16 and 2 after masking with 0b0010010
.
So after excluding decimal (non-zero leading digit) and octal (char after '0'
is less than 'B'
), we can just set base = c & 0b0010010
and jump into the digit loop.
Callable with x86-64 System V as unsigned __int128 parse_cxx14_int(int dummy, const char*rsi);
Extract the EDX return value from the high half of the unsigned __int128
result with tmp>>64
.
.globl parse_cxx14_int
## Input: pointer to 0-terminated string in RSI
## output: integer in EDX
## clobbers: RAX, RCX (base), RSI (points to terminator on return)
parse_cxx14_int:
xor %eax,%eax # initialize high bits of digit reader
cdq # also initialize result accumulator edx to 0
lea 10(%rax), %ecx # base 10 default
lodsb # fetch first character
cmp $'0', %al
jne .Lentry2
# leading zero. Legal 2nd characters are b/B (base 2), x/X (base 16)
# Or NUL terminator = 0 in base 10
# or any digit or ' separator (octal). These have ASCII codes below the alphabetic ranges
lodsb
mov $8, %cl # after '0' have either digit, apostrophe, or terminator,
cmp $'B', %al # or 'b'/'B' or 'x'/'X' (set a new base)
jb .Lentry2 # enter the parse loop with base=8 and an already-loaded character
# else hex or binary. The bit patterns for those letters are very convenient
and $0b0010010, %al # b/B -> 2, x/X -> 16
xchg %eax, %ecx
jmp .Lentry
.Lprocessdigit:
sub $'0' & (~32), %al
jb .Lentry # chars below '0' are treated as a separator, including '
cmp $10, %al
jb .Lnum
add $('0'&~32) - 'A' + 10, %al # digit value = c-'A' + 10. we have al = c - '0'&~32.
# c = al + '0'&~32. val = m+'0'&~32 - 'A' + 10
.Lnum:
imul %ecx, %edx
add %eax, %edx # accum = accum * base + newdigit
.Lentry:
lodsb # fetch next character
.Lentry2:
and $~32, %al # uppercase letters (and as side effect,
# digits are translated to N+16)
jnz .Lprocessdigit # space also counts as a terminator
.Lend:
ret
The changed blocks vs. Daniel's version are (mostly) indented less than other instruction. Also the main loop has its conditional branch at the bottom. This turned out to be a neutral change because neither path could fall into the top of it, and the dec ecx / loop .Lentry
idea for entering the loop turned out not to be a win after handling octal differently. But it has fewer instructions inside the loop with the loop in idiomatic form do{}while structure, so I kept it.
Daniel's C++ test harness works unchanged in 64-bit mode with this code, which uses the same calling convention as his 32-bit answer.
g++ -Og parse-cxx14.cpp parse-cxx14.s &&
./a.out < tests | diff -u -w - tests.good
Disassembly, including the machine code bytes that are the actual answer
0000000000000000 <parse_cxx14_int>:
0: 31 c0 xor %eax,%eax
2: 99 cltd
3: 8d 48 0a lea 0xa(%rax),%ecx
6: ac lods %ds:(%rsi),%al
7: 3c 30 cmp $0x30,%al
9: 75 1c jne 27 <parse_cxx14_int+0x27>
b: ac lods %ds:(%rsi),%al
c: b1 08 mov $0x8,%cl
e: 3c 42 cmp $0x42,%al
10: 72 15 jb 27 <parse_cxx14_int+0x27>
12: 24 12 and $0x12,%al
14: 91 xchg %eax,%ecx
15: eb 0f jmp 26 <parse_cxx14_int+0x26>
17: 2c 10 sub $0x10,%al
19: 72 0b jb 26 <parse_cxx14_int+0x26>
1b: 3c 0a cmp $0xa,%al
1d: 72 02 jb 21 <parse_cxx14_int+0x21>
1f: 04 d9 add $0xd9,%al
21: 0f af d1 imul %ecx,%edx
24: 01 c2 add %eax,%edx
26: ac lods %ds:(%rsi),%al
27: 24 df and $0xdf,%al
29: 75 ec jne 17 <parse_cxx14_int+0x17>
2b: c3 retq
Other changes from Daniel's version include saving the sub $16, %al
from inside the digit-loop, by using more sub
instead of test
as part of detecting separators, and digits vs. alphabetic characters.
Unlike Daniel's every character below '0'
is treated as a separator, not just '\''
. (Except ' '
: and $~32, %al
/ jnz
in both our loops treats space as a terminator, which is possibly convenient for testing with an integer at the start of a line.)
Every operation that modifies %al
inside the loop has a branch consuming flags set by the result, and each branch goes (or falls through) to a different location.
2Sandbox link – my pronoun is monicareinstate – 2019-05-16T13:11:15.710
Will there be combined cases like
0b10xA
– Luis felipe De jesus Munoz – 2019-05-16T13:15:08.5834@LuisfelipeDejesusMunoz No; how did you expect that to be parsed? – my pronoun is monicareinstate – 2019-05-16T13:19:57.913
Since binary literals starts with
0b
and hexadecimal starts with0x
we can assume that0b10xA
can be110
(just a suggestion for another challenge) – Luis felipe De jesus Munoz – 2019-05-16T13:21:55.8031I assume simply writing a function in C++14 would be cheating, right? Since the compiler already does it automatically (even if it is internally 500+ lines of code...) – Darrel Hoffman – 2019-05-16T20:22:25.290
5@DarrelHoffman You couldn't just do it with "a function in C++14" though, since that wouldn't take a string input. Maybe with some script that invokes a C++ compiler. – aschepler – 2019-05-16T21:05:22.843
2The string
0
might be a good test case to add (it revealed a bug in one of my recent revisions). – Daniel Schepler – 2019-05-17T01:25:31.880Note that the rules do allow '0' as a valid octal literal. – FrownyFrog – 2019-05-17T08:45:17.450
@DarrelHoffman: C++ has no
– Peter Cordes – 2019-05-19T01:20:34.587eval
. The language model is designed around strictly ahead-of-time compilation. Of course interpreting or JIT implementations are possible, but there are no language built-ins for getting new source code parsed at run-time. C++ is one of the more complicated languages to parse (full of near ambiguities between operator vs. template or function declaration vs. whatever: the most vexing parse), and compiling template /constexpr
code can require arbitrary amounts of computation.The correct output never will exceed 2*10^9
does this mean the test cases won't exceed 2000000000 or does it mean we need to over/underflow? – Benjamin Urquhart – 2019-06-18T11:56:15.260That means the test cases never exceed
2'000'000'000
. – my pronoun is monicareinstate – 2019-06-18T12:03:46.290