Parse a C++14 integer literal

27

3

According to http://en.cppreference.com/w/cpp/language/integer_literal, integer literals consist of a decimal/hex/octal/binary literal and an optional integer suffix, which is obviously completely unnecessary, wastes precious bytes, and is not used in this challenge.

A decimal literal is a non-zero decimal digit (1, 2, 3, 4, 5, 6, 7, 8, 9), followed by zero or more decimal digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9).

An octal literal is the digit zero (0) followed by zero or more octal digits (0, 1, 2, 3, 4, 5, 6, 7).

A hexadecimal literal is the character sequence 0x or the character sequence 0X followed by one or more hexadecimal digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, A, b, B, c, C, d, D, e, E, f, F) (note the case-insensitivity of abcdefx).

A binary literal is the character sequence 0b or the character sequence 0B followed by one or more binary digits (0, 1).

Additionally, there may optionally be some ' characters as digit separators. They have no meaning and can be ignored.
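
The rules above can be sketched as a small, ungolfed reference parser. This is only an illustration under stated assumptions, not a submission: the name `parse_literal` and the regular expression are made up for this example, and the regex is somewhat more permissive about separator placement than the C++ standard.

```python
import re

# One alternative per literal kind; ' may appear among the digits.
# Illustrative only: these names are not part of the challenge.
LITERAL = re.compile(
    r"0[xX](?P<hex>[0-9a-fA-F']+)"
    r"|0[bB](?P<bin>[01']+)"
    r"|(?P<oct>0[0-7']*)"
    r"|(?P<dec>[1-9][0-9']*)"
)
BASES = {"hex": 16, "bin": 2, "oct": 8, "dec": 10}


def parse_literal(text):
    """Return the value of a suffix-free C++14 integer literal."""
    match = LITERAL.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid literal: {text!r}")
    kind = match.lastgroup                       # which alternative matched
    digits = match.group(kind).replace("'", "")  # separators carry no meaning
    return int(digits, BASES[kind])
```

For example, `parse_literal("0'123'4'5")` takes the octal branch, because the literal starts with 0 and is followed only by octal digits and separators.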

Input

A string that represents a C++14 integer literal, or an array of its character codes.

Output

The number represented by the input string in base 10, with an optional trailing newline. The correct output never will exceed 2*10^9.

Winning criteria

The GCC contributors need over 500 lines of code to do this, therefore our code must be as short as possible!

Test cases:

0                       ->    0
1                       ->    1
12345                   ->    12345
12345'67890             ->    1234567890
0xFF                    ->    255
0XfF                    ->    255
0xAbCdEf                ->    11259375
0xa'bCd'eF              ->    11259375
0b1111'0000             ->    240
0b0                     ->    0
0B1'0                   ->    2
0b1                     ->    1
00                      ->    0
01                      ->    1
012345                  ->    5349
0'123'4'5               ->    5349

my pronoun is monicareinstate

Posted 2019-05-16T13:10:02.433

Reputation: 3 111

2Sandbox link – my pronoun is monicareinstate – 2019-05-16T13:11:15.710

Will there be combined cases like 0b10xA – Luis felipe De jesus Munoz – 2019-05-16T13:15:08.583

4@LuisfelipeDejesusMunoz No; how did you expect that to be parsed? – my pronoun is monicareinstate – 2019-05-16T13:19:57.913

Since binary literals start with 0b and hexadecimal literals start with 0x, we can assume that 0b10xA can be 110 (just a suggestion for another challenge) – Luis felipe De jesus Munoz – 2019-05-16T13:21:55.803

1I assume simply writing a function in C++14 would be cheating, right? Since the compiler already does it automatically (even if it is internally 500+ lines of code...) – Darrel Hoffman – 2019-05-16T20:22:25.290

5@DarrelHoffman You couldn't just do it with "a function in C++14" though, since that wouldn't take a string input. Maybe with some script that invokes a C++ compiler. – aschepler – 2019-05-16T21:05:22.843

2The string 0 might be a good test case to add (it revealed a bug in one of my recent revisions). – Daniel Schepler – 2019-05-17T01:25:31.880

Note that the rules do allow '0' as a valid octal literal. – FrownyFrog – 2019-05-17T08:45:17.450

@DarrelHoffman: C++ has no eval. The language model is designed around strictly ahead-of-time compilation. Of course interpreting or JIT implementations are possible, but there are no language built-ins for getting new source code parsed at run-time. C++ is one of the more complicated languages to parse (full of near ambiguities between operator vs. template or function declaration vs. whatever: the most vexing parse), and compiling template / constexpr code can require arbitrary amounts of computation.

– Peter Cordes – 2019-05-19T01:20:34.587

"The correct output never will exceed 2*10^9": does this mean the test cases won't exceed 2000000000, or does it mean we need to over/underflow? – Benjamin Urquhart – 2019-06-18T11:56:15.260

That means the test cases never exceed 2'000'000'000. – my pronoun is monicareinstate – 2019-06-18T12:03:46.290

Answers

6

Japt, 6 bytes

OxUr"'

OxUr"'  Full Program. Implicit Input U
  Ur"'  Remove ' from U
Ox      Eval as javascript

Try it online!

Luis felipe De jesus Munoz

Posted 2019-05-16T13:10:02.433

Reputation: 9 639

How does this work? – lirtosiast – 2019-06-04T09:35:12.160

@lirtosiast Basically the same as my js answer. I Remove ' from the input and then evaluate it as Js – Luis felipe De jesus Munoz – 2019-06-04T12:12:15.110

22

x86 (32-bit) machine code, 59 57 bytes

This function takes esi as a pointer to a null-terminated string and returns the value in edx. (Listing below is GAS input in AT&T syntax.)

        .globl parse_cxx14_int
        .text
parse_cxx14_int:
        push $10
        pop %ecx                # store 10 as base
        xor %eax,%eax           # initialize high bits of digit reader
        cdq                     # also initialize result accumulator edx to 0
        lodsb                   # fetch first character
        cmp $'0', %al
        jne .Lparseloop2
        lodsb
        and $~32, %al           # uppercase letters (and as side effect,
                                # digits are translated to N+16)
        jz .Lend                # "0" string
        cmp $'B', %al           # after '0' have either digit, apostrophe,
                                # 'b'/'B' or 'x'/'X'
        je .Lbin
        jg .Lhex
        dec %ecx
        dec %ecx                # update base to 8
        jmp .Lprocessdigit      # process octal digit that we just read (or
                                # skip ' if that is what we just read)   
.Lbin:
        sub $14, %ecx           # with below will update base to 2
.Lhex:
        add $6, %ecx            # update base to 16
.Lparseloop:
        lodsb                   # fetch next character
.Lparseloop2:
        and $~32, %al           # uppercase letters (and as side effect,
                                # digits are translated to N+16)
        jz .Lend
.Lprocessdigit:
        cmp $7, %al             # skip ' (ASCII 39 which would have been
                                # translated to 7 above)
        je .Lparseloop
        test $64, %al           # distinguish letters and numbers
        jz .Lnum
        sub $39, %al            # with below will subtract 55 so e.g. 'A'==65
                                # will become 10
.Lnum:
        sub $16, %al            # translate digits to numerical value
        imul %ecx, %edx
#        movzbl %al, %eax
        add %eax, %edx          # accum = accum * base + newdigit
        jmp .Lparseloop
.Lend:
        ret
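
For readers who do not read assembly, the control flow of the listing above can be modelled in Python. This is a sketch that follows the instructions on plain byte values; the function name `parse_like_asm` is hypothetical, and the model assumes valid ASCII input, as the challenge guarantees.

```python
def parse_like_asm(text):
    """Python model of the machine-code answer; assumes valid input."""
    chars = iter(text.encode("ascii") + b"\0")  # null-terminated input
    base, acc = 10, 0            # ecx and edx
    al = next(chars)             # lodsb: fetch the first character
    if al == ord("0"):
        al = next(chars) & ~32   # upper-case letters; digits become N+16
        if al == 0:
            return acc           # the input was just "0"
        if al == ord("B"):
            base, al = 2, next(chars)
        elif al > ord("B"):      # for valid input this can only be 'X'
            base, al = 16, next(chars)
        else:
            base = 8             # al already holds an octal digit or '
    while True:
        al &= ~32                # idempotent, so masking twice is harmless
        if al == 0:              # NUL terminator (a space would also stop here)
            return acc
        if al != 7:              # 7 is what ' (ASCII 39) turns into
            if al & 64:          # letters have bit 6 set, digits do not
                al -= 39         # with the next line: 'A' (65) -> 10
            al -= 16             # digits were translated to N+16
            acc = acc * base + al
        al = next(chars)
```

The model makes the single `and $~32` mask visible: it upper-cases letters, maps ' to 7, and maps both NUL and space to 0 in one instruction.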

And a disassembly with byte counts - in Intel format this time, in case you prefer that one.

Disassembly of section .text:

00000000 <parse_cxx14_int>:
   0:   6a 0a                   push   0xa
   2:   59                      pop    ecx
   3:   31 c0                   xor    eax,eax
   5:   99                      cdq    
   6:   ac                      lods   al,BYTE PTR ds:[esi]
   7:   3c 30                   cmp    al,0x30
   9:   75 16                   jne    21 <parse_cxx14_int+0x21>
   b:   ac                      lods   al,BYTE PTR ds:[esi]
   c:   24 df                   and    al,0xdf
   e:   74 28                   je     38 <parse_cxx14_int+0x38>
  10:   3c 42                   cmp    al,0x42
  12:   74 06                   je     1a <parse_cxx14_int+0x1a>
  14:   7f 07                   jg     1d <parse_cxx14_int+0x1d>
  16:   49                      dec    ecx
  17:   49                      dec    ecx
  18:   eb 0b                   jmp    25 <parse_cxx14_int+0x25>
  1a:   83 e9 0e                sub    ecx,0xe
  1d:   83 c1 06                add    ecx,0x6
  20:   ac                      lods   al,BYTE PTR ds:[esi]
  21:   24 df                   and    al,0xdf
  23:   74 13                   je     38 <parse_cxx14_int+0x38>
  25:   3c 07                   cmp    al,0x7
  27:   74 f7                   je     20 <parse_cxx14_int+0x20>
  29:   a8 40                   test   al,0x40
  2b:   74 02                   je     2f <parse_cxx14_int+0x2f>
  2d:   2c 27                   sub    al,0x27
  2f:   2c 10                   sub    al,0x10
  31:   0f af d1                imul   edx,ecx
  34:   01 c2                   add    edx,eax
  36:   eb e8                   jmp    20 <parse_cxx14_int+0x20>
  38:   c3                      ret    

And in case you want to try it, here is the C++ test driver code that I linked with it (including the calling convention specification in GCC asm syntax):

#include <cstdio>
#include <string>
#include <iostream>

inline int parse_cxx14_int_wrap(const char *s) {
    int result;
    const char* end;
    __asm__("call parse_cxx14_int" :
            "=d"(result), "=S"(end) :
            "1"(s) :
            "eax", "ecx", "cc");
    return result;
}

int main(int argc, char* argv[]) {
    std::string s;
    while (std::getline(std::cin, s))
        std::printf("%-16s -> %d\n", s.c_str(), parse_cxx14_int_wrap(s.c_str()));
    return 0;
}

-1 byte due to comment by Peter Cordes

-1 byte from updating to use two decrements to change 10 to 8

Daniel Schepler

Posted 2019-05-16T13:10:02.433

Reputation: 1 001

1Only you're missing tests for overflows... Too large a number gets reported by compilers. – Alexis Wilke – 2019-05-17T04:38:10.043

2Can you swap your register usage for rdx and rbx? Then you can use 1-byte cdq to zero rdx from eax. – Peter Cordes – 2019-05-17T10:48:57.780

1This should either list the byte count of your assembly or be labelled as 59 bytes of x86 machine code. – Potato44 – 2019-05-17T12:20:37.597

2@PeterCordes Thanks, didn't know about that one. (Also, on looking at it again, I noticed that changing the base from 10 to 8 could be 2 bytes - from two decrement instructions - instead of 3 bytes.) – Daniel Schepler – 2019-05-17T15:56:22.743

1

@DanielSchepler: See Tips for golfing in x86/x64 machine code. Another nice way to generate constants is lea 10(%eax), %edx 3 bytes given a known constant, same as push/pop but more efficient.

– Peter Cordes – 2019-05-17T16:03:44.843

@Potato44 Fair enough, I changed the title to "machine code". – Daniel Schepler – 2019-05-17T16:11:27.843

3@AlexisWilke It also doesn't test for invalid format (e.g. digits out of range of the given base) which compilers would also do. But according to the problem statement, the input is guaranteed to be valid and not to overflow a 32-bit signed integer. – Daniel Schepler – 2019-05-17T16:49:09.373

You could make this 64-bit code again without losing bytes by using 2-byte mov $8, %cl instead of 2x dec %ecx. That code is only reached with ECX=10 so we know the upper bytes are already zero. (Fun fact: that's better for performance on Haswell and newer, and non-Intel CPUs. Only P6-family has partial-register stalls when reading the full reg later.) – Peter Cordes – 2019-05-18T22:04:40.477

NVM we do still want 32-bit mode. dec %ecx / loop .Lprocessdigit is even better, folding dec+jmp into a 2-byte dec-and-branch that's going to be taken because --ecx != 0. And yes, loop can jump forward; it uses a regular signed rel8.

– Peter Cordes – 2019-05-18T22:24:56.763

For decoding B vs. X vs. octal, B is 01000010 (low bits = 2) while X is 01011000 (one of the set bits = 1<<4 = 16). After excluding al<'B' with a jl .Lparseloop2 (which will check for a terminator), we can do and $0b0010010, %al (2 bytes) / xchg %eax, %ecx (1 byte) and fall into the loop. This leaves the upper bytes of both EAX and ECX still zero, as required, because we have that guarantee before xchg. – Peter Cordes – 2019-05-18T22:33:37.380

Yup, that helped. 46 bytes. https://godbolt.org/z/6epa8_. (and tested with the question's test cases on my desktop, with your handy C++ caller). I also restructured the main loop with the loop branch at the bottom because I thought I was going to be able to fall into it or use dec / loop, but that didn't happen. So it ended up being neutral for code size, saving a jmp at the bottom but adding another jmp to enter it.

– Peter Cordes – 2019-05-18T23:18:58.327

44 bytes, removing an add from inside the digit loop by redoing the checks with a more normal sub/cmp range check. https://godbolt.org/z/x7SFvq. BTW, I was expecting it would take more instructions to turn 'B' and 'X' into integers, like maybe an add and a right-shift as well. I was really surprised that just one AND does the trick.

– Peter Cordes – 2019-05-19T00:45:00.553

I posted my version as a separate answer; I think the AND trick is different enough to justify it. Thanks for the starting point and test harness, would upvote again if I could. :)

– Peter Cordes – 2019-05-19T01:13:44.270

12

JavaScript (Babel Node), 26 bytes

lol x2

_=>eval(_.split`'`.join``)

Try it online!

Luis felipe De jesus Munoz

Posted 2019-05-16T13:10:02.433

Reputation: 9 639

4This isn't BabelJS exclusive, it works from ES6 onwards – Bassdrop Cumberwubwubwub – 2019-05-16T13:29:57.360

1@BassdropCumberwubwubwub, the header was probably copied from TIO. – Shaggy – 2019-05-16T17:41:08.327

Nice, I first tried to use Number because it handles binary and hex, but apparently not octal Number("010") === 10 – Carl Walsh – 2019-05-16T21:48:06.213

7

C++ (gcc), 141 138 134 120 bytes

This is a function that takes an array of characters (specified as a pair of pointers to the start and end - using the pair of iterators idiom) and returns the number. Note that the function mutates the input array.

(This does rely on the behavior of gcc/libstdc++ that #include<cstdlib> also places the functions in global scope. For strictly standard compliant code, replace with #include<stdlib.h> for a cost of one more character.)

Brief description: The code first uses std::remove to filter out ' characters (ASCII 39). Then, strtol with a base of 0 will already handle the decimal, octal, and hexadecimal cases, so the only other case to check for is a leading 0b or 0B and if so, set the base for strtol to 2 and start parsing after the leading 2 characters.
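
The strategy in the brief description can be sketched in Python for readers less familiar with `strtol`. Both function names are hypothetical, and `strtol_auto` is only a rough stand-in for `strtol` with a base of 0, not a faithful reimplementation.

```python
def strtol_auto(digits):
    """Rough stand-in for strtol(..., 0): 0x/0X -> 16, leading 0 -> 8, else 10."""
    if digits[:2].lower() == "0x":
        return int(digits[2:], 16)
    if digits.startswith("0"):
        return int(digits, 8)
    return int(digits, 10)


def parse_via_strtol(text):
    digits = text.replace("'", "")   # plays the role of std::remove(s, e, 39)
    # 'b' (98) and 'B' (66) both give 2 after "& 31"; no other valid second
    # character does, so the leading '0' need not be checked for valid input.
    if len(digits) > 1 and ord(digits[1]) & 31 == 2:
        return int(digits[2:], 2)    # corresponds to strtol(s + 2, 0, 2)
    return strtol_auto(digits)       # corresponds to strtol(s, 0, 0)
```

The `& 31` test mirrors the `s[1]&31^2` expression in the golfed code: it is zero after the XOR only when the second character is b or B.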

#import<algorithm>
#import<cstdlib>
int f(char*s,char*e){e=s[*std::remove(s,e,39)=1]&31^2?s:s+2;return strtol(e,0,e-s);}

Try it online.


Saved 3 bytes due to suggestion by ceilingcat and some more golfing that followed.

Saved 4 bytes due to suggestions by gastropner.

-2 bytes by Lucas

-12 bytes by l4m2

Daniel Schepler

Posted 2019-05-16T13:10:02.433

Reputation: 1 001

134 bytes – gastropner – 2019-05-17T19:54:34.483

Incorporated, thanks. – Daniel Schepler – 2019-05-17T20:52:26.620

132 bytes by using the deprecated #import instead of #include?

– Lucas – 2019-05-17T23:41:39.827

If invalid input is undefined behavior, no need to check if 1st char is 0 for base 2 – l4m2 – 2019-05-18T12:19:18.370

so 124

– l4m2 – 2019-05-18T12:23:26.217

122 – l4m2 – 2019-05-18T12:24:46.663

120 – l4m2 – 2019-05-18T12:28:26.983

Nice! Thanks, I'll incorporate that. – Daniel Schepler – 2019-05-18T18:42:14.820

I do -12B, other 2 come from Lucas – l4m2 – 2019-05-19T00:27:50.323

99 – peterzuger – 2019-05-22T12:07:37.260

5

Python 2, 32 bytes

lambda a:eval(a.replace("'",""))

Try it online!

lol

(needs Python 2 because Python 3 changed octal literals to 0o(...)).
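
For comparison, a non-golfed Python 3 sketch has to treat C-style octal explicitly, because `int("017", 0)` raises `ValueError` there. The function name `parse_py3` is made up for this illustration, and the sketch avoids `eval`.

```python
def parse_py3(text):
    digits = text.replace("'", "")
    # Python 3 rejects C-style octal such as "017", so handle it explicitly.
    if len(digits) > 1 and digits[0] == "0" and digits[1].isdigit():
        return int(digits, 8)
    # Base 0 auto-detects the 0x/0X and 0b/0B prefixes.
    return int(digits, 0)
```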

HyperNeutrino

Posted 2019-05-16T13:10:02.433

Reputation: 26 575

3we've truly gone full circle at this point – osuka_ – 2019-05-17T23:54:12.140

4

R, 79 71 69 bytes

`+`=strtoi;s=gsub("'","",scan(,""));na.omit(c(+s,sub("..",0,s)+2))[1]

Try it online!

strtoi does everything except for the base 2 conversions and ignoring the ', so there's quite a lot of bytes just to fix those things.

Thanks to Aaron Hayman for -6 bytes, and inspiring -4 more bytes (and counting!)

Verify all test cases (old version)

Giuseppe

Posted 2019-05-16T13:10:02.433

Reputation: 21 077

can save a byte replacing sub("0b|B" with sub("b|B", since the leading "0" will not affect the value. Can get another by renaming strtoi – Aaron Hayman – 2019-05-16T14:29:35.800

1

74 bytes: Try it online!

– Aaron Hayman – 2019-05-16T14:52:32.730

1@AaronHayman wow, I've never seen na.omit before. Super handy here, and I golfed a bit more off :-) – Giuseppe – 2019-05-16T15:00:00.077

1

If we assume every fail of the first strtoi is a binary, you can use substring instead of sub to save another byte: Try it online!

– Aaron Hayman – 2019-05-16T15:18:11.963

1@AaronHayman we can strip off the first 2 characters of s using sub instead with sub('..','',s) which is another byte shorter! – Giuseppe – 2019-05-16T15:40:15.760

4

Perl 5 (-p), 14 bytes

y/'/_/;$_=eval

TIO

Nahuel Fouilleul

Posted 2019-05-16T13:10:02.433

Reputation: 5 582

4

05AB1E, 16 14 bytes

Saved 2 bytes thanks to Grimy

''KlÐïK>i8ö}.E

Try it online! or as a Test Suite

Explanation

''K                # remove "'" from input
   l               # and convert to lower-case
    Ð              # triplicate
     ï             # convert one copy to integer
      K            # and remove it from the second copy
       >i  }       # if the result is 0
         8ö        # convert from base-8 to base-10
            .E     # eval

Emigna

Posted 2019-05-16T13:10:02.433

Reputation: 50 798

-2 bytes – Grimmy – 2019-05-17T13:00:47.300

And here's a fake 13 (passes all the test cases, but fails on e.g. 0010).

– Grimmy – 2019-05-17T13:08:41.493

@Grimy: Thanks! Cool use of ï! – Emigna – 2019-05-17T13:26:29.223

4

Excel, 115 bytes

=DECIMAL(SUBSTITUTE(REPLACE(A1,2,1,IFERROR(VALUE(MID(A1,2,1)),)),"'",),VLOOKUP(A1,{"0",8;"0B",2;"0X",16;"1",10},2))

Input from A1, output to wherever you put this formula. Array formula, so use Ctrl+Shift+Enter to enter it.

I added a couple test cases you can see in the image - some early attempts handled all given test cases correctly but got rows 16 and/or 17 wrong.

[Image: spreadsheet screenshot showing the test cases]

Sophia Lechner

Posted 2019-05-16T13:10:02.433

Reputation: 1 200

Is it against the rules to omit the final two closing parentheses and take advantage of the fact that the “compiler” (pressing return or tab) will error-correct for you? – Lucas – 2019-06-15T03:13:04.603

In my personal opinion, yes. I don't think there's a site consensus. Excel adding the parentheses feels like the equivalent of a code-completion feature in another language's IDE, which should be ignored for byte counting. (But, I think "?" should be counted as 1 byte in BASIC even though it will be silently expanded to "PRINT" so maybe I'm not entirely consistent here). – Sophia Lechner – 2019-06-17T17:48:19.567

3

x86-64 machine code, 44 bytes

(The same machine code works in 32-bit mode as well.)

@Daniel Schepler's answer was a starting point for this, but this has at least one new algorithmic idea (not just better golfing of the same idea): The ASCII codes for 'B' (1000010) and 'X' (1011000) give 2 and 16, respectively, after masking with 0b0010010.

So after excluding decimal (non-zero leading digit) and octal (char after '0' is less than 'B'), we can just set base = c & 0b0010010 and jump into the digit loop.
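
The masking step can be checked quickly in Python. This is just a sanity check of the bit patterns; the names `MASK` and `base_from_prefix_char` are invented for the example.

```python
MASK = 0b0010010


def base_from_prefix_char(ch):
    """Map the character after a leading '0' to a base (b/B/x/X only)."""
    return ord(ch) & MASK


for ch in "bBxX":
    # Prints the 7-bit ASCII pattern next to the resulting base:
    # b 1100010 2, B 1000010 2, x 1111000 16, X 1011000 16
    print(ch, format(ord(ch), "07b"), base_from_prefix_char(ch))
```

Lower-case letters differ from upper-case ones only in bit 5, which the mask discards, so no separate case folding is needed for the prefix character.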

Callable with x86-64 System V as unsigned __int128 parse_cxx14_int(int dummy, const char*rsi); Extract the EDX return value from the high half of the unsigned __int128 result with tmp>>64.

        .globl parse_cxx14_int
## Input: pointer to 0-terminated string in RSI
## output: integer in EDX
## clobbers: RAX, RCX (base), RSI (points to terminator on return)
parse_cxx14_int:
        xor %eax,%eax           # initialize high bits of digit reader
        cdq                     # also initialize result accumulator edx to 0
        lea 10(%rax), %ecx      # base 10 default
        lodsb                   # fetch first character
        cmp $'0', %al
        jne .Lentry2
    # leading zero.  Legal 2nd characters are b/B (base 2), x/X (base 16)
    # Or NUL terminator = 0 in base 10
    # or any digit or ' separator (octal).  These have ASCII codes below the alphabetic ranges
    lodsb

    mov    $8, %cl              # after '0' have either digit, apostrophe, or terminator,
    cmp    $'B', %al            # or 'b'/'B' or 'x'/'X'  (set a new base)
    jb   .Lentry2               # enter the parse loop with base=8 and an already-loaded character
         # else hex or binary. The bit patterns for those letters are very convenient
    and    $0b0010010, %al      # b/B -> 2,   x/X -> 16
    xchg   %eax, %ecx
    jmp  .Lentry

.Lprocessdigit:
    sub  $'0' & (~32), %al
    jb   .Lentry                 # chars below '0' are treated as a separator, including '
    cmp  $10, %al
    jb  .Lnum
    add  $('0'&~32) - 'A' + 10, %al   # digit value = c-'A' + 10.  we have al = c - '0'&~32.
                                        # c = al + '0'&~32.  val = m+'0'&~32 - 'A' + 10
.Lnum:
        imul %ecx, %edx
        add %eax, %edx          # accum = accum * base + newdigit
.Lentry:
        lodsb                   # fetch next character
.Lentry2:
        and $~32, %al           # uppercase letters (and as side effect,
                                # digits are translated to N+16)
        jnz .Lprocessdigit      # space also counts as a terminator
.Lend:
        ret

The changed blocks vs. Daniel's version are (mostly) indented less than the other instructions. Also, the main loop has its conditional branch at the bottom. This turned out to be a neutral change because neither path could fall into the top of it, and the dec ecx / loop .Lentry idea for entering the loop turned out not to be a win after handling octal differently. But it has fewer instructions inside the loop, with the loop in idiomatic do{}while form, so I kept it.

Daniel's C++ test harness works unchanged in 64-bit mode with this code, which uses the same calling convention as his 32-bit answer.

g++ -Og parse-cxx14.cpp parse-cxx14.s &&
./a.out < tests | diff -u -w - tests.good

Disassembly, including the machine code bytes that are the actual answer

0000000000000000 <parse_cxx14_int>:
   0:   31 c0                   xor    %eax,%eax
   2:   99                      cltd   
   3:   8d 48 0a                lea    0xa(%rax),%ecx
   6:   ac                      lods   %ds:(%rsi),%al
   7:   3c 30                   cmp    $0x30,%al
   9:   75 1c                   jne    27 <parse_cxx14_int+0x27>
   b:   ac                      lods   %ds:(%rsi),%al
   c:   b1 08                   mov    $0x8,%cl
   e:   3c 42                   cmp    $0x42,%al
  10:   72 15                   jb     27 <parse_cxx14_int+0x27>
  12:   24 12                   and    $0x12,%al
  14:   91                      xchg   %eax,%ecx
  15:   eb 0f                   jmp    26 <parse_cxx14_int+0x26>
  17:   2c 10                   sub    $0x10,%al
  19:   72 0b                   jb     26 <parse_cxx14_int+0x26>
  1b:   3c 0a                   cmp    $0xa,%al
  1d:   72 02                   jb     21 <parse_cxx14_int+0x21>
  1f:   04 d9                   add    $0xd9,%al
  21:   0f af d1                imul   %ecx,%edx
  24:   01 c2                   add    %eax,%edx
  26:   ac                      lods   %ds:(%rsi),%al
  27:   24 df                   and    $0xdf,%al
  29:   75 ec                   jne    17 <parse_cxx14_int+0x17>
  2b:   c3                      retq   

Other changes from Daniel's version include saving the sub $16, %al from inside the digit-loop, by using more sub instead of test as part of detecting separators, and digits vs. alphabetic characters.

Unlike Daniel's version, every character below '0' is treated as a separator, not just '\''. (Except ' ': and $~32, %al / jnz in both our loops treats space as a terminator, which is possibly convenient for testing with an integer at the start of a line.)
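
The per-character classification in the digit loop can be modelled in Python as follows. This is a sketch with a hypothetical helper name, `classify`; it covers only bytes that reach `.Lprocessdigit`, since NUL and space are caught earlier by the `jnz`.

```python
def classify(byte):
    """Return the digit value of an input byte, or None for a separator."""
    al = (byte & ~32) - 0x10   # and $~32, then sub $0x10
    if al < 0:                 # borrow: anything below '0', including '
        return None
    if al >= 10:               # letters: 'A' has become 49 at this point
        al -= 39               # add $0xd9 is -39 modulo 256
    return al
```

For example, `classify(ord("'"))` returns `None`, while `classify(ord("f"))` and `classify(ord("F"))` both return 15.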

Every operation that modifies %al inside the loop has a branch consuming flags set by the result, and each branch goes (or falls through) to a different location.

Peter Cordes

Posted 2019-05-16T13:10:02.433

Reputation: 2 810

Do you even need the initialization of eax given that AIUI in 64-bit mode opcodes with small destination will reset the higher bits to 0? – Daniel Schepler – 2019-05-20T23:40:10.163

@Daniel: writing a 32-bit register zero-extends to 64-bit. Writing an 8 or 16-bit register keeps the behaviour from other modes: merge into the existing value. AMD64 didn't fix the false dependency for 8 and 16-bit registers, and didn't change setcc r/m8 into setcc r/m32, so we still need a stupid 2-instruction xor-zero / set flags / setcc %al sequence to create a 32/64-bit 0 or 1 variable, and it needs the zeroed register before the flag-setting. (Or use mov $0, %eax instead, or use movzx on the critical path).

– Peter Cordes – 2019-05-21T23:45:43.157

1

Octave, 29 21 20 bytes

@(x)str2num(x(x>39))

Try it online!

-8 bytes thanks to @TomCarpenter

Expired Data

Posted 2019-05-16T13:10:02.433

Reputation: 3 129

For 22 bytes: @(x)str2num(x(x~="'")) – Tom Carpenter – 2019-05-17T08:48:50.110

Which becomes for 21 bytes: @(x)str2num(x(x~=39)) – Tom Carpenter – 2019-05-17T08:50:03.187

Octal doesn't appear to be working (at least on TIO)... for example, f=("077") returns ans = 77 when it should be 63. Or, as in the test case in OP f=("012345") should return 5349 but instead ans = 12345 – brhfl – 2019-06-03T15:41:12.450

1

Bash, 33 bytes

x=${1//\'};echo $[${x/#0[Bb]/2#}]

TIO

Zsh, 29 27 bytes

-2 bytes thanks to @GammaFunction

<<<$[${${1//\'}/#0[Bb]/2#}]

TIO

Nahuel Fouilleul

Posted 2019-05-16T13:10:02.433

Reputation: 5 582

Clever! I would have thought setopt octalzeroes would be necessary for Zsh. – GammaFunction – 2019-06-15T05:46:39.730

You can save 2 bytes in Zsh with <<<$[...] instead of echo $[...] – GammaFunction – 2019-06-15T05:47:14.720

Thanks, I didn't know that an empty zsh command with a redirection could display output. I don't know much about zsh; I know bash a lot better. – Nahuel Fouilleul – 2019-06-15T17:23:33.437

I knew that bash automatically interprets numbers with a leading zero as octal, so leading zeros must be removed, for example, in dates and times. – Nahuel Fouilleul – 2019-06-15T17:29:40.530

1

J, 48 bytes

cut@'0x 16b +0b 2b +0 8b0 '''do@rplc~'+',tolower

Try it online!

Eval after string substitution.

0XfF -> +16bff -> 255
0xa'bCd'eF -> +16babcdef -> 11259375
0B1'0 -> +2b10 -> 2
0 -> 8b0 -> 0
01 -> 8b01 -> 1
0'123'4'5 -> 8b012345 -> 5349

FrownyFrog

Posted 2019-05-16T13:10:02.433

Reputation: 3 112

1@GalenIvanov nice find, fixed – FrownyFrog – 2019-05-16T19:35:32.293

1

Retina, 96 bytes

T`'L`_l
\B
:
^
a;
a;0:x:
g;
a;0:b:
2;
a;0:
8;
[a-g]
1$&
T`l`d
+`;(\d+):(\d+)
;$.($`*$1*_$2*
.+;

Try it online! Link includes test suite. Explanation:

T`'L`_l

Delete 's and convert everything to lower case.

\B
:

Separate the digits, as any hex digits need to be converted into decimal.

^
a;
a;0:x:
g;
a;0:b:
2;
a;0:
8;

Identify the base of the number.

[a-g]
1$&
T`l`d

Convert the characters a-g into numbers 10-16.

+`;(\d+):(\d+)
;$.($`*$1*_$2*

Perform base conversion on the list of digits. $.($`*$1*_*$2* is short for $.($`*$1*_*$2*_) which multiplies $` and $1 together and adds $2. ($` is the part of the string before the ; i.e. the base.)

.+;

Delete the base.
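
The same pipeline (strip separators, pick the base, turn each digit into a number, then repeatedly multiply and add) can be sketched in Python. The function name `parse_like_retina` is hypothetical, and the sketch uses ordinary integers instead of Retina's string arithmetic, so the g marker for base 16 is not needed.

```python
from functools import reduce


def parse_like_retina(text):
    s = text.replace("'", "").lower()    # delete 's, convert to lower case
    if s.startswith("0x"):               # identify the base of the number
        base, s = 16, s[2:]
    elif s.startswith("0b"):
        base, s = 2, s[2:]
    elif s.startswith("0"):
        base, s = 8, s[1:]
    else:
        base = 10
    digits = [int(ch, 16) for ch in s]   # a-f become 10-15
    # Fold the digit list: multiply the running value by the base, add the digit.
    return reduce(lambda acc, d: acc * base + d, digits, 0)
```

The fold is the Python counterpart of the looping replacement stage, which multiplies the running value by the base and adds the next digit until only one number is left.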

Neil

Posted 2019-05-16T13:10:02.433

Reputation: 95 035

I appreciate the literate programming approach you took to explain the code :-) – grooveplex – 2019-05-17T23:12:06.467

1

Perl 6, 29 bytes

{+lc S/^0)>\d/0o/}o{S:g/\'//}

Try it online!

Perl 6 requires an explicit 0o prefix for octal and doesn't support uppercase prefixes like 0X.

Explanation

                   {S:g/\'//}  # remove apostrophes
{                }o  # combine with function
     S/^0)>\d/0o/    # 0o prefix for octal
  lc  # lowercase
 +    # convert to number

nwellnhof

Posted 2019-05-16T13:10:02.433

Reputation: 10 037

0

Go, 75 bytes

import "strconv"
func(i string)int64{n,_:=strconv.ParseInt(i,0,0);return n}

vityavv

Posted 2019-05-16T13:10:02.433

Reputation: 734

This doesn't appear to work for binary literals, nor for single-quote digit delimiters. – Nick Matteo – 2019-05-17T18:21:53.607

Oh crap. I'll fix it soon. Completely forgot about the delimiters – vityavv – 2019-05-21T16:17:09.360

0

JavaScript (ES6), 112 bytes

n=>+(n=n.toLowerCase().replace(/'/g,""))?n[1]=="b"?parseInt(n.substr(2),2):parseInt(n,+n[0]?10:n[1]=="x"?16:8):0

Naruyoko

Posted 2019-05-16T13:10:02.433

Reputation: 459

0

Jelly, 27 bytes

ØDiⱮḢ=1aƲȦ
ṣ”';/Ḋ⁾0o;Ɗ¹Ç?ŒV

Try it online!

Almost all of this is handling octal. Feels like it could be better golfed.

Nick Kennedy

Posted 2019-05-16T13:10:02.433

Reputation: 11 829

0

Java, 158 154 bytes

This is just waiting to be outgolfed. It just tries regexes until something works and defaults to hex.
-4 bytes thanks to @ValueInk

n->{n=n.replace("'","");var s=n.split("[bBxX]");return Long.parseLong(s[s.length-1],n.matches("0[bB].+")?2:n.matches("0\\d+")?8:n.matches("\\d+")?10:16);}

Try it online

Using ScriptEngine, 92 87 bytes

Eval train coming through. Technically this is passing the torch to JS, so it's not my main submission.

n->new javax.script.ScriptEngineManager().getEngineByName("js").eval(n.replace("'",""))

TIO

Benjamin Urquhart

Posted 2019-05-16T13:10:02.433

Reputation: 1 262

Use [bBxX] and 0[bB].+ for some quick regex optimizations. – Value Ink – 2019-05-17T00:09:42.533

@ValueInk thanks – Benjamin Urquhart – 2019-05-17T00:15:18.033

That's not an Integer, it's a Long; the title clearly says Integer. A single- or double-precision IEEE 754 value could become incorrect because of how the number is stored (see https://en.wikipedia.org/wiki/IEEE_754#Roundings_to_nearest). It also supports numbers higher than 2 billion (0x9999999999).

– Martin Barker – 2019-06-18T01:30:26.287

@MartinBarker it's allowed to use Long instead of Integer for golfing purposes. Also, if you are correct, Python can't compete because it has effectively arbitrary-precision integers. Also, a long in Java is an integer represented with 64 bits instead of 32. There are no decimal places. – Benjamin Urquhart – 2019-06-18T02:10:07.073

The Long thing was just that you're using a long, not an integer, and you're wrong about the golfing purposes. The question quite clearly states "The correct output never will exceed 2*10^9", meaning that long can't be used on its own, because I can give it 0x9999999999 and it will produce a number higher than 2*10^9, whereas in C++ it would create a memory overflow issue because you're using more than 32 bits of memory when you have allocated only 32 bits of memory to this number – Martin Barker – 2019-06-18T09:55:20.370

@MartinBarker I 100% disagree. Just because my submission can exceed 2*10^9 doesn't mean it's invalid. If it was, all the "normal" int answers would be invalid as well since they go to (2^31)-1, which is larger. – Benjamin Urquhart – 2019-06-18T11:58:40.660

0

Ruby with -n, 17 bytes

Just jumping on the eval train, really.

p eval gsub(?'){}

Try it online!

Value Ink

Posted 2019-05-16T13:10:02.433

Reputation: 10 608

0

Java (JDK), 101 bytes

n->{n=n.replace("'","");return n.matches("0[bB].+")?Long.parseLong(n.substring(2),2):Long.decode(n);}

Try it online!

Long.decode deals with all kinds of literals except the binary ones.

Template borrowed from Benjamin's answer

Shieru Asakoto

Posted 2019-05-16T13:10:02.433

Reputation: 4 445

Nice. I need to look more at the functions primitive wrappers have – Benjamin Urquhart – 2019-05-18T18:00:29.640

0

C (gcc), 120 118 bytes

-1 byte thanks to ceilingcat

f(char*s){int r=0,c=s[1]&31,b=10;for(s+=2*(*s<49&&(b=c^24?c^2?8:2:16)-8);c=*s++;)r=c^39?r*b+(c>57?c%32+9:c-48):r;c=r;}

Try it online!

gastropner

Posted 2019-05-16T13:10:02.433

Reputation: 3 264

0

C (gcc), 101 97 83 bytes

*d;*s;n;f(int*i){for(s=d=i;*d=*s;d+=*s++>39);i=wcstol(i+n,0,n=!i[1]||i[1]&29?0:2);}

Try it online

Johan du Toit

Posted 2019-05-16T13:10:02.433

Reputation: 1 524

0

C (gcc) / Bash / C++, 118 bytes

f(i){asprintf(&i,"echo \"#import<iostream>\nmain(){std::cout<<%s;}\">i.C;g++ i.C;./a.out",i);fgets(i,i,popen(i,"r"));}

Try it online!

Johan du Toit

Posted 2019-05-16T13:10:02.433

Reputation: 1 524

I golfed some code, then realized there is no reason at all for it to work, but it seems to work; 158 bytes.

– my pronoun is monicareinstate – 2019-06-04T12:52:04.310

@someone, it's nasty, but I like it! – Johan du Toit – 2019-06-04T13:00:13.923

148 bytes by merging popen and system. G++ has a flag, I think -x, to read from stdin. That might be shorter than fopen stuff, but I don't know how to invoke with stdin in C. – my pronoun is monicareinstate – 2019-06-04T13:10:46.417

@someone, Everything is now merged into the popen command – Johan du Toit – 2019-06-04T13:52:36.853

printf -> echo seems to work. You're going to be programming in bash soon. – my pronoun is monicareinstate – 2019-06-04T13:59:02.827

I'm not going down that rabbit hole :) It occurred to me that one can use Python for the inner program, something like this: system("python -c 'print(0b11)'"); – Johan du Toit – 2019-06-04T14:12:42.563

0

PHP, 43 bytes

eval("return ".str_replace("'","",$a).";");

Same method as https://codegolf.stackexchange.com/a/185644/45489

Martin Barker

Posted 2019-05-16T13:10:02.433

Reputation: 413

0

C++, G++, 189 bytes

#include<fstream>
#include<string>
void v(std::string s){{std::ofstream a("a.cpp");a<<"#include<iostream>\nint main(){std::cout<<"<<s<<";}";}system("g++ -std=c++14 a.cpp");system("a.exe");}

No need for tests

Requires installation of g++ with C++14 support

Explanation:

It writes a file called a.cpp, uses GCC to compile it, and runs the resulting executable, which outputs the number.

HatsuPointerKun

Posted 2019-05-16T13:10:02.433

Reputation: 1 891

175 bytes – ceilingcat – 2019-06-05T20:13:46.513

0

Pyth, 27 bytes

Jscz\'?&qhJ\0}@J1r\0\7iJ8vJ

Try it online!

Unlike the previous (now deleted) Pyth answer, this one passes all test cases in the question, though it is 3 bytes longer.

randomdude999

Posted 2019-05-16T13:10:02.433

Reputation: 789

Welcome to the site! – Post Rock Garf Hunter – 2019-06-11T16:36:26.610