Generalized integer casting in Python

8

Background

I have a string in Python which I want to convert to an integer. Normally, I would just use int:

>>> int("123")
123

Unfortunately, this method is not very robust, as it only accepts an optionally signed run of digits, roughly [+-]?[0-9]+ (after removing any leading or trailing whitespace). For example, it can't handle input with a decimal point:

>>> int("123.45")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '123.45'

And it certainly can't handle this:

>>> int("123abc?!")

On the other hand, exactly this behavior can be had without any fuss in Perl, PHP, and even the humble QBasic:

INT(VAL("123abc"))   ' 123

Question

Here's my shortest effort at this "generalized int" in Python. It's 50 bytes, assuming that the original string is in s and the result should end up in i:

n="";i=0
for c in s:
 n+=c
 try:i=int(n)
 except:0

Fairly straightforward, but the try/except bit is ugly and long. Is there any way to shorten it?

Details

Answers need to do all of the following:

  • Start with a string in s; end with its integer value in i.
  • The integer is the first run of digits in the string. Everything after that is ignored, including other digits if they come after non-digits.
  • Leading zeros in the input are valid.
  • Any string that does not start with a valid integer has a value of 0.

The following features are preferred, though not required:

  • A single - sign immediately before the digits makes the integer negative.
  • Ignores whitespace before and after the number.
  • Works equally well in Python 2 or 3.

(Note: my code above meets all of these criteria.)
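
The note above can be checked mechanically. Below is an ungolfed sketch of the same loop wrapped in a function and run against the question's test cases (taking the "negatives handled" values). The names `general_int` and `CASES` are made up for illustration and are not part of the original post.

```python
def general_int(s):
    """Ungolfed form of the question's loop: grow a prefix one character
    at a time and remember the value of the last prefix int() accepted."""
    n = ""
    i = 0
    for c in s:
        n += c
        try:
            i = int(n)
        except ValueError:
            pass  # this prefix is not a valid integer; keep the last good value
    return i


# Test cases copied from the question, using the "negatives handled" values.
CASES = [
    ("0123", 123),
    ("123abc", 123),
    ("123.45", 123),
    ("abc123", 0),
    ("-123", -123),
    ("-1-2", -1),
    ("--1", 0),
    ("", 0),
]

for text, expected in CASES:
    assert general_int(text) == expected, (text, general_int(text))
```

One caveat that postdates the question: since Python 3.6, int() also accepts single underscores between digits, so with this approach "1_2" comes out as 12 rather than 1.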

Test cases

"0123"   -> 123
"123abc" -> 123
"123.45" -> 123
"abc123" -> 0
"-123"   -> -123 (or 0 if negatives not handled)
"-1-2"   -> -1 (or 0 if negatives not handled)
"--1"    -> 0
""       -> 0

DLosc

Posted 2015-05-27T02:50:08.973

Reputation: 21 213

Somewhat related: http://codegolf.stackexchange.com/questions/28783/implement-an-integer-parser (but there it was explicitly stated that input would be properly-formed integers).

– DLosc – 2015-05-27T02:57:34.263

What should "12abc3" give? – orlp – 2015-05-27T06:10:02.863

@orlp 12--it's analogous to the "123.45" case. – DLosc – 2015-05-27T06:38:12.847

(lambda(x)(or(parse-integer x :junk-allowed t)0)) (Common Lisp, 49 bytes) -- Only posted as a comment since it is built-in. – coredump – 2015-05-27T11:28:19.397

@coredump :junk-allowed--ha, that's great! I would have made this a general golf challenge, were it not for the fact that the answer in many languages is trivial. But thanks for the Lisp. :^) – DLosc – 2015-05-27T14:24:00.243

Answers

4

40 bytes

import re;i=int("0"+re.split("\D",s)[0])

and you can do negatives for 8 characters more:

import re;i=int((re.findall("^-?\d+",s)+[0])[0])
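
For readers who want to see why these work, here is an ungolfed sketch of both one-liners. The function names are hypothetical, and the raw-string prefixes are an addition: the golfed versions rely on "\D" and "\d" passing through as unrecognized escapes, which newer Python 3 releases warn about.

```python
import re


def int_prefix_unsigned(s):
    # re.split returns the pieces between non-digit characters; piece [0]
    # is whatever precedes the first non-digit (possibly the empty string).
    # Prepending "0" makes the empty case parse as 0.
    return int("0" + re.split(r"\D", s)[0])


def int_prefix_signed(s):
    # re.findall returns a list of matched strings, not match objects.
    # The ^ anchor means there is at most one match; appending [0] supplies
    # a default element so that indexing never fails, and int(0) is 0.
    return int((re.findall(r"^-?\d+", s) + [0])[0])
```

Neither version skips leading whitespace, so " 12" gives 0 with both; that feature was listed as preferred rather than required.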

KSab

Posted 2015-05-27T02:50:08.973

Reputation: 5 984

@DLosc Ah, you're right, I apparently didn't test the second one well enough. The 'aha' moment was when I realized some Python regex functions return strings, not MatchObjects – KSab – 2015-05-27T07:00:54.870

import re;i=int((re.findall("^-?\d+",s)+[0])[0]) works, for 48 bytes. – DLosc – 2015-05-27T14:36:01.153

6

Python 2, 47, 46

It's not as short as using regex, but I thought it was entertainingly obscure.

i=int(('0%sx'%s)[:~len(s.lstrip(str(1<<68)))])

-1 due to KSab – str with some large integer works better than the repr operator since it does not put an L on the end.
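
Since the one-liner is dense, here is a step-by-step sketch of what it appears to be doing. The function and variable names are invented for illustration; only the expression itself comes from the answer.

```python
def int_prefix_lstrip(s):
    # str(1 << 68) is '295147905179352825856', which happens to contain
    # every digit 0-9, so it serves as a short spelling of "all digits".
    digits = str(1 << 68)

    # lstrip treats its argument as a set of characters: this removes the
    # leading run of digits and leaves the part of s that must be ignored.
    tail = s.lstrip(digits)

    # "0" in front makes an empty digit run parse as 0. The trailing "x" is
    # a sentinel so there is always at least one character to cut off.
    padded = "0%sx" % s

    # ~k equals -k-1, so this slice drops the tail plus the sentinel.
    return int(padded[:~len(tail)])
```

The sentinel matters when the whole string is digits: the tail is then empty, and a plain `[:-0]` slice would return an empty string instead of the full number. A leading "-" is not a digit, so the entire string becomes the tail and the result is 0.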

feersum

Posted 2015-05-27T02:50:08.973

Reputation: 29 566

You can shave off a byte by using str(1<<68) inside the lstrip – KSab – 2015-05-27T08:28:24.950

Wow. Entertainingly obscure is right! (This only handles nonnegative numbers, correct?) – DLosc – 2015-05-27T14:41:39.337

Another bonus of @KSab's suggestion is Python 3 compatibility. – DLosc – 2015-05-27T14:42:30.390