-6

Write a bullet-proof "Atof"

Atof (ASCII to float) is a function that every language has to have. (K&R: Atof in chapter 4.2 and strtod() in appendix B.5.) It gets one float from one string.

For the purposes of this challenge, the number formats you must recognise are:

[spaces][-|+][digits].digits
[spaces][-|+]digits[.digits]
[spaces][-|+]digits[.]

Where [stuff] means that stuff may appear or not.
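As an illustration only, the three formats can be folded into a single regular expression. The sketch below is an assumption about how to read the grammar, not part of the challenge; the names FLOAT_RE and match_prefix are hypothetical.

```python
import re

# Assumed reading of the three formats: optional spaces, optional sign,
# then either digits with an optional "." and optional fraction digits,
# or a leading "." that must be followed by at least one digit.
FLOAT_RE = re.compile(r" *[-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)")

def match_prefix(text):
    """Return the longest float-shaped prefix of text, or None."""
    m = FLOAT_RE.match(text)
    return m.group(0) if m else None

print(match_prefix("  -123.."))  # the second "." is not consumed
print(match_prefix("-."))        # no digits at all, so no match
```

A sign or a lone period without any digit deliberately fails to match, which is what the error cases such as '-' and '+.' require.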

Input

The challenge is to write an Atof function that takes the following two arguments:

The start index of the first character to be included in the float. This may be 0 or 1 indexed but must be consistent with the output.
The string to parse for a float. This will be printable ASCII only.

Output

You must always return two numbers (any fitting datatype will do):

The float retrieved from the string. A whole number may optionally be returned as integer.
The index where the function stopped parsing the string to get the float, i.e. one beyond the index of the float's last character. If the float terminates the string, this will be beyond the string. Though an index is an integer, this may optionally be returned as a float. This may be 0 or 1 indexed but must be consistent with the input.

Bullet-proofing

Since your code must be bullet-proof, you must handle all the following error conditions:

Overflow: For languages with fixed precision, this means that the found float's integer part cannot fit in your language's float type. If your language has arbitrary precision, overflow means that the found float had more than 3000 digits in the integer part. If your language has more than one float type, you may choose a target type.
An input string that does not contain a valid float beginning at the start index.
A start index outside the bounds of the string.
The two arguments are not of type string (for the string) and number (for the index).

In all these cases you must still return two values which must always be the same. They must also be recognisable as indicating an error, so no valid inputs may cause the same output as the error-indicating output. Here are some examples of valid error indicators in the format float index:

0 "E"
"" ""
-1 -1
404 0.5
0+1i 0+1i
"none" "error"
0J1 0J1
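
The error conditions above can be sketched in an ungolfed form. This is a minimal sketch under stated assumptions, not a reference solution: it uses 1-based indexing, picks -1 -1 as the error indicator, and treats overflow as "the value does not fit in a Python float" (the conversion yields infinity). The names atof, ERROR and FLOAT_RE are hypothetical.

```python
import math
import re

FLOAT_RE = re.compile(r" *[-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)")
ERROR = (-1, -1)  # assumed error indicator; a valid stop index is never -1

def atof(start, text):
    """Parse a float from text beginning at the 1-based index start.

    Returns (value, stop_index) on success and ERROR otherwise.
    """
    # Wrong argument types (bool is rejected because it subclasses int).
    if isinstance(start, bool) or not isinstance(start, int):
        return ERROR
    if not isinstance(text, str):
        return ERROR
    # Start index outside the bounds of the string.
    if not 1 <= start <= len(text):
        return ERROR
    # No valid float beginning at the start index.
    m = FLOAT_RE.match(text, start - 1)
    if m is None:
        return ERROR
    value = float(m.group(0))
    # Overflow: the integer part does not fit in a float.
    if math.isinf(value):
        return ERROR
    # m.end() is 0-based and exclusive, so add 1 for a 1-based stop index.
    return value, m.end() + 1

print(atof(1, ".123"))
print(atof(3, 9000))
```

Note that this overflow rule is looser than a limit based on the number of significant digits, so an input with a 19-digit integer part is accepted here rather than rejected.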

Test cases

1-indexed, and using -1 -1 as error indicator:

1 ".123" → 0.123 5
1 "3." → 3 3
1 "123/a" → 123 4
1 "12212.8989" → 12212.8989 11
3 "   12212.8989999999999922222222222222222222222" → 12212.899 47
9 "123456789012345.89" → 9012345.89 19
1 "   123456789012345.89" → 1.23456789E14 22
1 " 1234567890123459999.89" → -1 -1
1 "  123.." → 123 7
1 " -123.." → -123 7
1 " -123.999" → -123.999 10
1 "a123" → -1 -1
1 " a123" → -1 -1
0 "123" → -1 -1
4 "123" → -1 -1
3 9000 → -1 -1
"123" 1 → -1 -1
"123" "29393" → -1 -1
1 " a123.." → -1 -1

Example output from an APL implementation, using 1-based indexing and 0J1 0J1 (the imaginary unit i) as the error indicator:

  Atf 1 '123'
123 4 
  Atf 1 '.123'
0.123 5 
  Atf 1 '3.'
3 3 
  Atf 1 '12212.8989'
12212.8989 11 
  Atf 3 '      12212.8989999999999922222222222222222222222'
12212.899 50 
  Atf 9 '123456789012345.89'
9012345.89 19 
  Atf 1 '   123456789012345.89'
1.23456789E14 22 
  Atf 1 '12345678901234567.89'
0J1 0J1 
  Atf 1 '1234567890123456.89'
1.23456789E15 20 
  Atf 1 ' -123..'
¯123 7 
  Atf 1 ' -.999'
¯0.999 7 
  Atf 1 ' -1.'
¯1 5 
  Atf 1 '1.'
1 3 
  Atf 1 ' -9'
¯9 4 
  Atf 1 'a123'
0J1 0J1 
  Atf 0 '123'
0J1 0J1 
  Atf 4 '123'
0J1 0J1 
  Atf 3 9000
0J1 0J1 
  Atf '123' 1
0J1 0J1 
  Atf '123' '29393'
0J1 0J1 
  Atf '123', ⊂2 3⍴⍳10
0J1 0J1 
  Atf 1 '   a123..'
0J1 0J1 
  Atf 1 '.'
0J1 0J1 
  Atf 1 ''
0J1 0J1 
  Atf 1 '-'
0J1 0J1 
  Atf 1 '+.'
0J1 0J1 
  Atf 1 '+.3'
0.3 4 
  Atf 1 '-.3'
¯0.3 4 
  Atf 1 '*.3'
0J1 0J1 
  Atf 1 ' +'
0J1 0J1 
  Atf 1 ' +4.'
4 5 
  Atf 1 ' +4.23'
4.23 7 
  Atf 1 '+0.'
0 4 
  Atf 1 '-0.'
0 4
  Atf 1 '-+4.23'
0J1 0J1 
  Atf 1 '-+.23'
0J1 0J1 
  Atf 1 '-+1'
0J1 0J1

Winning

This is code-golf and the winner will be the one who can write that function in the fewest bytes.

RosLuP

Posted 2018-01-25T17:32:24.540

Reputation: 3 036

"has to have" sounds pessimal English to me. Was it stated ipsis verbis like that in the book? – sergiol – 2018-01-25T17:50:49.080

@DLosc "abc123" has to return -1 +1 – RosLuP – 2018-01-25T19:24:34.393

@DLosc Atof 1 "abc123" has to return error : -1 -1 – RosLuP – 2018-01-25T19:40:09.333

you must handle all the following error conditions: [...] The two arguments are not of type string (for the string) and number (for the index) How should strictly typed languages handle this requirement? – Laikoni – 2018-01-25T19:54:16.413

Does the submission have to be a function, or can it be a full program? And if output can go to stdout, what determines "returning two values"? (For instance: for my error output, I print -1 and claim that this represents the two values - and 1.) – DLosc – 2018-01-25T20:22:19.870

What's keeping you from using something like NaN to indicate an error? The examples you give aren't even all floats... – Sanchises – 2018-01-25T20:50:35.330

@Laikoni Strictly typed languages have nothing to do for the argument types. In the typed languages I have seen, a function call does not compile if an argument's type does not fit the function prototype. – RosLuP – 2018-01-25T21:00:02.503

@DLosc It has to be a function with 2 input arguments (int, string) and 2 output values (int or float, plus int). – RosLuP – 2018-01-25T21:04:26.263

@Sanchises NaN is enough for the error case, but the float 1.23 alone is not enough, because one also needs the index where Atof stopped reading. – RosLuP – 2018-01-25T21:09:29.880

What? I'm asking why you suggest "none" "none" as an error output (which aren't even floats) but do not suggest something like NaN 0 which would be way more logical. Just a suggestion. – Sanchises – 2018-01-25T21:22:03.487

@Sanchises NaN 0 is OK in case of error. – RosLuP – 2018-01-25T22:07:34.107

Why doesn't 1 " a123.." → 9 123 ? – Adám – 2018-01-25T22:09:00.590

@Adám Because it reads the space, then reads a; a is not a digit, so it stops and returns an error. – RosLuP – 2018-01-25T22:14:24.407

"The index where the function stopped parsing the string to get the float, i.e. one beyond the index of the float's last character" you are asking to write strtod, not atof. – Matteo Italia – 2018-01-25T22:33:22.883

@MatteoItalia Ah, that's why it references strtod() in appendix B.5. I had been wondering. Thanks! – Adám – 2018-01-25T22:45:51.677

@RosLuP You shouldn't drastically change the spec for no reason (adding +) five hours after posting the challenge. It doesn't matter that the spec isn't in accordance with the book: "For the purposes of this challenge, the number formats (...) are" means that the book is only an inspiration and not a specification. – Adám – 2018-01-26T00:42:14.477

Answers

Perl 5, 115 101 94 bytes

sub f{($p,$_)=@_;/^.{$p} *([+-]?\d*\.?\d*)/g;""eq$1||$p=~/\D/?(E,E):($1=~s/^\D*\K\./0./r,pos)}

Try it online!

Xcali

Posted 2018-01-25T17:32:24.540

Reputation: 7 671

Doesn't quite handle numbers like 3. according to the spec: the result (0-indexed) for input "3. " 0 should be 3 2, but yours gives 3 1 because it doesn't parse the decimal point as part of the number. – DLosc – 2018-01-25T20:24:42.717

Also, OP has just clarified that the submission has to be a function, not a full program. :^/ – DLosc – 2018-01-25T21:10:28.620

And then changed it again to add a plus sign to the spec. I've updated the code to handle all of that. – Xcali – 2018-01-25T22:57:04.650

APL (Dyalog Unicode), 53 bytes

Anonymous infix function. Takes start-index as left argument and input-string as right argument. 0-indexed. Returns two empty numeric lists for errors.

{0::⍬⍬⋄(⍎,≢+1⍳⍨⍷∘⍵)⊃' *-?(\d+\.?\d*|\.\d+)'⎕S'&'⊢⍺↓⍵⊣÷0≤⍺}

Try it online!

{...} anonymous lambda; ⍺ is left argument (start-index), ⍵ is right argument (input-string)

0:: if any error happens:

⍬⍬ return two empty lists

⋄ now try:

0≤⍺ Boolean (zero or one) whether the start-index is non-negative

÷ reciprocal (errors on zero, i.e. a negative start-index)

⍵⊣ yield the input-string

⍺↓ drop start-index characters (errors if start-index is non-integer)

⊢ yield that (separates the string on the left from the string on the right)

' *-?(\d+\.?\d*|\.\d+)'⎕S'&' return matches of the PCRE Search (errors on non-strings):
' *' zero or more spaces (the pattern begins with a literal space)
-? an optional minus
(...|...) any one of the following two:
\d+\.?\d* one or more digits, an optional period, zero or more digits
\.\d+ a period, one or more digits

⊃ pick the first match

(...) apply the following tacit function:

⍷∘⍵ mask for where the match begins in the input string

1⍳⍨ index of the first occurrence

≢+ add to the length of the match

⍎, append to the evaluated match

Adám

Posted 2018-01-25T17:32:24.540

Reputation: 37 779

It is not bullet-proof: '-.' returns nothing instead of an error. – RosLuP – 2018-01-26T11:37:43.560

@RosLuP Not so. It returns two empty lists ⍬⍬: Try it online! – Adám – 2018-01-26T12:33:35.220

Pip, 85 90 bytes

{v<+aQa<#b&(SSb@>a~` *([-+]?)(\d*)\.?(\d*)`v)GE0&$2<=2**53?RV+V*[a+$);J[$1$2|0'.$3|0]];^v}

This is a function that takes the index and string and returns a 2-element list containing the parsed float and the stop index. Under error conditions, it returns the list ["-"; 1]. Try it online!

DLosc

Posted 2018-01-25T17:32:24.540

Reputation: 21 213

This seems good... Pass all manual test – RosLuP – 2018-01-26T11:46:21.847

It is strange that this "-?(\d+.?\d|.\d+)" is in common to 3 or 4? answer – RosLuP – 2018-01-26T21:47:22.430

JavaScript (ES6), 160 bytes

f=
(n,s)=>typeof n=="number"&&typeof s=="string"&&(m=s.slice(n).match(/ *[-+]?(\d*(\.?)\d*)/))&&m[1][l=m[2].length]&&!m[1][l+16]&&1/m[0]?[+m[0],n+m[0].length]:[,,]

<div oninput=o.textContent=f(+n.value,s.value)><input type=number min=0 id=n><input id=s><pre id=o>

Returns a pair of undefined values on error. Note that the snippet doesn't test the type constraints.

Neil

Posted 2018-01-25T17:32:24.540

Reputation: 95 035

-3

APL NARS, 1432 bytes, 716 chars

Type←{v←⍴⍴⍵⋄v>2:'Tensor ',⍕v⋄v=2:'Matrix'⋄(⍵≡∊⍵)∧(v=1)∧''≡0↑⍵:'Str'⋄(v=0)∧''≡0↑⍵:'Chr'⋄v=1:'List'⋄⍵≢+⍵:'Complex or Quaternion or Oction'⋄⍵=⌈⍵:'Int'⋄'Float'}
r←Atf aa;a;c;v;p;pw;s;sign;i;len;vx
vx←0⋄r←0J1 0J1⋄→Z×⍳1<⍴⍴aa⋄→Z×⍳2≠⍴aa⋄i←⊃1⊃aa⋄a←⊃2⊃aa⋄len←⍴a⋄→Z×⍳'Int'≢Type i⋄→Z×⍳(i>len)∨i<1⋄s←Type a⋄→Z×⍳(s≢'Str')∧s≢'Chr'⋄→Z×⍳(''≡a)∨'.'≡a⋄→AA×⍳∼s≡'Chr'⋄a←,a
AA:pw←1⋄→B
A:i+←1
B:→Z×⍳(i=len)∧' '=i⊃a⋄→A×⍳(i<len)∧' '=i⊃a
sign←1⋄→C×⍳∼'-'=i⊃a⋄sign←¯1⋄i+←1⋄→D
C:→D×⍳∼'+'=i⊃a⋄i+←1
D:→Z×⍳i>len⋄c←v←0⋄→G
F:v←p+10×v⋄vx←1⋄i+←1⋄→Z×⍳v>1e16⋄→Y×⍳i>len
G:p←¯1+⎕D⍳i⊃a⋄→F×⍳(0≤p)∧p≤9⋄→X×⍳∼'.'=i⊃a⋄i+←1⋄→L
H:→H1×⍳c>16⋄v←p+10×v⋄vx←1⋄pw×←10⋄c+←1
H1:i+←1
L:→X×⍳i>len⋄p←¯1+⎕D⍳i⊃a⋄→H×⍳(0≤p)∧p≤9
X:→Z×⍳0=vx
Y:r←(sign×v)÷pw⋄r←r,i
Z:

Ungolfed, with comments and tests:

r←Atof aa;a;c;v;p;pw;s;sign;i;len;vx
⍝ Reference: K&R pages 69-70
⍝ Input: a list of 2 elements; the first element is the index,
⍝ the other is the string to take the characters from
⍝ Returns a list of 2 elements: one is the parsed value (or 0J1 0J1 on error),
⍝ the second is the index where parsing stopped
    vx←0⋄r←0J1 0J1⋄→Z×⍳1<⍴⍴aa⋄→Z×⍳2≠⍴aa
    i←⊃1⊃aa⋄a←⊃2⊃aa⋄len←⍴a
    ⍝Check on i or ⊃1⊃aa
    →Z×⍳'Int'≢Type i⋄→Z×⍳(i>len)∨i<1
    ⍝Check on a or ⊃2⊃aa
    s←Type a⋄→Z×⍳(s≢'Str')∧s≢'Chr'⋄→Z×⍳(''≡a)∨'.'≡a
    →AA×⍳∼s≡'Chr'⋄a←,a   ⍝ If it is a single character, turn it into a string
AA: pw←1⋄→B
    ⍝ Skip the spaces
A:    i+←1
B:    →Z×⍳(i=len)∧' '=i⊃a⋄→A×⍳(i<len)∧' '=i⊃a
    ⍝ Take the sign
    sign←1⋄→C×⍳∼'-'=i⊃a⋄sign←¯1⋄i+←1⋄→D
C:  →D×⍳∼'+'=i⊃a⋄i+←1
D:  →Z×⍳i>len
    c←v←0⋄→G
    ⍝ Take the integer part
F:    v←p+10×v⋄vx←1⋄i+←1⋄c+←1
      →Z×⍳v>1e16 ⍝ Overflow: an integer part > 1e16 counts as overflow
      →Y×⍳i>len  ⍝ End Number
G:    p←¯1+⎕D⍳i⊃a 
      →F×⍳(0≤p)∧p≤9
    ⍝ Take the decimal point
    →X×⍳∼'.'=i⊃a⋄i+←1⋄→L
    ⍝ Take the fractional part
H:       →H1×⍳c>16⋄v←p+10×v⋄vx←1⋄pw×←10⋄c+←1 ⍝ if c>16, skip the digit
H1:      i+←1
L:       →X×⍳i>len⋄p←¯1+⎕D⍳i⊃a⋄→H×⍳(0≤p)∧p≤9
X:  →Z×⍳0=vx
Y:  r←(sign×v)÷pw⋄r←r,i
Z:

  Atf 1 '123'
123 4 
  Atf 1 '.123'
0.123 5 
  Atf 1 '3.'
3 3 
  Atf 1 '12212.8989'
12212.8989 11 
  Atf 3 '      12212.8989999999999922222222222222222222222'
12212.899 50 
  Atf 9 '123456789012345.89'
9012345.89 19 
  Atf 1 '   123456789012345.89'
1.23456789E14 22 
  Atf 1 '12345678901234567.89'
0J1 0J1 
  Atf 1 '1234567890123456.89'
1234567890123457 20 
  Atf 1 ' -123..'
¯123 7 
  Atf 1 ' -.999'
¯0.999 7 
  Atf 1 ' -1.'
¯1 5 
  Atf 1 '1.'
1 3 
  Atf 1 ' -9'
¯9 4 
  Atf 1 'a123'
0J1 0J1 
  Atf 0 '123'
0J1 0J1 
  Atf 4 '123'
0J1 0J1 
  Atf 3 9000
0J1 0J1 
  Atf '123' 1
0J1 0J1 
  Atf '123' '29393'
0J1 0J1 
  Atf '123', ⊂2 3⍴⍳10
0J1 0J1 
  Atf 1 '   a123..'
0J1 0J1 
  Atf 1 '.'
0J1 0J1 
  Atf 1 ''
0J1 0J1 
  Atf 1 '-'
0J1 0J1 
  Atf 1 '+.'
0J1 0J1 
  Atf 1 '+.3'
0.3 4 
  Atf 1 '-.3'
¯0.3 4 
  Atf 1 '*.3'
0J1 0J1 
  Atf 1 ' +'
0J1 0J1 
  Atf 1 ' +4.'
4 5 
  Atf 1 ' +4.23'
4.23 7   
  Atf 1 '+0.'
0 4 
  Atf 1 '-0.'
0 4 
  Atf 1 '0.0'
0 4 
  Atf 1 '-+4.23'
0J1 0J1 
  Atf 1 '+-1'
0J1 0J1
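
For readers who do not read APL, the same label-and-goto scanner can be approximated in Python. This is a loose port written for illustration, not the code above: it assumes 1-based indexing, uses -1 -1 instead of 0J1 0J1 as the error indicator, and keeps the 1e16 integer-part limit and the cap on fraction digits. The name atof_loop is hypothetical.

```python
ERROR = (-1, -1)  # assumed error indicator (the APL code uses 0J1 0J1)

def atof_loop(start, text):
    """Character-by-character scanner; start is a 1-based index."""
    if isinstance(start, bool) or not isinstance(start, int):
        return ERROR
    if not isinstance(text, str) or not 1 <= start <= len(text):
        return ERROR
    i, n = start - 1, len(text)
    while i < n and text[i] == " ":            # skip the spaces
        i += 1
    sign = 1
    if i < n and text[i] in ("+", "-"):        # take the sign
        sign = -1 if text[i] == "-" else 1
        i += 1
    value, scale, seen_digit, frac_digits = 0, 1, False, 0
    while i < n and "0" <= text[i] <= "9":     # take the integer part
        value = 10 * value + int(text[i])
        seen_digit = True
        i += 1
        if value > 10**16:                     # overflow in the integer part
            return ERROR
    if i < n and text[i] == ".":               # take the decimal point
        i += 1
        while i < n and "0" <= text[i] <= "9": # take the fractional part
            if frac_digits <= 16:              # later digits are read but ignored
                value = 10 * value + int(text[i])
                scale *= 10
                frac_digits += 1
            seen_digit = True
            i += 1
    if not seen_digit:                         # e.g. '.', '-', '+.'
        return ERROR
    return sign * value / scale, i + 1         # 1-based index past the float

print(atof_loop(1, " -.999"))
print(atof_loop(1, "-+4.23"))
```

Accumulating all digits in one integer and dividing once by a power of ten mirrors the v and pw variables in the APL version.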

RosLuP

Posted 2018-01-25T17:32:24.540

Reputation: 3 036

That's not really any better. You should wait more than 24 hours, and please comment in English. (All of the comments.) – MD XF – 2018-01-27T01:13:34.847

That is better for definition... Someone will find one bug in the hll interpreter for the expression ([-+]?)(\d).?(\d*)`v)GE0 but can not not find one bug in the simple loop... (other that 000000000000000001 will overflow above) – RosLuP – 2018-01-27T05:55:04.273