Shortest program to split a string at non-digits without RegExps

16

3

EDIT: If you're using Lisp, I have given some guidelines at the bottom for counting bytes.

Objective: Make the shortest function that splits a string at non-digits and returns an array consisting of only digits in each string, without the use of any regular expressions. Leading zeroes are to be included in each string.

Current Standings (separated in categories):

  • C/C++/C#/Java: 68 (C) ....
  • GolfScript/APL/J: 13 (APL)
  • All others: 17 (Bash, uses tr), 24 (Ruby)

Rules:

(I apologize for the lengthiness)

  1. The format must be as a function with a single string argument. Up to two additional arguments may be added if necessary for the proper return of the array (e.g. sh/csh/DOS Batch needs an extra variable reference to return, etc.).
  2. The primary function declaration doesn't count, nor does importing other standard libraries. `#include`s, `import`s, and `using`s don't count. Everything else does. This does include `#define`s and helper functions. Sorry for the confusion. Refer to this as a helpful guide to what does and does not count (written in C-style syntax):
    // doesn't count toward total, may be omitted unless
    // non-obvious, like half of Java's standard library.
    #include <stdio.h>
    
    import some.builtin.Class // doesn't count, see above
    
    #define p printf // counts towards total
    
    /* Any other preprocessor directives, etc. count. */
    
    int i = 0; // counts
    
    someFunction(); // counts
    
    char[][] myMainSplitFunction(char[][] array) { // doesn't count
      // Everything in here counts
      return returnArray; // Even this counts.
    } // doesn't count
    
    /* Everything in here counts, including the declaration */
    char[][] someHelperFunction(char[] string) {
      // stuff
    } // even this counts
    
  3. The output must be a string array or similar (Array lists in Java and similar are acceptable). Examples of accepted output: String[], char[][], Array, List, and Array (object).
  4. The array must contain only variable-length string primitives or string objects. No empty strings should be present in the return, with the exception below. Note: each string is to contain a run of consecutive matches, as in the example input and output below.
  5. If there are no matches, then the function body should return null, an empty array/list, or an array/list containing an empty string.
  6. No external libraries allowed.
  7. DOS line endings count as one byte, not two (already covered in meta, but needs to be emphasized)
  8. And the biggest rule here: no regular expressions allowed.

This is a code-golf question, so the smallest size wins. Good luck!

And here are some example inputs and outputs (with C-style escapes):

Input:  "abc123def456"
Output: ["123", "456"]

Input:  "aitew034snk582:3c"
Output: ["034", "582", "3"]

Input:  "as5493tax54\\430-52@g9.fc"
Output: ["5493", "54", "430", "52", "9"]

Input:  "sasprs]tore\"re\\forz"
Output: null, [], [""], or similar
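For anyone who wants a baseline to check golfed answers against, here is a minimal ungolfed sketch in Python. It is not part of the original question; the name split_digits is made up for illustration, and it assumes "digit" means the ASCII characters 0 through 9.

```python
def split_digits(text):
    """Return the maximal runs of ASCII digits in text, in order."""
    runs = []
    current = ""
    for ch in text:
        if "0" <= ch <= "9":
            # Still inside a run of digits: extend it.
            current += ch
        elif current:
            # A non-digit ends the current run, if there is one.
            runs.append(current)
            current = ""
    if current:
        # The input ended while a run was still open.
        runs.append(current)
    return runs
```

With no matches this returns an empty list, which is one of the outcomes rule 5 allows.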

Please state how many bytes your answers use, and as always, happy golfing!


Guidelines for Lisp

Here's what does and doesn't count in Lisp dialects:

;;; Option 1

(defun extract-strings (a b) ; Doesn't count
  (stuff) ;;; Everything in here counts
) ; Doesn't count

;;; Option 2

(defun extract-strings (string &aux (start 0) (end 0)) ; Doesn't count
  (stuff) ;;; Everything in here counts
) ; Doesn't count.
All other lambdas fully count towards the byte count.

Isiah Meadows

Posted 2014-02-23T05:54:25.110

Reputation: 1 546

Wasn't this asked before? – Ismael Miguel – 2014-02-23T06:02:28.203

1

Yes, but I re-asked it on Meta and made substantial edits to it before posting it again here. Because of this, it shouldn't be classified as a duplicate (the other related one should be closed if not already). – Isiah Meadows – 2014-02-23T06:06:23.420

@IsmaelMiguel It's been deleted.

– Justin – 2014-02-23T06:37:28.223

2

Shouldn't your "golf" be posted as an answer? – MrWhite – 2014-02-23T10:57:55.813

4

Sorry, but -1 for disallowing GolfScript. All languages should be allowed. – Doorknob – 2014-02-23T18:09:30.650

1

@Doorknob That's true, but I also understand the OP's feelings. People should have a chance to compete even if they don't speak GolfScript, J, or APL (and I'm guilty of perusing the latter in these competitions.) Can you give a look at my proposal in the thread he linked to?

– Tobia – 2014-02-23T19:45:43.887

I did edit the rules a little because I missed the other two (J I intentionally overlooked, but I forgot about APL). I am going to have two separate scoring levels for those three and any others. – Isiah Meadows – 2014-02-24T15:54:40.950

Comment added by request of @maf-soft: the c# 66 solution was invalid: it returns all digits as char-array – Glenn Randers-Pehrson – 2014-10-21T16:46:55.893

Answers

10

APL, 13 chars

(or 28 / 30 bytes, read below)

{⍵⊂⍨⍵∊∊⍕¨⍳10}

I see you've banned GolfScript from your question. I understand your sentiment, but I hope this community won't eventually ban APL, because it's a truly remarkable programming language with a long history, not to mention a lot of fun to code in. Maybe it could just be scored differently, if people feel it's competing unfairly. I'll post my thoughts on this matter to that thread you've linked.

By the same token, I've always added a footnote to my APL posts, claiming that APL could be scored as 1 char = 1 byte. My claim rests on the fact that a few (mostly commercial) APL implementations still support their own legacy single-byte encoding, with the APL symbols mapped to the upper 128 byte values. But maybe this is too much of a stretch, in which case you may want to score this entry as 28 bytes in UTF-16 or 30 bytes in UTF-8.
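The byte counts can be checked with a few lines of Python. This is only a sketch: it assumes the 28-byte UTF-16 figure includes a 2-byte byte-order mark, which is what Python's "utf-16" codec emits, and the variable names are illustrative.

```python
# The 13-character APL function, written with escapes so it survives any font.
apl_code = "{\u2375\u2282\u2368\u2375\u220a\u220a\u2355\u00a8\u237310}"

char_count = len(apl_code)                         # number of code points
utf8_bytes = len(apl_code.encode("utf-8"))         # 4 ASCII + 8 three-byte + 1 two-byte
utf16_bytes = len(apl_code.encode("utf-16"))       # 2-byte BOM + 13 two-byte units
utf16_no_bom = len(apl_code.encode("utf-16-le"))   # same units without the BOM
```

Under those assumptions the counts come out as 13 characters, 30 bytes in UTF-8, and 28 bytes in UTF-16 (26 without the BOM).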

Explanation

{        ⍳10}  make an array of naturals from 1 to 10
       ⍕¨      convert each number into a string
      ∊        concatenate the strings into one (it doesn't matter that there are two 1s)
    ⍵∊         test which chars from the argument are contained in the digit string
 ⍵⊂⍨           use it to perform a partitioned enclose, which splits the string as needed
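For readers who don't speak APL, the same steps can be mirrored in Python. This is a rough sketch of the idea, not a translation of APL semantics in general; apl_style_split is a made-up name, and groupby stands in for the partitioned enclose on a boolean mask.

```python
from itertools import groupby

def apl_style_split(text):
    # Like the iota / format-each / flatten steps: "12345678910" holds every digit.
    digit_string = "".join(str(n) for n in range(1, 11))
    # Like the membership test: True where the character is a digit.
    mask = [ch in digit_string for ch in text]
    # Like the partitioned enclose: keep each run of True positions as one
    # string and drop the characters where the mask is False.
    pieces = []
    for is_digit, group in groupby(zip(mask, text), key=lambda pair: pair[0]):
        if is_digit:
            pieces.append("".join(ch for _, ch in group))
    return pieces
```

The duplicated 1 in the digit string is harmless here too, since it is only used for membership tests.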

Examples

      {⍵⊂⍨⍵∊∊⍕¨⍳10} 'ab5c0x'
 5  0 
      {⍵⊂⍨⍵∊∊⍕¨⍳10}  'z526ks4f.;8]\p'
 526  4  8 

The default output format for an array of strings does not make it clear how many strings there are in the array, nor how many blanks. But a quick manipulation to add quotes should make it clear enough:

      {q,⍵,q←'"'}¨ {⍵⊂⍨⍵∊∊⍕¨⍳10} 'ab5c0x'
 "5"  "0" 
      {q,⍵,q←'"'}¨ {⍵⊂⍨⍵∊∊⍕¨⍳10}  'z526ks4f.;8]\p'
 "526"  "4"  "8" 

Tobia

Posted 2014-02-23T05:54:25.110

Reputation: 5 455

You should mention that this is only APL2 (or proper ⎕ML for APL*PLUS/Dyalog APL), and so it doesn't work on TryAPL.org.

– Adám – 2016-05-17T05:25:50.477

Regarding your comment, I think that for other languages to compete fairly with "shorthand" ones, one should count each symbol in the other languages as one char. For example, my Mathematica solution posted here should be counted as 7 (more or less). Designing a language with compressed tokens is no merit at all, I think. – Dr. belisarius – 2014-02-23T21:13:55.853

Could you provide a hex dump of your golf? I can't read some of the characters. – Isiah Meadows – 2014-02-24T12:22:16.403

@impinball How would the hexdump help you? It's not like you would see what is being done. – mniip – 2014-02-24T12:27:41.953

@impinball the APL code is {omega enclose commute omega epsilon epsilon format each iota 10}. If you need the unicode values you can just copy and paste it to any online tool, even if you can't see the characters (which is strange, as most modern Unicode fonts have the APL symbols) In any case what you get is this {\u2375\u2282\u2368\u2375\u220a\u220a\u2355\u00a8\u237310} (mind the last "10" which is not part of the escape sequence)

– Tobia – 2014-02-24T15:24:24.313

For some reason, it was not rendering on my Android with both Dolphin and Chrome, and on IE on this computer, the majority aren't rendering, either. But thank you, anyways. – Isiah Meadows – 2014-02-24T15:52:44.450

I am changing the rules a bit to actually score them separately. – Isiah Meadows – 2014-02-24T15:57:55.827

One of the less ⍨-ish APL solutions! – James Wood – 2014-02-24T20:46:34.660

Before seeing your program I couldn't fully appreciate Dijkstra's quote about "APL [being] a mistake, carried through to perfection." (link). Thank you for an eye-opener ;-)

– dasblinkenlight – 2014-02-24T21:50:28.657

@dasblinkenlight You're welcome. I of course disagree with the mistake part. If you wish to peruse my posting history on this site, you'll find many APL posts, almost always followed by an explanation of the algorithm. The most recent golf I won was with 4 chars, for an admittedly non-trivial question :-)

– Tobia – 2014-02-24T22:08:44.103

@Tobia OMG! After seeing your four-character winner I am no longer sure that APL was a mistake - I am beginning to sense a malicious intent :) :) :) – dasblinkenlight – 2014-02-24T22:20:58.583

@dasblinkenlight Yes, world domination. We're working on it. APL has been one of the preferred languages in the financial sector for many decades. DUM DUM DUM… – Tobia – 2014-02-24T22:25:50.790

1

Instead of ∊⍕¨⍳10, couldn't you just use ⎕D? That should be the constant '0123456789'. Dyalog APL at the very least supports it, and so does NARS2000. – marinus – 2014-11-11T12:25:58.147

@marinus Yes I could. I didn't know it when I wrote this entry, thank you. – Tobia – 2014-11-11T15:21:43.303

5

Python 47

Implementation

f=lambda s:"".join([' ',e][e.isdigit()]for e in s).split()

Demo

>>> sample=["abc123def456","aitew034snk582:3c","as5493tax54\\430-52@g9.fc","sasprs]tore\"re\\forz"]
>>> [f(data) for data in sample]
[['123', '456'], ['034', '582', '3'], ['5493', '54', '430', '52', '9'], []]

Algorithm

Convert each non-digit character to space and then split the resultant string. A simple and clear approach.

And a fun solution with itertools (71 characters)

f1=lambda s:[''.join(v)for k,v in __import__("itertools").groupby(s,key=str.isdigit)][::2]
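A note on the itertools variant: groupby yields the runs in order, so the `[::2]` slice keeps every other run starting with the first one. That seems to give the digit runs only when the input happens to begin with a digit. A hedged sketch that filters on the group key instead (digit_runs and sliced are illustrative names, not part of the answer):

```python
from itertools import groupby

def digit_runs(text):
    # groupby yields (key, group) pairs; the key is True for runs of digits.
    # Caveat: str.isdigit also accepts some non-ASCII digit characters.
    return ["".join(group)
            for is_digit, group in groupby(text, key=str.isdigit)
            if is_digit]

# What the slice-based version keeps for an input starting with letters:
sliced = ["".join(g) for _, g in groupby("abc123def456", key=str.isdigit)][::2]
```

Filtering on the key costs a few characters in golf terms, but it does not depend on what the string starts with.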

Abhijit

Posted 2014-02-23T05:54:25.110

Reputation: 2 841

4

Ruby, 70

f=->(s){s.chars.chunk{|c|c.to_i.to_s==c}.select{|e|e[0]}.transpose[1]}

Online Version for testing

Since converting any non-digit character to an int returns 0 in Ruby (with to_i), converting every char to int and back to char is the non-regex way to check for a digit...

David Herrmann

Posted 2014-02-23T05:54:25.110

Reputation: 1 544

You can also do a ('0'..'9').member? for every char, but what you did is shorter already – fgp – 2014-02-23T22:50:28.770

You are definitely right - I should have said: "a" way ;) – David Herrmann – 2014-02-24T08:25:59.937

4

bash, 26 (function contents: 22 + array assignment overhead 4)

This isn't going to beat the other bash answer, but it's interesting because it might make you double-take:

f()(echo ${1//+([!0-9])/ })

Usage is:

$ a=(`f "ab5c0x"`); echo ${a[@]}
5 0
$ a=(`f "z526ks4f.;8]\p"`); echo ${a[@]}
526 4 8
$ 

At the first quick glance, //+([!0-9])/ looks a lot like a regexp substitution, but it isn't. It is a bash parameter expansion, which follows pattern-matching rules, instead of regular expression rules.

Returning true bash array types from bash functions is a pain, so I chose to return a space-delimited list instead, then convert to an array in an array assignment outside of the function call. So in the interests of fairness, I feel the (` `) around the function call should be included in my score.

Digital Trauma

Posted 2014-02-23T05:54:25.110

Reputation: 64 644

3

Mathematica 32

StringCases[#,DigitCharacter..]&

Usage

inps ={"abc123def456", "aitew034snk582:3c", "as5493tax54\\430-52@g9.fc", 
        "sasprs]tore\"re\\forz"}  
StringCases[#,DigitCharacter..]&/@inps

{{"123", "456"}, 
 {"034", "582", "3"}, 
 {"5493", "54", "430", "52", "9"}, 
 {}
}

The equivalent using regexes is much longer!:

StringCases[#, RegularExpression["[0-9]+"]] &

Dr. belisarius

Posted 2014-02-23T05:54:25.110

Reputation: 5 345

Mathematica sucks at regex. – CalculatorFeline – 2016-04-27T19:26:22.633

3

Bash, 17/21 bytes (originally 21 bytes; improved by DigitalTrauma)

Building a space-separated list with tr

function split() {
tr -c 0-9 \ <<E
$1
E
}

This replaces every non-digit with a space.

Usage

$ for N in $(split 'abc123def456'); do echo $N; done
123
456

Edit

As pointed out in the comments below, the code can be stripped down to 17 bytes:

function split() (tr -c 0-9 \ <<<$1)

and as the result is not, strictly speaking, a Bash array, the usage should be

a=(`split "abc123def456"`); echo ${a[@]}

and the extra (``) should be counted

Coaumdio

Posted 2014-02-23T05:54:25.110

Reputation: 141

1

Gah you beat me to it! But why not use a here-string instead of a here-document? Also you can save a newline at the end of the function content if you use (blah) instead of {blah;}: split()(tr -c 0-9 \ <<<$1). That way your function body is only 17 chars. – Digital Trauma – 2014-02-24T18:25:05.653

1

Your function returns a "space-separated list" instead of an array. Certainly returning true arrays from bash functions is awkward, but you could at least assign the result of your function to an array in your usage: a=($(split "12 3a bc123")); echo ${a[@]}. It could be argued that "($())" should be counted in your score – Digital Trauma – 2014-02-24T18:29:08.347

Before exploring the tr approach, I tried doing this with a parameter expansion. tr is definitely the better approach for golfing purposes.

– Digital Trauma – 2014-02-24T18:45:47.940

Have you tried surrounding the tr with the expansion operator? It would come out to something like ($(tr...)), and where the function declaration doesn't count, the outer parentheses wouldn't count against you. It would only be the command substitution part. – Isiah Meadows – 2014-02-25T05:40:38.847

I don't see how this should be working, but I'm not fluent in Bash arrays though. Anyway, the (``) construct is 1 char shorter than the ($()) one and should be preferred. – Coaumdio – 2014-02-25T09:23:42.387

2

Smalltalk (Smalltalk/X), 81

f := [:s|s asCollectionOfSubCollectionsSeparatedByAnyForWhich:[:ch|ch isDigit not]]

f value:'abc123def456' -> OrderedCollection('123' '456')

f value:'aitew034snk582:3c' -> OrderedCollection('034' '582' '3')

f value:'as5493tax54\430-52@g9.fc' -> OrderedCollection('5493' '54' '430' '52' '9')

f value:'sasprs]tore\"re\forz' -> OrderedCollection()

sigh - Smalltalk has a tendency to use veeeery long function names...

blabla999

Posted 2014-02-23T05:54:25.110

Reputation: 1 869

asCollectionOfSubCollectionsSeparatedByAnyForWhich ಠ_ಠ This name is too long – TuxCrafting – 2016-06-19T21:02:29.257

2

Is that a function name? o__O – Tobia – 2014-02-23T19:39:10.903

@tobia Apparently... – Isiah Meadows – 2014-02-25T05:49:53.050

1

Perl, 53

Edit: on no matches, sub now returns list with empty string (instead of empty list) as required.

It also avoids splitting on a single space character, because that triggers the 'split on any white-space' behavior, which probably violates the rules. I could use the / / delimiter, which would split on a single space, but paradoxically that would look like a regexp pattern. I could use unpack at the cost of some extra characters and so get rid of the split controversy altogether, but I think that what I ended up with, splitting on a literal character (other than space), is OK.

sub f{shift if(@_=split a,pop=~y/0-9/a/csr)[0]eq''and$#_;@_}

And, no, Perl's transliteration operator doesn't do regular expressions. I can unroll 0-9 range to 0123456789 if that's the problem.

user2846289

Posted 2014-02-23T05:54:25.110

Reputation: 1 541

As long as it doesn't use regular expressions, it's valid. – Isiah Meadows – 2014-02-23T08:39:57.387

My Perl is not so strong. If I understand the code, you are replacing non-digits with a specific non-digit, then splitting on that chosen non-digit, then filtering out empty strings. Is this a correct reading? – Tim Seguine – 2014-02-23T12:02:46.883

1

@TimSeguine: Not exactly. Non-digits are replaced and squashed to a single character, splitting on which produces an empty string if that delimiter happens to be at the beginning. It is then shifted away if the list contains other entries. – user2846289 – 2014-02-23T12:12:16.730

Empty list is okay. – Isiah Meadows – 2014-02-24T16:04:18.530

1

VBScript, 190 (164 without function declaration)

Function f(i)
For x=1 To Len(i)
c=Mid(i,x,1)
If Not IsNumeric(c) Then
Mid(i,x,1)=" "
End If
Next
Do
l=Len(i)
i=Replace(i,"  "," ")
l=l-Len(i)
Loop Until l=0
f=Split(Trim(i)," ")
End Function

While not competitive at all, I'm surprised that VBScript comes out this short on this given how verbose it is (13 bytes for the CRs alone). It loops through the string, replacing any non-numeric characters with spaces, then reduces all the whitespace to single spaces, and then uses a space delimiter to divide it.
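The three steps described above translate almost line for line into other languages. Here is a hedged Python sketch of the same approach; vbscript_style_split is a made-up name, and it checks for ASCII digits rather than reproducing IsNumeric exactly.

```python
def vbscript_style_split(text):
    # Step 1: overwrite every non-digit with a space.
    spaced = "".join(ch if "0" <= ch <= "9" else " " for ch in text)
    # Step 2: collapse double spaces until the length stops changing,
    # like the Do ... Loop Until l=0 in the VBScript version.
    while True:
        collapsed = spaced.replace("  ", " ")
        if len(collapsed) == len(spaced):
            break
        spaced = collapsed
    # Step 3: trim, then split on a single space. The guard is needed in
    # Python because "".split(" ") returns [""] rather than an empty list.
    trimmed = spaced.strip(" ")
    return trimmed.split(" ") if trimmed else []
```

The repeated replace is what makes splitting on a single literal space safe: after the loop no two spaces are adjacent, so no empty strings appear between the pieces.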

Test cases

Input: "ab5c0x"
Output: 5,0

Input: "z526ks4f.;8]\p"
Output: 526,4,8

Comintern

Posted 2014-02-23T05:54:25.110

Reputation: 3 632

DOS line endings count as one character as far as I've read on meta. – Isiah Meadows – 2014-02-24T16:00:07.523

I suggested an edit for you. – Isiah Meadows – 2014-02-24T16:02:32.630

The count already assumes Linux style 1 byte line endings. I get 190 characters by my count (just verified again). – Comintern – 2014-02-24T18:38:52.080

Ok. I must have miscounted. – Isiah Meadows – 2014-02-25T05:17:33.457

1

R, 81

f=function(x){
s=strsplit(x,"",T)[[1]]
i=s%in%0:9
split(s,c(0,cumsum(!!diff(i))))[c(i[1],!i[1])]
}

The function accepts a string and returns a list of strings.

Examples:

> f("abc123def456")
$`1`
[1] "1" "2" "3"

$`3`
[1] "4" "5" "6"

-

> f("aitew034snk582:3c")
$`1`
[1] "0" "3" "4"

$`3`
[1] "5" "8" "2"

$`5`
[1] "3"

-

> f("as5493tax54\\430-52@g9.fc")
$`1`
[1] "5" "4" "9" "3"

$`3`
[1] "5" "4"

$`5`
[1] "4" "3" "0"

$`7`
[1] "5" "2"

$`9`
[1] "9"

-

> f("sasprs]tore\"re\\forz")
$<NA>
NULL

Note: $x is the name of the list element.

Sven Hohenstein

Posted 2014-02-23T05:54:25.110

Reputation: 2 464

1

C, 68 bytes (only the function's body)

void split (char *s, char **a) {
int c=1;for(;*s;s++)if(isdigit(*s))c?*a++=s:0,c=0;else*s=0,c=1;*a=0;
}

The first argument is the input string, the second one is the output array, which is a NULL-terminated string array. Sufficient memory must be reserved for a before calling the function (worst case: sizeof(char*)*((strlen(s)+1)/2+1), i.e. one pointer per digit run plus the terminating NULL).

The input string is modified by the function (every non-digit character is replaced by '\0')
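The in-place trick may be easier to follow in a language with bounds checking. Below is a hedged Python sketch that uses a bytearray in place of the char buffer and start offsets in place of pointers; split_in_place and read_c_string are illustrative names, not part of the C answer.

```python
def split_in_place(buffer):
    """Zero out non-digits in a bytearray and return the start offset of each digit run."""
    starts = []
    previous_was_non_digit = True
    for index, byte in enumerate(buffer):
        if 48 <= byte <= 57:              # ASCII '0'..'9'
            if previous_was_non_digit:
                starts.append(index)      # a new run of digits begins here
            previous_was_non_digit = False
        else:
            buffer[index] = 0             # terminates whatever run came before
            previous_was_non_digit = True
    return starts

def read_c_string(buffer, start):
    """Read bytes from start up to the next zero byte, as C would read a string."""
    end = buffer.find(0, start)
    if end == -1:
        end = len(buffer)
    return buffer[start:end].decode("ascii")
```

Usage: after starts = split_in_place(buf), each read_c_string(buf, s) for s in starts yields one digit run, and buf itself has been overwritten just as the C version overwrites its input.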

Usage example

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>

void split (char *s, char **a) {
int c=1;for(;*s;s++)if(isdigit(*s))c?*a++=s:0,c=0;else*s=0,c=1;*a=0;
}   

void dump(char **t) {
    printf("[ ");for(;*t;t++)printf("%s ", *t);printf("]\n");
}   

int main() {
    char **r = malloc(1024);
    char test1[] = "abc123def456";
    char test2[] = "aitew034snk582:3c";
    char test3[] = "as5493tax54\\430-52@g9.fc";
    char test4[] = "sasprs]tore\"re\\forz";
    split(test1,r); 
    dump(r);
    split(test2,r); 
    dump(r);
    split(test3,r); 
    dump(r);
    split(test4,r); 
    dump(r);
    return 0;
}

Output

[ 123 456 ]
[ 034 582 3 ]
[ 5493 54 430 52 9 ]
[ ]

Un-golfed version:

void split (char *s, char **a) {
    int c=1; // boolean: the latest examined character is not a digit
    for(;*s;s++) {
        if(isdigit(*s)) {
            if(c) *a++ = s; // stores the address of the beginning of a digit sequence
            c=0;
        } else {
            *s=0; // NULL-terminate the digit sequence
            c=1;
        }   
    }   
    *a = 0; // NULL-terminate the result array
} 

Coaumdio

Posted 2014-02-23T05:54:25.110

Reputation: 141

1

Common Lisp (1 according to the letter; ≈173 according to the spirit)

Here's a readable version. The byte count is fairly high because of the long names in things like digit-char-p and position-if and vector-push-extend.

(defun extract-numeric-substrings (string &aux (start 0) (end 0) (result (make-array 0 :adjustable t :fill-pointer 0)))
  (loop 
     (unless (and end (setq start (position-if #'digit-char-p string :start end)))
       (return result))
     (setq end (position-if (complement #'digit-char-p) string :start (1+ start)))
     (vector-push-extend (subseq string start end) result)))
(extract-numeric-substrings "abc123def456")
#("123" "456")

(extract-numeric-substrings "aitew034snk582:3c")
#("034" "582" "3")

(extract-numeric-substrings "as5493tax54\\430-52@g9.fc")
#("5493" "54" "430" "52" "9")

(extract-numeric-substrings "sasprs]tore\"re\\forz")
#()
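The scanning loop above (find the next digit, then the next non-digit, then take the substring between them) can be sketched in Python as well. The helper position_if below only imitates the Lisp function of the same name for this purpose, and str.isdigit stands in for digit-char-p; both are assumptions made for illustration.

```python
def position_if(predicate, text, start=0):
    """Index of the first character at or after start that satisfies predicate, else None."""
    for index in range(start, len(text)):
        if predicate(text[index]):
            return index
    return None

def extract_numeric_substrings(text):
    result = []
    end = 0
    while end is not None:
        start = position_if(str.isdigit, text, end)
        if start is None:
            break                          # no further digits
        end = position_if(lambda ch: not ch.isdigit(), text, start + 1)
        result.append(text[start:end])     # an end of None slices to the end of the string
    return result
```

As in the Lisp version, an end of None (NIL) does double duty: it marks a run that reaches the end of the string and stops the loop.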

The concept of "function declaration" is sort of vague. Here's a version that only has one byte (the character x in the function body); everything else is bundled into the auxiliary variables of the function's lambda list (part of the function's declaration):

(defun extract-numeric-substrings (string 
                                   &aux (start 0) (end 0) 
                                   (result (make-array 0 :adjustable t :fill-pointer 0))
                                   (x (loop 
                                         (unless (and end (setq start (position-if #'digit-char-p string :start end)))
                                           (return result))
                                         (setq end (position-if (complement #'digit-char-p) string :start (1+ start)))
                                         (vector-push-extend (subseq string start end) result))))
  x)

The actual byte count will depend on how many of the auxiliary declarations would have to be moved into the body for this to be deemed acceptable. Some local function renaming would help, too (e.g., shorten position-if since it appears twice, use single letter variables, etc.).

This rendering of the program has 220 characters:

(LOOP(UNLESS(AND END(SETQ START(POSITION-IF #'DIGIT-CHAR-P STRING :START END)))(RETURN RESULT))(SETQ END(POSITION-IF(COMPLEMENT #'DIGIT-CHAR-P)STRING :START(1+ START)))(VECTOR-PUSH-EXTEND(SUBSEQ STRING START END)RESULT))

If nothing else, this should promote Common Lisp's &aux variables.

This can be written more concisely with loop, of course:

(defun extract-numeric-substrings (s &aux (b 0) (e 0) (r (make-array 0 :fill-pointer 0)))
  (loop 
     with d = #'digit-char-p 
     while (and e (setq b (position-if d s :start e)))
     finally (return r)
     do 
       (setq e (position-if-not d s :start (1+ b)))
       (vector-push-extend (subseq s b e) r)))

The loop form, with extra space removed, has 173 characters:

(LOOP WITH D = #'DIGIT-CHAR-P WHILE(AND E(SETQ B(POSITION-IF D S :START E)))FINALLY(RETURN R)DO(SETQ E(POSITION-IF-NOT D S :START(1+ B)))(VECTOR-PUSH-EXTEND(SUBSEQ S B E)R))

Joshua Taylor

Posted 2014-02-23T05:54:25.110

Reputation: 660

I would count starting from (result on to the final parenthesis to be the body. The part that defines the name and parameters are the declaration. – Isiah Meadows – 2014-02-25T04:33:55.413

Please refer to rule 2 on my amended rules to see what I'm really talking about in a function declaration (basically, declare function name, parameters, and if syntactically required, which is rare among interpreted languages, the return type). – Isiah Meadows – 2014-02-25T05:23:28.180

@impinball Yeah, the "1" count is sort of a joke, but the important part here is that result is declared as a parameter here; it just has a very non-trivial initialization form. It's the same thing, in principle, as an optional argument with a default value that's computed by some complex expression. (In simpler cases, it's easy to imagine something like char* substring( char *str, int begin, int end(0) ) in some language with a C-like syntax to specify that end is optional and that if it's not provided, then its value is 0. I'm just highlighting the fact that some of these terms – Joshua Taylor – 2014-02-25T15:11:35.920

@impinball aren't quite concrete and language agnostic enough to prevent some trollish byte counts. :) – Joshua Taylor – 2014-02-25T15:12:07.517

The first part that isn't specifying parameters is where I would start counting (e.g. (defun fn (string &aux (start 0) (end 0) wouldn't count, but everything remaining in the lambda would). – Isiah Meadows – 2014-02-25T16:09:27.837

@impinball Everything in the lambda list after &aux is an auxiliary parameter. The value for start is 0, the value for end is 0, the value for result is (make-array ...), and the value for x is (loop ...). They're all in the lambda list of the function; the newlines aren't significant here (except as whitespace). When parameter default values accept arbitrarily complex initialization forms, it becomes problematic to allow exclusion of parameter lists from byte count. – Joshua Taylor – 2014-02-25T16:11:30.440

@impinball Even with all that bit about the &aux variables, though, I've posted a more direct version with loop that has a more "according to the spirit" byte count. – Joshua Taylor – 2014-02-25T16:13:13.150

0

php, 204

function s($x){$a=str_split($x);$c=-1;$o=array();
for($i= 0;$i<count($a);$i++){if(ord($a[$i])>=48&&ord($a[$i])<=57)
{$c++;$o[$c]=array();}while(ord($a[$i])>=48&&ord($a[$i])<=57)
{array_push($o[$c],$a[$i]);$i++;}}return $o;}

Descriptive Code:

function splitdigits($input){

    $arr = str_split($input);
    $count = -1;
    $output = array();
    for($i = 0; $i < count($arr); $i++){


    if(ord($arr[$i]) >= 48 && ord($arr[$i]) <= 57){
        $count++;
        $output[$count] = array();
    }

    while(ord($arr[$i]) >= 48 && ord($arr[$i]) <= 57){
        array_push($output[$count], $arr[$i]);
        $i++;
    } 

}

return $output;
}

This is pretty long code and I'm sure there will be a much shorter php version for this code golf. This is what I could come up with in php.

palerdot

Posted 2014-02-23T05:54:25.110

Reputation: 269

there are some improvements: you can replace array() with [], array_push($output[$count], $arr[$i]); with $output[$count][]=$arr[$i];, and the ord() checks with is_numeric(). and you don't even need to split the string to iterate over its characters. also, only the inner code of the function counts, so as it is your char count is 204. – Einacio – 2014-02-24T16:45:39.157

The function declaration doesn't count. Refer to rule 2 as a guide on what counts and what doesn't. – Isiah Meadows – 2014-02-25T05:19:27.913

0

C, 158

#define p printf
char s[100],c;int z,i;int main(){while(c=getchar())s[z++]=(c>47&&c<58)*c;p("[");for(;i<z;i++)if(s[i]){p("\"");while(s[i])p("%c",s[i++]);p("\",");}p("]");}

Since C doesn't have array print functions built in, I had to do that work on my own, so I apologize that there is a final comma in every output. Essentially, the code reads the string, replaces every character that is not a digit with '\0', and then loops through the buffer and prints out all of the chains of digits. (EOF=0)

Input: ab5c0x
Output: ["5","0",]

Input: z526ks4f.;8]\p
Output: ["526","4","8",]

ASKASK

Posted 2014-02-23T05:54:25.110

Reputation: 291

According to the question's rules (rule 2), you only have to count the characters in the function body. So your solution would actually be less than 170 bytes. I'm not sure if the count includes variable prototypes outside the function body, though. – grovesNL – 2014-02-24T08:43:15.050

I will amend the rules on this: #defines, variable declarations, etc. will count, but the function declaration will not. – Isiah Meadows – 2014-02-24T15:19:59.700

Also, last time I checked, there was a type in C notated as char[][] which is legal. If you return as that (or char**), you will be fine. – Isiah Meadows – 2014-02-25T04:25:36.430

It doesn't have to be text output? I thought the program was supposed to output the array in a string format – ASKASK – 2014-02-25T20:54:51.590

0

C#, 98

static string[] SplitAtNonDigits(string s)
{
    return new string(s.Select(c=>47<c&c<58?c:',').ToArray()).Split(new[]{','},(StringSplitOptions)1);
}

First, this uses the LINQ .Select() extension method to turn all non-digits into commas. string.Replace() would be preferable, since it returns a string rather than an IEnumerable<char>, but string.Replace() can only take a single char or string and can't make use of a predicate like char.IsDigit() or 47<c&c<58.

As mentioned, .Select() applied to a string returns an IEnumerable<char>, so we need to turn it back into a string by turning it into an array and passing the array into the string constructor.

Finally, we split the string at commas using string.Split(). (StringSplitOptions)1 is a shorter way of saying StringSplitOptions.RemoveEmptyEntries, which automatically takes care of multiple consecutive commas and commas at the start/end of the string.
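To see why the empty-entry removal matters, here is a hedged Python sketch of the same map-then-split idea. The names are made up for illustration; Python's str.split(",") keeps empty pieces, so the filtering that RemoveEmptyEntries does has to be written out.

```python
def map_non_digits_to_commas(text):
    # Counterpart of the Select step: keep characters whose code lies
    # strictly between 47 and 58, turn everything else into a comma.
    return "".join(ch if 47 < ord(ch) < 58 else "," for ch in text)

def split_at_non_digits(text):
    mapped = map_non_digits_to_commas(text)
    # Counterpart of Split with RemoveEmptyEntries: consecutive, leading,
    # and trailing commas all produce empty pieces, which are dropped here.
    return [piece for piece in mapped.split(",") if piece]
```

For "ab5c0x" the mapped string is ",,5,0," and a plain split(",") yields five pieces, three of them empty, which is exactly what the option removes.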

BenM

Posted 2014-02-23T05:54:25.110

Reputation: 1 409

1

Instead of char.IsDigit(c), you can use '/'<c&&c<':' – grovesNL – 2014-02-24T09:07:58.173

1

Good point...or even better, 47<c&&c<58. (Frankly, I'm surprised it works with numbers, but apparently it does). – BenM – 2014-02-24T20:17:16.527

1

And I can save an extra valuable character by using a single '&' instead of a double '&&'. In C#, this is still a logical AND when both operands are booleans -- it only does a bitwise AND when they're integers. – BenM – 2014-02-24T20:31:58.400

Nice one. I didn't know it was able to do that. – grovesNL – 2014-02-24T21:49:29.810

A slightly shorter variant is to split on white space instead of ,, and then manually remove the empty items return new string(s.Select(c=>47<c&c<58?c:' ').ToArray()).Split().Where(a=>a!="").ToArray(); – VisualMelon – 2014-10-21T15:51:59.007

0

Python, 83 (previously 104)

def f(s, o=[], c=""):
    for i in s:
        try:int(i);c+=i
        except:o+=[c];c=""
    return [i for i in o+[c] if i]

@Abhijit's answer is far more clever; this is just a "minified" version of what I had in mind.

assert f("abc123def456") == ["123", "456"]
assert f("aitew034snk582:3c") == ["034", "582", "3"]
assert f("as5493tax54\\430-52@g9.fc") == ["5493", "54", "430", "52", "9"]
assert f("sasprs]tore\"re\\forz") == []

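These assertions yield no output, so the code is working, but only if they are run one at a time (each in a fresh session), because the default list o is created once when the function is declared and keeps its contents between calls.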
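The shared default argument is easy to demonstrate. In the hedged sketch below, f_shared follows the posted logic (with the bare except narrowed to ValueError), while f_fresh is an illustrative variant that builds a new list on every call; both names are made up for this example.

```python
def f_shared(s, o=[], c=""):
    # o is created once, when the def statement runs, and o += [...]
    # mutates that same list on every call that relies on the default.
    for i in s:
        try:
            int(i)
            c += i
        except ValueError:
            o += [c]
            c = ""
    return [i for i in o + [c] if i]

def f_fresh(s):
    # Same logic, but the accumulator is local to each call.
    o, c = [], ""
    for i in s:
        try:
            int(i)
            c += i
        except ValueError:
            o.append(c)
            c = ""
    return [i for i in o + [c] if i]
```

Calling f_shared("1a") and then f_shared("2b") in the same session returns ["1"] and then ["1", "2"], whereas f_fresh returns ["1"] and ["2"]. The fix costs a few bytes, which is presumably why the golfed version lives with the limitation.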

gcq

Posted 2014-02-23T05:54:25.110

Reputation: 251

You don't have to count the function declaration, if you did. Just as a heads up – Isiah Meadows – 2014-02-25T05:24:31.793

0

JavaScript, 240 bytes

And for those of you who are curious, here's my probably huge golf:

function split(a) { // begin function
function f(c){for(var a=-1,d=9;d--;){var e=c.indexOf(d+"");0
>e||e<a&&(a=e)}return 0<a?a:null}var c=f(a);if(null==c)retur
n null;var d=[];for(i=0;;){a=a.substring(c);d[i]||(d[i]="");
c=f(a);if(null==c)break;d[i]+=a.charAt(c);0<c&&i++}return d;
} // end function

Above in pretty print:

function split(a) {
    function f(c) {
        for (var a = -1, d = 9;d--;) {
            var e = c.indexOf(d + "");
            0 > e || e < a && (a = e);
        }
        return 0 < a ? a : null;
    }
    var c = f(a);
    if (null == c) return null;
    var d = [];
    for (i = 0;;) {
        a = a.substring(c);
        d[i] || (d[i] = "");
        c = f(a);
        if (null == c) break;
        d[i] += a.charAt(c);
        0 < c && i++;
    }
    return d;
}

Above in normal descriptive code

function split(a) {
    function findLoop(string) {
        var lowest = -1;
        var i = 9;
        while (i--) {
            var index = string.indexOf(i + '');
            if (index < 0) continue;
            if (index < lowest) lowest = index;
        }
        return (lowest > 0) ? lowest : null;
    }
    var index = findLoop(a);
    if (index == null) return null;
    var ret = [];
    i = 0;
    for ( ; ; ) {
        a = a.substring(index);
        if (!ret[i]) ret[i] = '';
        index = findLoop(a);
        if (index == null) break;
        ret[i] += a.charAt(index);
        if (index > 0) i++;
    }
    return ret;
}

Isiah Meadows

Posted 2014-02-23T05:54:25.110

Reputation: 1 546

0

PHP 134

function f($a){
$i=0;while($i<strlen($a)){!is_numeric($a[$i])&&$a[$i]='-';$i++;}return array_filter(explode('-',$a),function($v){return!empty($v);});
}

Einacio

Posted 2014-02-23T05:54:25.110

Reputation: 436

You can shorten it by leaving out the callback at array_filter. This will automatically remove all entries which are false when they're cast to booleans. – kelunik – 2014-04-20T15:54:33.873

@kelunik that would filter out 0s as well – Einacio – 2014-04-21T11:35:35.633

0

Ruby, 24

f=->s{s.tr("
-/:-~",' ').split}

Defines digits using negative space within the printable ASCII range.
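
For readers who don't use Ruby, the same translate-then-split idea can be sketched roughly in Python. This is not part of the golfed answer: `NON_DIGITS` and `split_digits_tr` are names made up for the illustration, and the table only covers printable ASCII, so other input is an untested assumption.

```python
# Hypothetical illustration of the tr-then-split idea; assumes ASCII input.
# Translation table: every printable ASCII code point except '0'..'9' -> space.
NON_DIGITS = {cp: " " for cp in range(32, 127) if not (48 <= cp <= 57)}


def split_digits_tr(s):
    # Non-digits become spaces; split() with no argument then collapses
    # runs of spaces and drops empty pieces, leaving only the digit runs.
    return s.translate(NON_DIGITS).split()


print(split_digits_tr("aitew034snk582:3c"))  # ['034', '582', '3']
```

Listing the non-digits rather than the digits is what lets a single translate call do all the work; the split needs no filtering afterwards.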

histocrat

Posted 2014-02-23T05:54:25.110

Reputation: 20 600

Function declaration doesn't count. – Isiah Meadows – 2014-02-25T04:22:42.950

0

JS/Node : 168 162 147 138 Chars

function n(s){
var r=[];s.split('').reduce(function(p,c){if(!isNaN(parseInt(c))){if(p)r.push([]);r[r.length-1].push(c);return 0;}return 1;},1);return r;
}

Beautified version:

function n(s) {
  var r = [];
  s.split('').reduce(function (p, c) {
    if (!isNaN(parseInt(c))) {
      if (p) {
        r.push([]);
      }
      r[r.length - 1].push(c);
      return 0;
    }
    return 1;
  }, 1);
  return r;
}

palanik

Posted 2014-02-23T05:54:25.110

Reputation: 111

This question only wants the array returned, so you can remove console.log(r) and some other things – Not that Charles – 2014-02-24T19:47:18.033

The function declaration doesn't count toward the score (reason is to help level the playing field) – Isiah Meadows – 2014-02-25T04:23:44.110

Ok. Adjusted the score as per @impinball's comment. (Actually there are two functions declared here. Char count includes the anonymous function) – palanik – 2014-02-25T04:55:11.803

It should. I updated the rules to help explain it better. – Isiah Meadows – 2014-02-25T05:06:35.200

Meanwhile, came up with something better... – palanik – 2014-02-25T05:11:38.777

0

JavaScript, 104 97 89

Golfed:

Edit: When the loop walks off the end of the array, c is undefined, which is falsy and terminates the loop.

2/27: Using ?: saves the wordiness of if/else.

function nums(s) {
s+=l='length';r=[''];for(k=i=0;c=s[i];i++)r[k]+=+c+1?c:r[k+=!!r[k][l]]='';
r[l]--;return r
}

The carriage return in the body is for readability and is not part of the solution.

Ungolfed:

The idea is to append each character to the last entry in the array if it is a digit and to ensure the last array entry is a string otherwise.

function nums(s) {
    var i, e, r, c, k;
    k = 0;
    s+='x'; // ensure the input does not end with a digit
    r=[''];
    for (i=0;i<s.length;i++) {
        c=s[i];
        if (+c+1) { // if the current character is a digit, append it to the last entry
            r[k] += c;
        }
        else { // otherwise, add a new entry if the last entry is not blank
            k+=!!r[k].length;
            r[k] = '';
        }
    }
    r.length--; // strip the last entry, known to be blank
    return r;
}

DocMax

Posted 2014-02-23T05:54:25.110

Reputation: 704

0

PHP 98 89

As in DigitalTrauma's bash answer, this doesn't use a regex.

function f($x) {
// Only the following line counts:
for($h=$i=0;sscanf(substr("a$x",$h+=$i),"%[^0-9]%[0-9]%n",$j,$s,$i)>1;)$a[]=$s;return@$a;
}

Test cases:

php > echo json_encode(f("abc123def456")), "\n";
["123","456"]
php > echo json_encode(f("aitew034snk582:3c")), "\n";
["034","582","3"]
php > echo json_encode(f("as5493tax54\\430-52@g9.fc")), "\n";
["5493","54","430","52","9"]
php > echo json_encode(f("sasprs]tore\"re\\forz")), "\n";
null

PleaseStand

Posted 2014-02-23T05:54:25.110

Reputation: 5 369

0

Python

def find_digits(_input_):
    a,b = [], ""
    for i in list(_input_):
        if i.isdigit(): b += i
        else:
            if b != "": a.append(b)
            b = ""
    if b != "": a.append(b)
    return a

I left StackExchange

Posted 2014-02-23T05:54:25.110

Reputation: 159

0

Haskell 31

{-# LANGUAGE OverloadedStrings #-}
import Data.Char (isDigit)
import Data.Text (split)

f=filter(/="").split(not.isDigit)

It splits the string on all non-numeric characters and removes the empty strings generated by consecutive delimiters.
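
The two steps (split on a predicate, then drop the empty pieces) can be sketched in Python as follows. `split_when` is a hypothetical helper written for this illustration to stand in for `Data.Text.split`; it is not a standard-library function, and the digit test is restricted to ASCII by assumption.

```python
DIGITS = "0123456789"


def split_when(pred, s):
    # Hypothetical stand-in for Data.Text.split: cut s at every character
    # for which pred is true, keeping the empty pieces between
    # consecutive delimiters.
    parts, cur = [], ""
    for ch in s:
        if pred(ch):
            parts.append(cur)
            cur = ""
        else:
            cur += ch
    parts.append(cur)
    return parts


def f(s):
    # Split at non-digits, then filter out the empty strings.
    return [p for p in split_when(lambda c: c not in DIGITS, s) if p != ""]


print(split_when(lambda c: c not in DIGITS, "a1bb22"))  # ['', '1', '', '22']
print(f("a1bb22"))                                      # ['1', '22']
```

The first print shows why the filter is needed: a leading delimiter and two adjacent delimiters each leave an empty string behind.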

lortabac

Posted 2014-02-23T05:54:25.110

Reputation: 761

0

Javascript, 72

function f(a){
 a+=".",b="",c=[];for(i in a)b=+a[i]+1?b+a[i]:b?(c.push(b),""):b;return c
}

Ungolfed

a+=".",b="",c=[];        //add '.' to input so we don't have to check if it ends in a digit
for(i in a)
    b=+a[i]+1?           //check if digit, add to string if it is
        b+a[i]:         
    b?                   //if it wasn't a digit and b contains digits push it
        (c.push(b),""):  //into the array c and clear b
    b;                   //else give me b back
return c

Sample input/output

console.log(f("abc123def456"));
console.log(f("aitew034snk582:3c"));
console.log(f("as5493tax54\\430-52@g9.fc"));
console.log(f("sasprs]tore\"re\\forz"));

["123", "456"]
["034", "582", "3"]
["5493", "54", "430", "52", "9"]
[] 

JSFiddle

Danny

Posted 2014-02-23T05:54:25.110

Reputation: 1 563

I like it! Much simpler than my own. You can drop another 8 characters by replacing if(+a[i]+1)b+=a[i];else if(b)c.push(b),b="" with b=+a[i]+1?b+a[i]:b?(c.push(b),""):b. – DocMax – 2014-02-27T19:07:10.977

@DocMax thx, I edited to include your suggestion :). That (c.push(b),"") seemed clever, never seen that. – Danny – 2014-02-27T20:35:59.557

I had forgotten about it until I saw it used extensively earlier today in http://codegolf.stackexchange.com/questions/22268#22279 – DocMax – 2014-02-27T21:00:39.223

That's not valid, ' ' is mistaken for 0 and it's a javascript quirk difficult to manage. Try '12 34 56' – edc65 – 2014-10-21T21:22:18.407

0

VBA 210, 181 without function declaration

Function t(s)
Dim o()
For Each c In Split(StrConv(s,64),Chr(0))
d=IsNumeric(c)
If b And d Then
n=n&c
ElseIf d Then:ReDim Preserve o(l):b=1:n=c
ElseIf b Then:b=0:o(l)=n:l=l+1:End If:Next:t=o
End Function

Gaffi

Posted 2014-02-23T05:54:25.110

Reputation: 3 411

0

Rebol (66 chars)

remove-each n s: split s complement charset"0123456789"[empty? n]s

Ungolfed and wrapped in function declaration:

f: func [s] [
    remove-each n s: split s complement charset "0123456789" [empty? n]
    s
]

Example code in Rebol console:

>> f "abc123def456"
== ["123" "456"]

>> f "aitew035snk582:3c"
== ["035" "582" "3"]

>> f "as5493tax54\\430-52@g9.fc"
== ["5493" "54" "430" "52" "9"]

>> f {sasprs]torer"re\\forz}
== []

draegtun

Posted 2014-02-23T05:54:25.110

Reputation: 1 592

0

R 52

This function splits strings by character class (this is not regex! :)). The class N matches numeric characters, and P{N} is the negation of that class. o=T means omit empty substrings.

x
## [1] "wNEKbS0q7hAXRVCF6I4S" "DpqW50YfaDMURB8micYd" "gwSuYstMGi8H7gDAoHJu"
require(stringi)
stri_split_charclass(x,"\\P{N}",o=T)
## [[1]]
## [1] "0" "7" "6" "4"

## [[2]]
## [1] "50" "8" 

## [[3]]
## [1] "8" "7"

bartektartanus

Posted 2014-02-23T05:54:25.110

Reputation: 131

0

PHP 99

<?php

$a = function($s) {
foreach(str_split($s)as$c)$b[]=is_numeric($c)?$c:".";return array_filter(explode('.',implode($b)));
};

var_dump($a("abc123def456"));
var_dump($a("aitew034snk582:3c"));
var_dump($a("as5493tax54\\430-52@g9.fc"));
var_dump($a("sasprs]tore\"re\\forz"));


Output

array(2) {
  [3]=>
  string(3) "123"
  [6]=>
  string(3) "456"
}
array(3) {
  [5]=>
  string(3) "034"
  [8]=>
  string(3) "582"
  [9]=>
  string(1) "3"
}
array(5) {
  [2]=>
  string(4) "5493"
  [5]=>
  string(2) "54"
  [6]=>
  string(3) "430"
  [7]=>
  string(2) "52"
  [9]=>
  string(1) "9"
}
array(0) {
}

kelunik

Posted 2014-02-23T05:54:25.110

Reputation: 160

0

JavaScript 88

88 chars when not counting function n(x){}

function n(x){
y=[],i=0,z=t=''
while(z=x[i++])t=!isNaN(z)?t+z:t&&y.push(t)?'':t
if(t)y.push(t)
return y
}

wolfhammer

Posted 2014-02-23T05:54:25.110

Reputation: 1 219

0

Racket 149

Golfed (according to the rules, only the length of the second line counts)

(define (extract-numeric-substrings s)
(let L([x(map string->number(string-split s""))])(set! x(dropf x false?))(if(null? x)x(let-values([(h t)(splitf-at x number?)])(cons(apply ~a h)(L t)
)))))

Ungolfed

(define (extract-numeric-substrings s)
  (let L ([x (map string->number (string-split s ""))])
    (set! x (dropf x false?))
    (if (null? x)
        x
        (let-values([(h t) (splitf-at x number?)])
          (cons (apply ~a h) (L t))))))

Results

(map extract-numeric-substrings '("abc123def456" 
"aitew034snk582:3c" 
"as5493tax54\\430-52@g9.fc" 
"sasprs]tore\"re\\forz"))

'(("123" "456")
("034" "582" "3")
("5493" "54" "430" "52" "9")
())

Matthew Butterick

Posted 2014-02-23T05:54:25.110

Reputation: 401

0

Javascript with lambdas, 84

extract=s=>[].reduce.call(s,(r,c)=>r+=(c!=+c?r[r.length-1]==" "?"":" ":c),"").trim().split(" ")

Tested in Firefox 27

Qwertiy

Posted 2014-02-23T05:54:25.110

Reputation: 2 697

0

VB.NET, 87

Imports System.Console
Imports System.String
Imports System.Char
Imports System.StringSplitOptions

Module All
  Public Function Parse(S As String) As String()
    Return Join("",From C In S Select If(IsDigit(C),C," ")).Split({" "},RemoveEmptyEntries)
  End Function

  Public Sub Main()
    For Each S As String In {"abc123def456", "aitew034snk582:3c", "as5493tax54\\430-52@g9.fc", "sasprs]tore\""re\\forz"}
      WriteLine(Join(":", Parse(S)))
    Next S
  End Sub
End Module

According to the rules, only the body of the Parse function is counted:

Return Join("",From C In S Select If(IsDigit(C),C," ")).Split({" "},RemoveEmptyEntries)

Qwertiy

Posted 2014-02-23T05:54:25.110

Reputation: 2 697

0

Haskell 37

I am not counting the imports towards the bytecount, as per the scoring rules. The body of the code is a 1 liner (with semicolons, so there is actually no advantage other than looking more golfed), though I am not sure what to count and what not to count in the function definitions. This was originally longer, but I got inspiration from Coaumdio's and DigitalTrauma's bash solutions.

import Data.Char {-Not counted-}
i x|isDigit x=x;i _=' ';f=words.map i {-Fully Counted-}

i preserves digit characters, and replaces all other characters with space. f just maps i onto a string and then applies words, which splits a string into a list of substrings that were separated by arbitrarily long runs of whitespace. If I am missing any of the rules whereby I can reduce my bytecount by omitting certain characters from the score, I would appreciate comments to that effect.
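
As a rough cross-check of that description, here is the same pipeline sketched in Python. `keep_digit` and `split_digits` are names invented for this sketch, standing in for `i` and `f`; Python's argument-less `str.split` plays the role of `words`, and the digit test is limited to ASCII by assumption.

```python
DIGITS = "0123456789"


def keep_digit(ch):
    # Counterpart of `i`: digits pass through, everything else becomes a space.
    return ch if ch in DIGITS else " "


def split_digits(s):
    # Counterpart of `f = words . map i`: map over the characters, then
    # split on runs of whitespace, which also drops empty pieces.
    return "".join(map(keep_digit, s)).split()


print(split_digits("abc123def456"))  # ['123', '456']
```

Because the splitter treats any run of whitespace as one separator, no separate step is needed to remove empty strings.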

archaephyrryx

Posted 2014-02-23T05:54:25.110

Reputation: 1 035

0

APL: 22

{(a/1,2</a)⊂⍵/⍨a←⍵∊⎕D}

Explanation:

a←⍵∊⎕D creates a boolean mask marking which characters of the argument (⍵) are digits (system variable ⎕D contains '0123456789')

⍵/⍨a  takes just the numeric part of the argument

(a/1,2</a)⊂  makes substrings of numbers only, starting at first number found
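
For those who don't read APL, the three steps can be sketched step by step in Python. This is only an approximation of the array operations, with names (`split_digits_mask`, `starts`) made up for the illustration and the digit test limited to ASCII by assumption.

```python
DIGITS = "0123456789"


def split_digits_mask(s):
    # a <- mask of digit positions (the APL `⍵∊⎕D`).
    a = [c in DIGITS for c in s]
    # 1,2</a : position 0 always counts as a start; any later position is a
    # start when a non-digit is immediately followed by a digit.
    starts = [True] + [(not p) and q for p, q in zip(a, a[1:])]
    out = []
    for ch, is_digit, is_start in zip(s, a, starts):
        if not is_digit:
            continue            # compress: keep only the digit positions
        if is_start:
            out.append(ch)      # partition: open a new substring here
        else:
            out[-1] += ch       # otherwise extend the current substring
    return out


print(split_digits_mask("aitew034snk582:3c"))  # ['034', '582', '3']
```

The first digit kept is always a start (it is either at position 0 or preceded by a non-digit), so `out[-1]` is never read from an empty list.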

Moris Zucca

Posted 2014-02-23T05:54:25.110

Reputation: 1 519

0

Pharo Smalltalk, 60

f:=[:s|(s splitOn:[:e|e isDigit not])reject:[:e|e isEmpty]]

Outputs:

f value:'abc123def456' -> OrderedCollection('123' '456')  
f value:'aitew034snk582:3c' -> OrderedCollection('034' '582' '3')  
f value:'as5493tax54\430-52@g9.fc' -> OrderedCollection('5493' '54' '430' '52' '9')  
f value:'sasprs]tore\"re\forz' -> OrderedCollection()

MartinW

Posted 2014-02-23T05:54:25.110

Reputation: 151

-1

C# 66

static char[] n(string s){return s.Where(Char.IsDigit).ToArray();}

PauloHDSousa

Posted 2014-02-23T05:54:25.110

Reputation: 119

This is not a valid solution, I don't think the OP wants you to split on the empty string ;) – VisualMelon – 2014-10-21T15:40:27.000