Simple CSV/DSV importer

12

1

Slightly more than an inverse of this.

In: Multi-line DSV data and a single delimiter character. The DSV may be taken as a file, a filename, line-break separated string, list of strings, etc. All records have the same number of fields, and no field is empty. Data does not contain the delimiter character and there is no quoting or escaping mechanism.

Out: A data structure representing the DSV, e.g. a list of lists of strings or a matrix of strings.

Examples

["here is,some,sample","data,delimited,by commas"] and ",":
[["here is","some","sample"],["data","delimited","by commas"]]

["hello;\"","\";world","\";\""] and ";":
[["hello","\""],["\"","world"],["\"","\""]] (escapes because this example uses JSON)

["to be or not","that is the question"] and " ":
[["to","be","or","not"],["that","is","the","question"]]

Adám

Posted 2017-02-28T19:35:40.317

Reputation: 37 779

So just to clarify, we simply split each item at instances of the given char? – ETHproductions – 2017-02-28T19:42:56.617

@ETHproductions That's right. – Adám – 2017-02-28T19:45:30.177

How should we split the strings if the first or last character is the delimiter? ",for,example,this,string," – G B – 2017-03-01T10:28:45.923

@GB no field is empty – Adám – 2017-03-01T10:35:48.533

So we can assume it won't happen? – G B – 2017-03-01T10:42:08.583

@GB Yes, as stated in the OP. – Adám – 2017-03-01T10:42:40.300

Can the output be a linebreak-separated string too? i.e. would to,be,or,not\nthat,is,the,question be a valid output to your third example? – Aaron – 2017-03-01T10:55:00.550

@Aaron Only if that is the normal way to represent/print lists in your language. – Adám – 2017-03-01T10:56:17.910

@Aaron I was thinking of asking the same, since in sed the natural representation of a list is a collection of lines. As for list of lists, this is less clear, sed has no data types anyway, so to,be,or,not\nthat,is,the,question is I guess reasonable for this challenge. That, or each field on separate lines. – seshoumara – 2017-03-03T21:14:40.447

@seshoumara since the target is CSV, the inner list has to be comma-separated anyway, otherwise I guess any character of the default IFS would have been a good pick even if sed does not care about them. I was asking for ><> which can only output characters and numbers. – Aaron – 2017-03-03T23:31:19.133

Answers

3

Jelly, 3 2 bytes

Dennis points out that while the 2-byte solution appears not to work, the dyadic link itself does; it is the way command-line arguments are parsed that makes it look that way.

ṣ€

Try It Online! - footer calls the function with left and right set explicitly, and formats as a grid*.

Exactly like the 3-byte version below, except that it splits at occurrences of the right argument rather than at sublists equal to the right argument.


œṣ€

The 3 byter - footer displays the result as a grid*.

A dyadic link (function) that takes the DSV list on the left and the delimiter on the right.

How?

œṣ€ - Main link: list l, delimiter d
  € - for each item in l:
œṣ  -     split at occurrences of sublists equal to d

* As a full program the implicit output would simply "smush" together all the characters, so the footer of the TIO link calls the link as a dyad and uses G to format the result nicely.
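To make the difference between the two links concrete, here is a rough Python sketch of "split at items equal to d" versus "split at sublists equal to d". It is only an approximation of the behaviour described above, not a port of Jelly's implementation, and the names split_at_element and split_at_sublist are made up for illustration.

```python
def split_at_element(seq, elem):
    """Split a list wherever a single item equals elem (roughly the 2-byte link)."""
    parts, current = [], []
    for item in seq:
        if item == elem:
            parts.append(current)
            current = []
        else:
            current.append(item)
    parts.append(current)
    return parts


def split_at_sublist(seq, sub):
    """Split a list wherever a contiguous run equals sub (roughly the 3-byte link)."""
    parts, current, i = [], [], 0
    while i < len(seq):
        if sub and seq[i:i + len(sub)] == sub:
            parts.append(current)
            current = []
            i += len(sub)
        else:
            current.append(seq[i])
            i += 1
    parts.append(current)
    return parts
```

For a one-character delimiter the two give the same result, which is why either link fits this challenge.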

Jonathan Allan

Posted 2017-02-28T19:35:40.317

Reputation: 67 804

@Okx the implicit output would simply "smush" together all the characters – Adám – 2017-02-28T19:48:38.477

@Okx Yes it is a function that returns a list. The footer is to override the implicit output that occurs when it is run as a full program. – Jonathan Allan – 2017-02-28T19:48:51.950

7

Powershell, 25 22/23 bytes

Two options. The first simply calls -split on the first argument, using the second argument as the delimiter.

$args[0]-split$args[1]

The second is one byte longer and uses the built-in CSV parser; it takes a filename as the first argument and the delimiter as the second.

ipcsv $args[0] $args[1]

-2 bytes because the -Delimiter (-D) parameter name is not required; the second unnamed argument is assumed to be the delimiter.

Sadly, PowerShell cannot take the two parameters as a single array: it assumes both are files and runs the command against each of them. As far as I can see, no other two-variable input method is shorter than this, so this is likely the shortest possible answer.

ipcsv is an alias for Import-Csv; by default it takes a file name as the first unnamed argument and the delimiter character as the second.

Running it against the example from the wiki page returns:

PS C:\Users\Connor\Desktop> .\csvparse.ps1 'example.csv' ','

Date     Pupil               Grade
----     -----               -----
25 May   Bloggs, Fred        C
25 May   Doe, Jane           B
15 July  Bloggs, Fred        A
15 April Muniz, Alvin "Hank" A
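For comparison with the built-in parser approach above, here is a rough Python counterpart using the standard csv module. It is only a sketch: read_dsv is a made-up name, it takes the text directly instead of a filename, and quoting is switched off with csv.QUOTE_NONE because the challenge says there is no quoting mechanism (that is an assumption about the input, not something Import-Csv does).

```python
import csv
import io


def read_dsv(text, delimiter):
    """Parse line-break separated DSV text into a list of lists of strings."""
    # QUOTE_NONE: treat quote characters as ordinary data, as the challenge requires.
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quoting=csv.QUOTE_NONE)
    return [row for row in reader]


if __name__ == "__main__":
    print(read_dsv('hello;"\n";world\n";"\n', ";"))
```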

colsw

Posted 2017-02-28T19:35:40.317

Reputation: 3 195

7

Japt, 3 bytes

mqV

Test it online! (Uses the -Q flag to prettyprint the output)

mqV  // Implicit: U, V = inputs
m    // Map each item in U by the following function:
 qV  //   Split the item at instances of V.
     // Implicit: output result of last expression

ETHproductions

Posted 2017-02-28T19:35:40.317

Reputation: 47 880

:O a JSGL beat MATL! – Downgoat – 2017-03-01T03:18:21.470

6

Python, 33 bytes

lambda a,c:[x.split(c)for x in a]
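For anyone who wants to check this against the examples in the question, here is an ungolfed sketch of the same idea. The name split_records is just for illustration and is not part of the answer.

```python
def split_records(records, delimiter):
    """Split every record at each occurrence of the delimiter."""
    return [record.split(delimiter) for record in records]


if __name__ == "__main__":
    print(split_records(["here is,some,sample", "data,delimited,by commas"], ","))
    print(split_records(['hello;"', '";world', '";"'], ";"))
    print(split_records(["to be or not", "that is the question"], " "))
```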

Trelzevir

Posted 2017-02-28T19:35:40.317

Reputation: 987

5

Haskell, 29 bytes

import Data.Lists
map.splitOn

Usage example: (map.splitOn) " " ["to be or not","that is the question"] -> [["to","be","or","not"],["that","is","the","question"]].

nimi

Posted 2017-02-28T19:35:40.317

Reputation: 34 639

4

05AB1E, 5 bytes

vy²¡ˆ

Try it online!

Explanation:

v     For each element in the input array
 y    Push the element
  ²   Push second input
   ¡  Split
    ˆ Add to array

Okx

Posted 2017-02-28T19:35:40.317

Reputation: 15 025

4

JavaScript, 26 bytes

x=>y=>x.map(n=>n.split(y))

Receives input in curried form: (array of strings)(delimiter)

Try it online!

fəˈnɛtɪk

Posted 2017-02-28T19:35:40.317

Reputation: 4 166

4

Mathematica, 11 bytes

StringSplit

Builtin function taking two arguments, a list of strings and a character (and even more general than that). Example usage:

StringSplit[{"to be or not", "that is the question"}, " "]

yields

{{"to", "be", "or", "not"}, {"that", "is", "the", "question"}}

Greg Martin

Posted 2017-02-28T19:35:40.317

Reputation: 13 940

4

MATLAB / Octave, 41 25 bytes

@(x,d)regexp(x,d,'split')

Creates an anonymous function named ans which accepts the first input as a cell array of strings and the second input as a string.

ans({'Hello World', 'How are you'}, ' ')
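One thing to keep in mind with regexp-based splitting is that the delimiter is interpreted as a regular expression, so a delimiter such as . or | would presumably need escaping to be matched literally. The following Python sketch shows the same concern with re.split; split_literal is a hypothetical helper, and Python is used only for illustration.

```python
import re


def split_literal(records, delimiter):
    """Split each record at the delimiter, treating it as literal text, not a pattern."""
    pattern = re.escape(delimiter)  # "." becomes r"\." and "|" becomes r"\|"
    return [re.split(pattern, record) for record in records]


if __name__ == "__main__":
    print(split_literal(["a.b.c", "d.e.f"], "."))
    print(split_literal(["a|b", "c|d"], "|"))
```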

Try it Online

Suever

Posted 2017-02-28T19:35:40.317

Reputation: 10 257

4

Cheddar, 19 bytes

a->b->a=>@.split(b)

A nice demonstration of looping abilities. I added new composition and f.op. blocks, which allow for interesting golfing. (=>:@.split) is supposed to work, but it doesn't :(

Downgoat

Posted 2017-02-28T19:35:40.317

Reputation: 27 116

3

MATL, 14 12 4 bytes

H&XX

Try it at MATL Online (the link has a modification at the end to show the dimensionality of the output cell array).

Explanation

        % Implicitly grab the first input as a cell array of strings
        % Implicitly grab the delimiter as a string
H       % Push the number literal 2 to the stack
&XX     % Split the input at each appearance of the delimiter
        % Implicitly display the result

Suever

Posted 2017-02-28T19:35:40.317

Reputation: 10 257

1

CJam, 5 bytes

l~l./

Explanation:

l~     e#Input evaluated (as list)
  l    e#Another line of input
   ./  e#Split first input by second

Roman Gräf

Posted 2017-02-28T19:35:40.317

Reputation: 2 915

1

Ruby using '-n', 17+1 = 18 bytes

p chomp.split *$*

How it works

  • Input from file
  • separator is given as command line parameter
  • since we only have 1 parameter, *$* splats the string and we can use it as a parameter for the split function
  • I tried to avoid chomp but any other solution seems to be longer than this.
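Why chomp is needed can be seen in a small Python sketch (Python only for illustration; split_lines is a hypothetical helper): lines read from a file keep their trailing line break, which would otherwise end up in the last field of every record.

```python
import io


def split_lines(stream, delimiter):
    """Read records line by line, drop the trailing newline, then split."""
    return [line.rstrip("\n").split(delimiter) for line in stream]


if __name__ == "__main__":
    data = io.StringIO("to be or not\nthat is the question\n")
    print(split_lines(data, " "))
```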

G B

Posted 2017-02-28T19:35:40.317

Reputation: 11 099

1

Rebol, 34 bytes

func[b s][map-each n b[split n s]]

draegtun

Posted 2017-02-28T19:35:40.317

Reputation: 1 592

1

GNU sed, 48 + 1(r flag) = 49 bytes

1h;1d
:
G
/,$/bp
s:(.)(.*)\n\1:,\2:
t
:p;s:..$::

Try it online!

In sed there are no data types, but a natural representation of a list would be a collection of lines. As such, the input format consists of DSV records each on a separate line, with the delimiter present on the first line.

Explanation: by design, sed runs the script as many times as there are input lines

1h;1d                  # store delimiter, start new cycle
:                      # begin loop
G                      # append saved delimiter
/,$/bp                 # if delimiter is ',', skip replacements and go to printing
s:(.)(.*)\n\1:,\2:     # replace first occurrence of delimiter with ','
t                      # repeat
:p;s:..$::             # print label: delete appended delimiter (implicit printing)
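The same input and output convention can be sketched in Python, assuming (as in the answer) that the first line holds the delimiter and the remaining lines are the records. The name dsv_to_csv is made up, and this uses a plain string replace rather than mimicking the sed loop step by step.

```python
def dsv_to_csv(lines):
    """First line is the delimiter; rewrite every following record as comma-separated."""
    delimiter, *records = lines  # assumes at least the delimiter line is present
    return [record.replace(delimiter, ",") for record in records]


if __name__ == "__main__":
    for row in dsv_to_csv([" ", "to be or not", "that is the question"]):
        print(row)
```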

seshoumara

Posted 2017-02-28T19:35:40.317

Reputation: 2 878

1

REXX, 95 bytes

arg f d
do l=1 while lines(f)
    n=linein(f)
    do #=1 while n>''
        parse var n w (d) n
        o.l.#=w
    end
end

Takes a filename and a delimiter as arguments, contents of file are put in stem o.
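The parse loop can be mirrored in Python roughly as follows. This is a sketch under assumptions: it takes a list of lines instead of a filename, a dict keyed by (line, field) stands in for the stem o., and parse_into_stem is a made-up name.

```python
def parse_into_stem(lines, delimiter):
    """Peel fields off the front of each line until nothing is left."""
    stem = {}
    for line_no, rest in enumerate(lines, start=1):
        field_no = 1
        while rest:
            # Like "parse var n w (d) n": text before the delimiter, then the remainder.
            word, _, rest = rest.partition(delimiter)
            stem[(line_no, field_no)] = word
            field_no += 1
    return stem


if __name__ == "__main__":
    print(parse_into_stem(["here is,some,sample", "data,delimited,by commas"], ","))
```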

idrougge

Posted 2017-02-28T19:35:40.317

Reputation: 641

Is all that whitespace really necessary? – Adám – 2017-03-29T08:03:09.210

No, I only indented it for readability. The byte count is for unindented code. – idrougge – 2017-03-29T09:03:41.333

Which flavour of REXX is this? – Adám – 2017-03-29T09:39:44.940

I think it's pure ANSI REXX. I've only tested it with Regina. – idrougge – 2017-03-29T11:02:52.630

On its way to TIO... – Adám – 2017-03-29T11:04:05.467

0

APL (Dyalog), 4 bytes

In versions up to and including 15.0, this needs ⎕ML←3, which many use as their default. From version 16.0, ⊂ can just be replaced by ⊆ for the same effect.

Takes separator as left argument and DSV as right argument.

≠⊂¨⊢

Try it online!

≠ the inequalities (of the left argument and the right argument)

⊂¨ partition each

⊢ right argument

By partition is meant removing all elements indicated by a corresponding zero in the left argument, and beginning a new partition whenever the corresponding number in the left argument is greater than its predecessor, i.e. on every one if the left argument is Boolean, as is the case here.
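The partition rule described above can be sketched in Python like this. It is an illustration of the rule as stated, not of Dyalog's implementation, and the names partition and split_record are made up.

```python
def partition(mask, items):
    """Drop items whose mask value is 0; start a new part when the mask value rises."""
    parts = []
    previous = 0
    for flag, item in zip(mask, items):
        if flag:
            if flag > previous:
                parts.append([])
            parts[-1].append(item)
        previous = flag
    return parts


def split_record(record, delimiter):
    """Use the inequality mask (1 where the character is not the delimiter) to partition."""
    mask = [int(ch != delimiter) for ch in record]
    return ["".join(part) for part in partition(mask, record)]


if __name__ == "__main__":
    print(split_record("to be or not", " "))
```

With a Boolean mask, a new part starts at every 1 that follows a 0, which is exactly the first character after each delimiter.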

Adám

Posted 2017-02-28T19:35:40.317

Reputation: 37 779

0

R, 8 bytes (2 ways)

R has two builtin functions that meet the requirements of this challenge:

strsplit

takes a vector of strings and a separator, and returns a list of vectors of the separated strings.

read.csv

takes a file name and a separator, and returns a data frame. Technically this might be 10 bytes because it needs the option header=F so it won't read the first elements as the column names. Currently the TIO link reads from stdin.

Try these online!

Giuseppe

Posted 2017-02-28T19:35:40.317

Reputation: 21 077