How to split a CSV file into files with a specified number of rows?

I have a CSV file (around 10,000 rows; each row has 300 columns) stored on a Linux server. I want to break this CSV file into 500 CSV files of 20 records each, each having the same CSV header as the original file.

Is there a Linux command that can help with this conversion?

Pawan Mude

Posted 2013-12-21T16:33:28.887

Reputation: 165

Answers

For the sake of completeness, here are some minor improvements:

You can save the header once and reuse it many times.
You can insert the header into the split files using sed, without temporary files.

Like this:

header=$(head -n 1 file.csv)         # save the header once
tail -n +2 file.csv | split -l 20    # split the data rows into xaa, xab, ...
for file in x??; do
    sed -i -e 1i$'\\\n'"$header" "$file"    # insert the header before line 1
done

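The $'\\\n' there uses bash's ANSI-C quoting to produce a backslash followed by a newline, i.e. a newline escaped with a backslash, which is what sed's i command expects before its text. The sed expression means: insert $header before the first line. Note that sed -i without an argument is GNU sed; BSD/macOS sed needs -i ''.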
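To see what the $'\\\n' trick produces, here is a minimal sketch, assuming bash and GNU sed. It prints the bytes of the quoted string and then applies the same sed expression to a throwaway file; the chunk file xaa in a temporary directory and the header id,name are hypothetical stand-ins.

```shell
#!/usr/bin/env bash
# Assumes bash (for $'...' quoting) and GNU sed (for -i without an argument).

# $'\\\n' expands to two bytes: a backslash followed by a newline.
esc=$'\\\n'
printf '%s' "$esc" | od -An -c    # od shows the two bytes as \\ and \n

# Hypothetical chunk file and header, standing in for xaa and file.csv's header.
tmp=$(mktemp -d)
printf 'row1\nrow2\n' > "$tmp/xaa"
header='id,name'

# Same expression as in the answer: sed receives "1i\<newline>id,name",
# i.e. "insert this text before line 1".
sed -i -e 1i$'\\\n'"$header" "$tmp/xaa"
cat "$tmp/xaa"    # id,name first, then row1 and row2
```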

janos

Reputation: 2 449

This splits everything except the CSV header line into files of 20 lines each (named xaa, xab, ... by default):

tail -n +2 file.csv | split -l 20

You can then add the header to each of the files:

for file in x*
do
    (head -n 1 file.csv; cat "$file") > "$file".new
    mv "$file".new "$file" # Stolen from @PawanMude's answer
done
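If GNU coreutils is available, split can also add the header itself through its --filter option, which runs a shell command for each chunk with the chunk on stdin and the intended output file name in $FILE. This swaps in a different technique from the loop above (no second pass, no .new files); a minimal sketch, using a hypothetical 100-row file.csv generated in a temporary directory:

```shell
#!/usr/bin/env bash
# Requires GNU split (coreutils) for --filter.
set -eu

tmp=$(mktemp -d)
cd "$tmp"

# Hypothetical input: one header line plus 100 data rows.
{ echo 'id,value'; seq 1 100 | sed 's/$/,x/'; } > file.csv

# For each 20-line chunk, split runs the filter with the chunk on stdin and
# $FILE set to the name it would have written (xaa, xab, ...).
tail -n +2 file.csv |
  split -l 20 --filter='{ head -n 1 file.csv; cat; } > "$FILE"'

ls x*    # five files: xaa to xae
```

Each output file gets the header plus 20 data rows. The filter string is single-quoted so that $FILE is expanded by the shell split starts for each chunk, not by the calling shell.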

l0b0

Reputation: 6 306

Try:

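fn="infile" c=0
{
  IFS= read -r header          # read only the header line, unmodified
  split -a 3 -l 3 - "$fn"      # split the rest into 3-line files (use -l 20 for 20 rows)
  for f in "$fn"???; do
    c=$((c+1))
    printf "%s\n" "$header" | cat - "$f" > "${f%???}-$c" && rm "$f"
  done
} < "$fn"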
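This works because every command inside the { ... } group shares one standard input, redirected from $fn: read consumes only the header line, and the next command (split in the answer) continues from line 2. A minimal sketch showing this with a hypothetical three-line in.csv, using cat in place of split; IFS= read -r is used here so the header line is kept exactly as written:

```shell
#!/usr/bin/env bash
# Hypothetical sample file: one header line and two data rows.
tmp=$(mktemp -d)
printf 'h1,h2\nr1\nr2\n' > "$tmp/in.csv"

{
  # IFS= keeps leading/trailing blanks; -r keeps backslashes literal.
  IFS= read -r header
  # cat reads the same stdin, starting right after the header line.
  rest=$(cat)
} < "$tmp/in.csv"

printf 'header: %s\n' "$header"   # header: h1,h2
printf 'rest:\n%s\n' "$rest"      # r1 and r2 on separate lines
```

A brace group runs in the current shell, so header and rest are still set after the closing brace.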

Or try with awk, where n sets the number of data rows per output file (3 here; 20 for the question):

awk 'NR==1{h=$0; next} !((NR-2)%n){close(f); f=FILENAME "-" ++c; print h>f}{print>f}' n=3 infile

Multi-line version:

awk '
  NR==1 {
    h=$0
    next
  }
  !((NR-2)%n) {
    close(f)
    f=FILENAME "-" ++c
    print h>f
  }
  {
    print>f
  }
' n=3 infile
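To check the awk version against the question's numbers (500 files of 20 records), here is a sketch that generates a hypothetical infile with a header and 10,000 three-column rows and runs the one-liner with n=20. The only change to the awk program is parentheses around ++c, added to make the concatenation explicit.

```shell
#!/bin/sh
# Work in a scratch directory so the generated files are easy to remove.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Hypothetical input: header + 10,000 rows (3 columns instead of 300).
{ echo 'c1,c2,c3'; seq 1 10000 | awk '{ print $1 "," $1*2 "," $1*3 }'; } > infile

# Same logic as the one-liner above, with n=20 data rows per output file.
awk 'NR==1{h=$0; next} !((NR-2)%n){close(f); f=FILENAME "-" (++c); print h>f}{print>f}' n=20 infile

ls infile-* | wc -l    # 500 files: infile-1 ... infile-500
```

Calling close(f) before opening the next file keeps only one output file open at a time, which matters on awk implementations with a low limit on open files.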

Scrutinizer

Reputation: 249

Use GNU Parallel:

cat bigfile.csv | parallel -N20 --header : --pipe 'cat > {#}'

If you need to run a command on each of the parts, then GNU Parallel can help do that, too:

cat bigfile.csv | parallel -N20 --header : --pipe my_program_reading_from_stdin

cat bigfile.csv | parallel -N20 --header : --pipe --cat my_program_reading_from_a_file {}

Ole Tange

Reputation: 3 034

The best way to solve this, using the post mentioned below:

Solution

tail -n +2 file.txt | split -l 4 - split_
for file in split_*
do
    head -n 1 file.txt > tmp_file
    cat "$file" >> tmp_file
    mv -f tmp_file "$file"
done
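The same steps can be wrapped in a reusable function that takes the input file, the number of data rows per chunk, and an output prefix. This is a sketch; split_with_header is a hypothetical name, not an existing command, and the demo generates a hypothetical file.csv shaped like the question's (one header, 10,000 rows) in a temporary directory.

```shell
#!/usr/bin/env bash
# split_with_header INPUT ROWS [PREFIX]
# Hypothetical helper: splits INPUT into chunks of ROWS data lines and
# prepends INPUT's header line to every chunk.
split_with_header() {
  local input=$1 rows=$2 prefix=${3:-split_}
  local header file
  header=$(head -n 1 "$input")

  # Split only the data lines; chunk names are PREFIXaa, PREFIXab, ...
  tail -n +2 "$input" | split -l "$rows" - "$prefix"

  # The glob is expanded once, before any .tmp file is created.
  for file in "$prefix"*; do
    { printf '%s\n' "$header"; cat "$file"; } > "$file.tmp" &&
      mv -f "$file.tmp" "$file"
  done
}

# Demo on hypothetical data in a scratch directory.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
{ echo 'id,val'; seq 1 10000 | sed 's/$/,v/'; } > file.csv

split_with_header file.csv 20 part_
ls part_* | wc -l    # 500 chunk files
```

Leftover files from an earlier run that match the prefix would also get a header prepended, so it is safest to start from an empty directory or a fresh prefix.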

Pawan Mude

Reputation: 165