Splitting a large txt file every 100 lines and including the original header (on a Mac)

1

2

I am looking for a tool or script (TextWrangler or Terminal) that can split a large text file every 100 lines, counting from line 5 (the first 4 are header lines), and output individual .txt files that each include the original header.

For instance

input:

File.txt
line1 / line4   HEADER
...
line5 / line265 DATA

output:

File_01.txt
line1/line4   HEADER
line5/line104 DATA

File_02.txt
line1/line4   HEADER
line5/line104 DATA

File_03.txt
line1/line4   HEADER
line5/line65  DATA

The text file uses Windows line breaks (CR LF) in case that matters.

I am currently doing this manually so any suggestions that can make this process more efficient are very welcome.

Dan

Posted 2010-09-14T09:21:21.070

Reputation: 98

Answers

5

  1. Remove the header and put it into a separate file, header.txt.
  2. Split the data using split --lines=100 data.txt (if your split does not accept the long option, as on a Mac, use -l 100 instead). This generates many files of 100 lines each, named xaa, xab, xac and so on.
  3. Then prepend the header to each file: for a in x??; do cat header.txt "$a" > "$a.txt"; done This results in your finished data files (with headers) being called xaa.txt, xab.txt, xac.txt, ...
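
Steps 2 and 3 can be tried out on throwaway data first. The sketch below is only an illustration: the scratch directory and the generated sample lines (4 header lines, 261 data lines) are assumptions made up for the demo, and it uses -l 100 so that it also runs with the split that ships on a Mac.

```shell
# Work in a scratch directory so the x?? glob only matches our own pieces.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up sample input: 4 header lines and 261 data lines.
printf 'header %s\n' 1 2 3 4 > header.txt
seq 1 261 | sed 's/^/data /' > data.txt

# Step 2: cut the data into 100-line pieces (xaa, xab, xac).
split -l 100 data.txt

# Step 3: prepend the header to each piece.
for a in x??; do
    cat header.txt "$a" > "$a.txt"
done

# Show the line count of each finished file.
wc -l x??.txt
```

The first two files should come out at 104 lines (4 header + 100 data) and the last one at 65, which matches the layout asked for in the question.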

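If there is so much data (or you split on fewer lines) that the three-letter names (xaa, xab, ...) run out, split either switches to longer names or stops with an error, depending on the implementation. Passing -a 3 makes it use four-letter names (xaaa, xaab, ...) from the start; in that case insert an extra ? in the for statement above.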
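
A minimal sketch of the longer-name case, assuming made-up data (2000 lines cut into 2-line pieces, so 1000 output files) and the standard -a option of split to set the suffix length explicitly:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up sample input: 4 header lines and 2000 data lines.
printf 'header %s\n' 1 2 3 4 > header.txt
seq 1 2000 > data.txt

# 2-line pieces need 1000 files, more than the 26 * 26 = 676 names that a
# two-letter suffix allows, so request three-letter suffixes: xaaa, xaab, ...
split -a 3 -l 2 data.txt

# Note the extra ? in the glob to match the four-letter names.
for a in x???; do
    cat header.txt "$a" > "$a.txt"
done

# Count the finished files.
ls x???.txt | wc -l
```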

Edit:
To automate the extraction of the header, use head -n 4 origdata.txt > header.txt to extract the first four lines. Use tail -n +5 origdata.txt > data.txt to extract everything except the first four lines (the number after + is the first line to output, so +5 starts at line 5). Now you have two files, one with the header and one with the data. It should not be too hard to combine this into a script. (I have no access to bash today.)
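
The header/data extraction is easy to check on a small sample. In this sketch the file contents are made up (4 header lines, 3 data lines, written with CR LF line endings as in the question); tail is given +5 so that output starts at line 5, after the four header lines.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up sample: 4 header lines and 3 data lines, each ending in CR LF.
printf '%s\r\n' h1 h2 h3 h4 d1 d2 d3 > origdata.txt

head -n 4 origdata.txt > header.txt    # lines 1 to 4
tail -n +5 origdata.txt > data.txt     # line 5 to the end

# Joining the two parts again should reproduce the original byte for byte,
# which also shows that the CR LF line endings pass through untouched.
cat header.txt data.txt | cmp - origdata.txt && echo "header + data match the original"
```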

Nifle

Posted 2010-09-14T09:21:21.070

Reputation: 31 337

Thanks! I had to replace "--lines=100" with "-l 100", but apart from that it works like a charm. Ideally, however, I would prefer a script or a single-line command to do the job, so that it is easier for (less computer-savvy) coworkers to take over these tasks in my absence. – Dan – 2010-09-14T11:49:33.990

@Dan - You could put it all in a script with a bit of fiddling. See my edit to automate the first part. – Nifle – 2010-09-14T13:17:51.413

I managed to compile your suggestions into a script. I also defined a few variables to incorporate the original file name in the output. It probably contains a few scripting faux pas since I am fairly new to this but it does the trick quite nicely. Thanks again! – Dan – 2010-09-15T12:44:10.683

3

Based on the answer provided by Nifle I made a script that executes his suggested commands, adds the original filename to the output and cleans up the temporary files.

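#!/bin/bash
# Split a text file into 100-line pieces, repeating the 4-line header in each.
# Run it in a directory that contains no other files named x??.

FILE="filename.txt"      # input file; change this to the file to split
NAME="${FILE%.txt}"      # file name without the .txt extension

# Separate the 4-line header from the data (line 5 onwards).
head -n 4 "$FILE" > header.txt
tail -n +5 "$FILE" > data.txt

# Cut the data into 100-line pieces named xaa, xab, ...
split -l 100 data.txt

# Prepend the header to each piece.
for a in x??
    do
        cat header.txt "$a" > "$NAME.$a.txt"
    done

# Keep the original under a new name and remove the temporary files.
mv "$FILE" "$NAME.orig.txt"
rm header.txt data.txt x??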

Et voila!
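
For comparison, the same result can be sketched as a single awk command. This is an alternative technique, not part of the script above; the sample input is made up for the demo, and the variable names size and base are my own. It needs no temporary files and names the output File_01.txt, File_02.txt, ... as in the question.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Made-up sample input: 4 header lines followed by 261 data lines.
{ printf 'header %s\n' 1 2 3 4; seq 1 261; } > File.txt

awk -v size=100 -v base=File '
    NR <= 4 { header = header $0 "\n"; next }   # collect the 4 header lines
    (NR - 5) % size == 0 {                      # first data line of a new piece
        if (out != "") close(out)               # finish the previous piece
        out = sprintf("%s_%02d.txt", base, ++n)
        printf "%s", header > out               # start the piece with the header
    }
    { print > out }                             # copy the data line
' File.txt

wc -l File_??.txt
```

Lines are copied as they are, so a CR at the end of each line (Windows line breaks) is carried over into the output files.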

Dan

Posted 2010-09-14T09:21:21.070

Reputation: 98