Split a large text file


I have a large text file that is divided into sections by header lines, and I need to split it into separate files, one per section.

For instance the file has headers like this:

--Heading 1--
some text

--Heading 2--
more text etc

--Heading 3--
asdf text

I need to split the large file into text files based on their headers.

So for the example, there would be three output files:

Heading 1.txt:

--Heading 1--
some text

Heading 2.txt:

--Heading 2--
more text etc

Heading 3.txt:

--Heading 3--
asdf text

Does anyone know of a Windows or Mac app/script that can do this?

Or maybe someone could give instructions on how to write something like this in a programming language. I don't know Python or Java, but maybe this is the time to learn. :)



Posted 2010-03-04T05:49:34.200

Reputation: 237



This is not the simplest answer; hopefully someone will come up with something neater. I put together a little script that does this and should work on the Mac.

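#!/bin/sh
NUMFILES=`grep -c '^--.*--$' "$1"`    # count the header lines
NUMFILES=`expr $NUMFILES - 2`         # repeats needed after the first explicit split
csplit -k "$1" '%^--.*--$%' '/^--.*--$/' "{$NUMFILES}"
for file in xx*
do
        mv "$file" "$(head -n1 "$file" | sed -e 's/--\(.*\)--/\1.txt/')"
done

Run it with the file to split as its only argument. This uses csplit to chop up the file. The fourth line tells csplit to ignore everything before the first header line and then split at each header after that. Lines 2-3 work out how many times csplit has to repeat the split: the number of headers minus two, because the first split is given explicitly and the text after the last header is written out as the final piece.

csplit names its output files xx followed by a two-digit number (xx00, xx01, ...). The last 4 lines rename each of these files to whatever is in its header line, with the -- removed and .txt appended.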
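
To see what the csplit step produces on its own, here is a small sketch that runs it against the three-heading sample from the question. The file name sample.txt and the function name demo_csplit are placeholders made up for this illustration, and the repeat count {1} is hard-coded for three headers (headers minus two).

```shell
# Sketch only: run just the csplit step on the question's sample input.
# sample.txt and demo_csplit are hypothetical names used for illustration.
demo_csplit() {
    printf '%s\n' '--Heading 1--' 'some text' '' \
                  '--Heading 2--' 'more text etc' '' \
                  '--Heading 3--' 'asdf text' > sample.txt
    # Skip to the first header, split at the next header, repeat once more;
    # whatever follows the last header becomes the final piece.
    csplit -k sample.txt '%^--.*--$%' '/^--.*--$/' '{1}'
    # Show the first line of each piece (xx00, xx01, xx02).
    head -n 1 xx00 xx01 xx02
}
```

Calling demo_csplit in an empty directory should leave three pieces, each starting with its own header line, ready for the rename loop.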

Martin Hilton

Posted 2010-03-04T05:49:34.200

Reputation: 1 386


Here's a "one liner" 8-]. It's similar to what Martin has done. This will work on your Mac. Just open up the "Terminal" app and navigate to the directory containing myfile.txt.

split -p '--.*--' myfile.txt FILE && for file in FILE*; do mv "$file" "$(head -n 1 "$file" | sed 's/--//g').txt"; done

PS. Make sure there are no files in the directory that are already named FILE*, i.e. make sure ls FILE* shows nothing.
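
The -p option is a BSD extension to split, so the one-liner will not run with GNU split on Linux. A rough portable alternative, sketched here with awk, writes each section straight to its final file name. It assumes header lines look exactly like --Heading-- with nothing else on the line and that the heading text is usable as a file name; split_by_header is a name invented for this example.

```shell
# Sketch only: write each section to "<heading>.txt" using awk.
# split_by_header is a hypothetical helper name.
# Usage: split_by_header myfile.txt
split_by_header() {
    awk '
        /^--.*--$/ {
            if (out != "") close(out)                    # finish the previous section
            out = substr($0, 3, length($0) - 4) ".txt"   # strip the leading and trailing --
        }
        out != "" { print > out }                        # lines before the first header are skipped
    ' "$1"
}
```

Unlike the split version, this leaves no intermediate FILE* pieces to rename. If the same heading appears twice, the later section overwrites the earlier file.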


Posted 2010-03-04T05:49:34.200

Reputation: 274