Split text file by whitespace

I have a very large text file with multiple columns of data.

E.g. 1312.4123 asdkofADkofaO213 dakofasdjodas Fri Mar 2013 15:23:11 .. .. .. etc.

I want to split this text file into one file per column, where columns are separated by whitespace. Column1.txt would contain 1312.4123, Column2.txt would contain asdkofADkofaO213, and so on, for every line of the file. In short, I'm after Excel's Text to Columns, but the file is too large to open in a program like that.

I'm using Linux, so command-line tools are an option.

Peleus

Posted 2013-04-05T10:13:17.463

Reputation: 539

Answers

This should give you what you want:

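# Count the fields on the first line to find the number of columns
columns=$(head -n 1 datafile.txt | wc -w)
for i in $(seq 1 "$columns")
do
    # Pass the column number into awk and print that field from every line
    awk -v col="$i" '{print $col}' datafile.txt > "Column$i.txt"
done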

This assumes that every row in the file has the same number of columns as the first row. Note that it also reads the whole file once per column, which can be slow for a very large file.
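One way to check that assumption before running the loop is to list the distinct field counts per line. This is a sketch that uses a hypothetical sample.txt (with a deliberately short second row) instead of the real datafile.txt:

```shell
# Hypothetical sample file: the second row is deliberately one field short
printf '%s\n' '1312.4123 asdkofADkofaO213 dakofasdjodas' '1400.0001 qwerty12' > sample.txt

# Print each distinct number of fields per line, smallest first.
# A single value means every row has the same number of columns.
counts=$(awk '{ print NF }' sample.txt | sort -nu)
printf '%s\n' "$counts"
```

Here it prints 2 and 3, so the loop above would miss nothing but would be relying on the first row having the most columns.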

Flup

Posted 2013-04-05T10:13:17.463

Reputation: 3 151

AWK is an interpreted programming language, and like most such languages it can read and write files directly.

The magic here is:

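  • NF: number of fields on the current line (by default fields are separated by runs of whitespace)
  • i: a user-defined loop variable
  • $i: the i-th field, one of the built-in field variables $1 ... $NF ($0 is the whole line)
  • "column" i: writing two expressions side by side concatenates them (no need for "a"+"b" or "a"."b")
  • > file: output redirection; awk opens (and truncates) the file the first time it is used and keeps it open, so later prints to the same name append to it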
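To see NF, $i and concatenation in isolation, the loop body can be run on a single line without writing any files. The sample line is the one from the question, with extra spaces added to show how runs of blanks are treated:

```shell
line='1312.4123   asdkofADkofaO213 dakofasdjodas'

# With the default field separator, runs of blanks count as one separator,
# so NF is 3 here. "column" i ": " $i concatenates four values into one string.
fields=$(printf '%s\n' "$line" |
    awk '{ for (i = 1; i <= NF; i++) print "column" i ": " $i }')
printf '%s\n' "$fields"
```

This prints `column1: 1312.4123`, `column2: asdkofADkofaO213` and `column3: dakofasdjodas`, one per line.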

For example:

$ ll
total 140
drwxr-xr-x 2 jaroslav jaroslav  4096 Mar 16 07:11 answers
drwxr-xr-x 3 jaroslav jaroslav  4096 Dec  7 12:38 diff
-rw-r--r-- 1 jaroslav jaroslav   214 Dec  7 12:38 diff.tar.gz
-rw-r--r-- 1 jaroslav jaroslav   700 Apr  5 02:37 fonts.sh
-rw-r--r-- 1 jaroslav jaroslav     4 Apr  5 15:52 hai
-rw-r--r-- 1 jaroslav jaroslav     0 Mar 19 05:06 moo
-rw-r--r-- 1 jaroslav jaroslav 10240 Dec  7 12:08 moo.tar
-rw-r--r-- 1 jaroslav jaroslav 23147 Mar 16 08:29 ob.rc.xml
drwxr-xr-x 3 jaroslav jaroslav  4096 Mar 16 03:08 rename
drwxr-sr-x 2 jaroslav games     4096 Mar 19 05:07 setgid
drwxr-xr-x 2 jaroslav jaroslav 69632 Mar 11 00:42 times
-rw-r--r-- 1 jaroslav jaroslav    92 Mar 11 00:14 while
drwxr-xr-x 4 jaroslav jaroslav  4096 Mar 22 00:15 wkhtmltoimage
$ ls -l  | awk '{ 
    for (i=1; i<=NF; i++) {
        file="column" i; 
        print $i > file 
    }
  }'
$ cat column9 | column
answers         fonts.sh        moo.tar         setgid          wkhtmltoimage
diff            hai             ob.rc.xml       times
diff.tar.gz     moo             rename          while

$ cat column1 | column
total           -rw-r--r--      -rw-r--r--      drwxr-xr-x      -rw-r--r--
drwxr-xr-x      -rw-r--r--      -rw-r--r--      drwxr-sr-x      drwxr-xr-x
drwxr-xr-x      -rw-r--r--      -rw-r--r--      drwxr-xr-x
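Applied to the question's naming scheme, the same approach can write Column1.txt, Column2.txt, ... directly in a single pass over the input. This is a sketch assuming a hypothetical input file named sample.txt; substitute the real datafile.txt:

```shell
# Hypothetical two-line sample standing in for datafile.txt
printf '%s\n' '1312.4123 asdkofADkofaO213 dakofasdjodas' \
              '1400.0001 qwerty12 zxcvb' > sample.txt

# Field i of every line goes to Column<i>.txt. Parenthesizing the
# file-name expression keeps the concatenation portable across awks.
awk '{ for (i = 1; i <= NF; i++) print $i > ("Column" i ".txt") }' sample.txt
```

Unlike the per-column loop, this reads the input only once and does not depend on the first row's length. A row with fewer fields simply writes nothing to the higher-numbered files, so lines in different ColumnN.txt files would then no longer line up.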

Ярослав Рахматуллин

Posted 2013-04-05T10:13:17.463

Reputation: 9 076