Breaking a file of strings down into separate files based on the first letter (Bash)

4

1

Alright, so I have a file full of thousands of strings, each one on its own line. I want to make a script that takes this file, call it list.txt, and places each line into a separate file based on its first letter or number. As an example, say the first few lines of the file are like this:

cheese
pizza
pepperoni
lettuce
grahamCrackers
0-0Foods
chicken
lentils
1-2Items

I need to break it down into these:

c.txt

cheese
chicken

g.txt

grahamCrackers

l.txt

lettuce
lentils

p.txt

pizza
pepperoni

0.txt

0-0Foods

1.txt

1-2Items

I would like to accomplish this with BASH, on OS X. Thanks.

Oh, if it helps: items on each line will NEVER contain a space; each one is always a single word (e.g. never Chicken Soup, always Chicken-Soup).

Josiah

Posted 2013-02-02T17:57:18.657

Reputation: 1 674

Answers

4

Try this

OLDIFS=$IFS
IFS='
'
typeset -a file
file=($(cat list.txt))
for i in "${file[@]}"; do
    echo "$i" >> "${i:0:1}.txt"
done
IFS=$OLDIFS

Note that the IFS part is usually not necessary, since the default IFS already includes the newline and the items contain no spaces. I tested this on Zsh 4.3.17 on Linux and on Bash 4.2.37.

It declares an array, assigns the contents of the file to that array, then loops over each element of the array (that is, each line) and echoes that element into the file whose name is the element's first letter with '.txt' appended.
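To make the ${i:0:1} part concrete: it is Bash's substring expansion, ${variable:offset:length}. Here is a small sketch of just that step; the helper name prefix_of is made up for illustration and is not part of the answer.

```shell
#!/bin/bash
# prefix_of is a hypothetical helper, used only to illustrate the expansion.
# It prints the first N characters of a word (N defaults to 1).
prefix_of() {
    local word=$1 len=${2:-1}
    printf '%s\n' "${word:0:$len}"
}

prefix_of grahamCrackers      # prints: g
prefix_of 0-0Foods            # prints: 0
prefix_of pepperoni 2         # prints: pe
```

The loop in the answer appends ".txt" to that prefix to get the output file name.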

KoviRobi

Posted 2013-02-02T17:57:18.657

Reputation: 76

I just tried that exact code and only replaced list.txt with my file. It took a while, but after it was done, nothing happened. Was I supposed to do something else? – Josiah – 2013-02-02T18:19:39.017

No, it works for me out of the box. Try 'set -x' and then run the command, and pastebin the output, perhaps I can help then. – KoviRobi – 2013-02-02T18:23:05.877

One thing though: is there a way to put these output files in a folder first? – Josiah – 2013-02-02T18:25:34.480

Yes, 'mkdir' that folder first, then change '${i:0:1}.txt' to 'folder/${i:0:1}.txt' – KoviRobi – 2013-02-02T18:26:05.473

Works great, thanks a lot. But I have one more for you... (I can't make this too easy. ;) ) Say I want to divide based on the first two letters, can that be easily accomplished? – Josiah – 2013-02-02T18:34:35.710

Never mind, I got it. Thanks a lot for your help. I really appreciate it. – Josiah – 2013-02-02T18:48:04.043
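For reference, the two variations discussed in the comments above (an output folder, and the first two letters instead of one) might be combined roughly like this. It is a sketch only: the function name split_into_folder and its three arguments are assumptions, and because it keeps the answer's array approach it also assumes no line contains glob characters such as * or ?.

```shell
#!/bin/bash
# Hypothetical wrapper around the loop from the answer.
#   $1 = list file, $2 = output folder, $3 = number of leading characters
split_into_folder() {
    local list=$1 folder=$2 len=${3:-1} i
    local IFS=$'\n'            # split on newlines only; restored on return
    local -a file
    mkdir -p "$folder"         # 'mkdir' the folder first, as suggested
    file=($(cat "$list"))      # one array element per line
    for i in "${file[@]}"; do
        echo "$i" >> "$folder/${i:0:$len}.txt"
    done
}
```

Calling split_into_folder list.txt sorted 2 on the sample from the question would put cheese and chicken into sorted/ch.txt.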

Whoa... this is much more elegant than the grep/sed-based solution I was going to write up. It's always surprising what bash can do by itself. With that said, you can avoid messing around with $IFS & generally simplify things by using a while loop instead of that for loop. Replace for i in "${file[@]}"; do with while read i; do, and replace done with done <list.txt, and then you can ditch all the stuff outside of the loop. – evilsoup – 2013-02-02T18:56:38.463
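Spelled out, the while-loop variant described in that comment could look like the following sketch. The function wrapper and the name split_list are assumptions added so the file name can be passed in; IFS= and -r are also additions to the plain read i from the comment.

```shell
#!/bin/bash
# split_list is a hypothetical name; the input file defaults to list.txt.
split_list() {
    local list=${1:-list.txt} i
    # IFS= and -r go beyond the comment's plain 'read i': they keep leading
    # whitespace and backslashes in each line intact.
    while IFS= read -r i; do
        echo "$i" >> "${i:0:1}.txt"
    done < "$list"
}
```

No array and no saved $IFS are needed, because read consumes the file one line at a time.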

@evilsoup you can't beat gawk for simplicity when it comes to this sort of thing. See my answer for an example. – terdon – 2013-02-02T21:12:02.890

5

You could just use gawk and simplify things:

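gawk '{n=substr($1,1,1); print >> n".txt"}' file.txt

  • n=substr($1,1,1) takes a substring of length 1 starting from the first position of the first field ($1) and saves it into a variable called n. awk strings are indexed from 1; gawk also accepts 0 as the start and treats it as 1, but other awk implementations may handle a start of 0 differently, so 1 is the portable choice.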

  • print >> n".txt" will append (>>) each line into a text file called n.txt (where n is the first letter).

To do the same thing for the first two letters, just change the length of substr:

gawk '{n=substr($1,1,2); print >> n".txt"}' file.txt

terdon

Posted 2013-02-02T17:57:18.657

Reputation: 45 216

Do not forget to close() the file if your awk fails with a "too many open files" error, e.g. gawk '{n=substr($1,1,2); print >> n".txt"; close(n".txt")}' file.txt – aff – 2014-12-10T08:21:47.367
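As a sketch of that close() advice in script form: the wrapper name split_with_awk and its arguments are hypothetical, and it calls plain awk rather than gawk on the assumption that gawk may not be installed.

```shell
#!/bin/bash
# split_with_awk is a hypothetical wrapper.
#   $1 = input file, $2 = number of leading characters (default 1)
split_with_awk() {
    awk -v len="${2:-1}" '
        NF {                                  # skip blank lines
            out = substr($1, 1, len) ".txt"   # awk strings start at 1
            print >> out                      # append the whole line
            close(out)                        # release the file handle
        }
    ' "$1"
}
```

Closing after every line is slower than keeping files open, but the number of open files then never grows with the number of distinct prefixes. Because the redirection is >>, reopening a closed file appends to it instead of truncating it.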

Cool, this is a good answer too. +1! – Josiah – 2013-02-03T04:13:02.693

0

#!/bin/bash

while read -r line
do
    firstChar=${line:0:1}
    fileName=${firstChar}.txt
    if [ -e "${fileName}" ]; then
        touch "${fileName}"
    fi
    echo "${line}" >> "${fileName}"
done < list.txt

The above script takes the first character of each line read in from the list.txt file. It then attempts to create a file with that character + ".txt", and then append each line from list.txt to the appropriate character + ".txt" file.

BlackMamba

Posted 2013-02-02T17:57:18.657

Reputation: 101

You don't need to explicitly create the file; the >> will create it if it does not exist (unless noclobber has been set, in which case unset it for the script) – alexis – 2013-02-03T15:05:06.233

Not only do you not need to create it, you are not creating it at all. -e checks whether the file exists, so you only touch the file if it is already there. All you are doing is updating the file's timestamp if the file exists. – terdon – 2013-02-03T21:31:25.960
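The point made in these two comments can be checked with a short experiment. This sketch assumes Bash with default options and works in a throwaway directory; the variable names demo_dir and target are made up.

```shell
#!/bin/bash
# Work in a throwaway directory so nothing in the current one is touched.
demo_dir=$(mktemp -d "${TMPDIR:-/tmp}/append-demo.XXXXXX")
target="$demo_dir/c.txt"

# -e is true only if the file already exists.
if [ -e "$target" ]; then echo "before: exists"; else echo "before: missing"; fi

echo "cheese" >> "$target"     # no touch needed: >> creates the file

if [ -e "$target" ]; then echo "after: exists"; else echo "after: missing"; fi
cat "$target"
```

Run as written, this should print "before: missing", then "after: exists", then "cheese", which is why the if/touch block in the script can simply be dropped.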