Sort multiple files with bash

1

I have a question that involves the bash scripting language.

I have multiple directories

  • /studentName
  • /studentMail
  • /studentNumber

Each of these directories contains one file: name.txt, mail.txt and number.txt, respectively.

Now I need to create a function that does the same as a SELECT statement in a MySQL database. It doesn't need to read a single line; it should just display the contents of those 3 files and sort them, which means I need something like this as output:

studentname | studentmail | studentnumber

I came up with 2 ways.

First:

cat /studentName/name.txt /studentMail/mail.txt /studentNumber/number.txt > summary
cat summary

This displays the contents of the 3 files one below the other, which is obviously not what I want.

I also came up with this:

paste /studentName/name.txt /studentMail/mail.txt /studentNumber/number.txt

This does display all the contents side by side, but they are still not really sorted. Later on I also need to be able to select only 1 row to be displayed.

Can anybody help me do this?

PS: I know about sort, but then all the contents get displayed one below the other. Am I somehow using it wrong?

bryan

Posted 2011-06-01T18:03:25.210

Reputation: 71

Answers

2

How are the files sorted now? Does line 3, say, of all three files refer to the same student? If so, you could expand your paste solution to this:

paste /studentName/name.txt /studentMail/mail.txt /studentNumber/number.txt | sort

which would sort all the records (lines) by student name. You could sort by some other field by using appropriate options to sort.
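As a rough sketch of sorting on another field, the following sorts the pasted records numerically by the third column. The sample file contents are taken from the comments below, the paste order (name, mail, number) is the one used above, and the temporary directory and the variable name by_number are assumptions made only for illustration. The $'\t' quoting is bash-specific.

```shell
#!/usr/bin/env bash
# Sample data in a temporary directory, one record per line (illustrative only).
tmp=$(mktemp -d)
printf '%s\n' bryan eeden hello > "$tmp/name.txt"
printf '%s\n' bryan@bryan hello@hello test@test > "$tmp/mail.txt"
printf '%s\n' 1234567 34567 34688 > "$tmp/number.txt"

# paste joins the files line by line with tabs between the fields.
# sort -t $'\t' uses a tab as the field separator;
# -k3,3n restricts the sort key to field 3 and compares it numerically.
by_number=$(paste "$tmp/name.txt" "$tmp/mail.txt" "$tmp/number.txt" |
    sort -t $'\t' -k3,3n)

printf '%s\n' "$by_number"
rm -r "$tmp"
```

Without the n modifier the key would be compared as text, so 1234567 would sort before 34567.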

To select a single row to be displayed, pipe the output of whatever command yields a properly sorted list into grep, e.g.,

paste ... | sort | grep 'pattern'

where 'pattern' would be your search criterion in the form of a regular expression. Of course, if you're selecting only one line, there is no need for sort.
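One thing to watch with grep is that the pattern is matched anywhere on the line, so a name can also hit a mail address. A possible alternative, sketched here under the assumption that the records are tab-separated with the name in the first field, is to compare that field exactly with awk. The helper name select_row and the sample records are hypothetical.

```shell
#!/usr/bin/env bash
# select_row (hypothetical helper): read tab-separated records from stdin and
# print only those whose first field equals the name given as "$1".
select_row() {
    awk -F '\t' -v name="$1" '$1 == name'
}

# Three sample records: name, mail, number.
records=$(printf 'bryan\tbryan@bryan\t1234567\needen\thello@hello\t34567\nhello\ttest@test\t34688\n')

# grep matches 'hello' anywhere on the line, so it counts two records here.
grep_hits=$(printf '%s\n' "$records" | grep -c 'hello')

# The field comparison matches only the record whose name is exactly "hello".
exact=$(printf '%s\n' "$records" | select_row hello)

echo "grep matched $grep_hits lines"
echo "exact match: $exact"
```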

Another command you might find useful is join, but I don't know enough about it to give you an example of its use.

Update: Formatting with awk

The output of the paste command above is a sequence of lines, each line consisting of three fields separated from each other by tabs, i.e.,

<field1><tab><field2><tab><field3>

These lines can be formatted by piping them into the following awk command.

awk -F '\t' '{printf "%-20s%-16s%s\n", $1, $2, $3}'

The -F '\t' argument specifies that the input field separator is a tab character. That will separate the input lines into three fields which awk refers to by $1, $2 and $3. The awk language includes a printf function that behaves essentially the same as the C library printf() function. The format string above specifies three string fields. The first, %-20s, specifies that the corresponding string parameter be left-justified in a 20-character field. The second, %-16s, specifies that its parameter be left-justified in a 16-character field. The last, %s, just appends its parameter to whatever has been formatted so far. Finally, the \n puts a newline at the end so that each input line is formatted to a separate output line.

To tune the output to your taste, just change the field widths and/or remove the minus signs to right-justify the strings. For more options, see the awk and printf man pages.
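Putting the pieces together, a minimal sketch of the whole pipeline might look like this. The function name student_table, the header text, and the sample data are assumptions for illustration; the field widths are the ones from the awk command above.

```shell
#!/usr/bin/env bash
# student_table (hypothetical helper): paste three files side by side,
# sort the records by the first field, and print aligned columns with a header.
student_table() {
    paste "$1" "$2" "$3" | sort |
        awk -F '\t' '
            BEGIN { printf "%-20s%-16s%s\n", "studentname", "studentmail", "studentnumber" }
            { printf "%-20s%-16s%s\n", $1, $2, $3 }'
}

# Sample data, deliberately unsorted; line N of each file describes the same student.
tmp=$(mktemp -d)
printf '%s\n' hello bryan eeden > "$tmp/name.txt"
printf '%s\n' test@test bryan@bryan hello@hello > "$tmp/mail.txt"
printf '%s\n' 34688 1234567 34567 > "$tmp/number.txt"

table=$(student_table "$tmp/name.txt" "$tmp/mail.txt" "$tmp/number.txt")
printf '%s\n' "$table"
rm -r "$tmp"
```

The BEGIN block runs once before any input is read, which is why the header comes out first and is not affected by sort.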

garyjohn

Posted 2011-06-01T18:03:25.210

Reputation: 29 085

When I use the paste option, I get this as output:

bryan 912391923 bryan@bryan hello 2030123 lalal@lallw
ollo 23123123 ollo@ollo

What I want is for them to be displayed in a table-like layout, where the name is displayed in the first column, the number in the second, etc. – bryan – 2011-06-01T20:12:23.213

@bryan: It would help to know the contents and/or format of those files and the results you expect. Otherwise, I'm just taking stabs in the dark. To me, the paste option output above looks pretty good. It has the name, id and email of three students. Is each of those records a single line? (SU Comments don't seem to support newlines.) – garyjohn – 2011-06-01T20:17:50.790

Well, the content of the files is pretty simple.

name.txt contains, for example:

bryan eeden hello

The file number.txt contains, for example: 1234567 34567 34688

and mail.txt has for example:

bryan@bryan hello@hello test@test

The paste option does indeed put these things next to each other, but I need to widen the spacing in the output, since sometimes there is no visible gap between the 3 columns. – bryan – 2011-06-01T20:33:39.753

@bryan: So it sounds like paste basically works but that the output format isn't quite right. OK. One solution to that is to pipe the results into the expand command with a tab setting large enough to space the output as you want. For example, | expand -t20 will replace each tab by enough spaces to align the next word at the next column that's a multiple of 20 spaces. You could also format the output using awk. It gives good results but is more work. – garyjohn – 2011-06-01T20:57:18.590

Wow, cool, thanks garyjohn. The expand command did the trick. I have been told to try awk as well, but I have no idea how to use it and cannot quite understand the awk man page. Maybe you can point me in the right direction (if you know how it works). – bryan – 2011-06-01T21:04:31.550

@bryan: I updated the answer to include a brief awk example. – garyjohn – 2011-06-01T22:15:19.913
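For reference, the expand suggestion from the comments above can be sketched as follows. The single sample record and the variable name aligned are assumptions; with a tab stop every 20 columns, each field starts at column 0, 20 or 40 as long as no field is 20 characters or longer.

```shell
#!/usr/bin/env bash
# One tab-separated record, as paste would produce it (illustrative data).
record=$(printf 'bryan\tbryan@bryan\t1234567')

# expand -t 20 replaces each tab with enough spaces to reach the next
# column that is a multiple of 20.
aligned=$(printf '%s\n' "$record" | expand -t 20)

printf '%s\n' "$aligned"
```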

1

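If you want to sort each file independently and then paste the sorted data, you can use process substitution in bash. Note that this pairs lines by their sorted position, so a row no longer necessarily describes a single student: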

paste -d '|' <(sort file1) <(sort file2) <(sort file3)

glenn jackman

Posted 2011-06-01T18:03:25.210

Reputation: 18 546