UNIX command to sort a file by the text after a delimiter word

1

I have a file, file.txt, with lines like this:

www.site.com/230207|Sophie Rundle title: Episodes|5irko3ke
www.site.com/228264|Camilla Luddington title: Balifornication|5423234
www.site.com/228592|Sarah Power title: Californication|23423423
www.site.com/229022|Ali Cobrin title: American Reunion|tgkmktgkmtg
www.site.com/190074|Eva Green title: The Dreamers|rfrrfrf

I want it sorted into fileSorted.txt alphabetically by the text that comes after "title:", so the result would be:

www.site.com/229022|Ali Cobrin title: American Reunion|tgkmktgkmtg
www.site.com/228264|Camilla Luddington title: Balifornication|5423234
www.site.com/228592|Sarah Power title: Californication|23423423
www.site.com/230207|Sophie Rundle title: Episodes|5irko3ke
www.site.com/190074|Eva Green title: The Dreamers|rfrrfrf

I know that we have to use the sort command, so I tried:

sort --field-separator='title:'  --key=1  file.txt > fileSorted.txt

but I got this result:

sort: multi-character tab ‘title:’

I searched the internet but couldn't find a solution. How can I sort the file as described above? The file has 100K lines, so performance matters.

George Chalhoub

Posted 2015-12-20T17:06:29.480

Reputation: 131

Answers

0

I found a way to do it easily and efficiently using one line in bash:

sort --field-separator=':' --key=2 file.txt > fileSorted.txt

George Chalhoub

Posted 2015-12-20T17:06:29.480

Reputation: 131

3

This may be too simplistic (it won't sort properly if the text before the title, such as the name, contains a ":" character), but you can simply use ":" as the field separator and sort on the second field:

sort -t: -k2 del.file
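
A minimal sketch of this command on the sample lines from the question, including the caveat about a stray ":". The sample function, the variable names, the extra "Zed: Example" line, and LC_ALL=C (which forces byte-order comparison so the result does not depend on the locale) are assumptions added for illustration; they are not part of the command above.

```shell
#!/bin/sh
# Sample lines copied from the question; sample() is a helper for this sketch only.
sample() {
cat <<'EOF'
www.site.com/230207|Sophie Rundle title: Episodes|5irko3ke
www.site.com/228264|Camilla Luddington title: Balifornication|5423234
www.site.com/228592|Sarah Power title: Californication|23423423
www.site.com/229022|Ali Cobrin title: American Reunion|tgkmktgkmtg
www.site.com/190074|Eva Green title: The Dreamers|rfrrfrf
EOF
}

# -t: splits each line at ':' and -k2 sorts on field 2 through the end of the line.
# LC_ALL=C forces byte-order comparison, independent of the user's locale.
sorted=$(sample | LC_ALL=C sort -t: -k2)
printf '%s\n' "$sorted"

# Hypothetical extra line with a ':' inside the name. Its sort key starts at
# " Example title: Aaa", so it is not placed first despite its title "Aaa".
with_colon=$( { sample; echo 'www.site.com/000001|Zed: Example title: Aaa|x'; } | LC_ALL=C sort -t: -k2)
printf '%s\n' "$with_colon"
```

The first listing comes out in title order. In the second, the hypothetical line lands between "Episodes" and "The Dreamers" because it is compared by " Example", not by its title.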

davidgo

Posted 2015-12-20T17:06:29.480

Reputation: 49 152

1

Use sed to temporarily replace the "title:" string with a single character that sort can use as a delimiter. This example uses Control-A (ASCII 0x01):

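#!/bin/sh
# Control-A (octal 001) is unlikely to occur in the data, so it can stand in for "title:".
SEP=$(echo x | tr x '\001')
# Swap "title:" for the separator, sort on the field after it, then swap it back.
sed -e "s/title:/$SEP/" file.txt |
sort -t "$SEP" -k2 |
sed -e "s/$SEP/title:/" > fileSorted.txt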

gives

www.site.com/229022|Ali Cobrin title: American Reunion|tgkmktgkmtg
www.site.com/228264|Camilla Luddington title: Balifornication|5423234
www.site.com/228592|Sarah Power title: Californication|23423423
www.site.com/230207|Sophie Rundle title: Episodes|5irko3ke
www.site.com/190074|Eva Green title: The Dreamers|rfrrfrf

In your example, --key=1 sorts from the beginning of the line. Based on the comments, you intended to sort by the data that follows the "title:" string, which requires the -k2 option. (I also changed the separator option to the POSIX form, -t.)

For reference, see the POSIX specification of sort.
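
To see why the key number matters, here is a small sketch that runs the same sed/sort/sed pipeline on the question's sample lines, once with -k1 and once with -k2. The sample and sort_with_key helpers, the variable names, and LC_ALL=C are assumptions made for illustration, and printf is used here instead of echo and tr to produce the Control-A.

```shell
#!/bin/sh
# Sample lines copied from the question; sample() is a helper for this sketch only.
sample() {
cat <<'EOF'
www.site.com/230207|Sophie Rundle title: Episodes|5irko3ke
www.site.com/228264|Camilla Luddington title: Balifornication|5423234
www.site.com/228592|Sarah Power title: Californication|23423423
www.site.com/229022|Ali Cobrin title: American Reunion|tgkmktgkmtg
www.site.com/190074|Eva Green title: The Dreamers|rfrrfrf
EOF
}

# Control-A as a one-character stand-in for "title:".
SEP=$(printf '\001')

# Hypothetical helper: run the pipeline with the key field number given in $1.
sort_with_key() {
    sample |
    sed -e "s/title:/$SEP/" |
    LC_ALL=C sort -t "$SEP" "-k$1" |
    sed -e "s/$SEP/title:/"
}

by_key1=$(sort_with_key 1)   # key starts at the URL, i.e. the whole line
by_key2=$(sort_with_key 2)   # key starts right after "title:"

printf 'With -k1:\n%s\n\nWith -k2:\n%s\n' "$by_key1" "$by_key2"
```

With -k1 the lines come out in URL order (the 190074 line first); with -k2 they come out in title order (American Reunion first).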

Thomas Dickey

Posted 2015-12-20T17:06:29.480

Reputation: 6 891

It didn't work. – George Chalhoub – 2015-12-20T17:59:44.143

Maybe this requires --key=2 rather than --key=1? – davidgo – 2015-12-21T02:20:39.830

I had overlooked the -k2 while working on the delimiter (answer is already updated). – Thomas Dickey – 2015-12-21T02:23:49.483

1

You didn't say what tools you wanted to use, and it's always good to have options, so here's a Perl solution to go along with Thomas' sed/sort solution.

$ cat file.txt
www.site.com/230207|Sophie Rundle title: Episodes|5irko3ke
www.site.com/228264|Camilla Luddington title: Balifornication|5423234
www.site.com/228592|Sarah Power title: Californication|23423423
www.site.com/229022|Ali Cobrin title: American Reunion|tgkmktgkmtg
www.site.com/190074|Eva Green title: The Dreamers|rfrrfrf
$ cat sortfile.pl
#!/usr/bin/perl --

use strict;
use warnings;

my @lines;

while (<>)
{
    push @lines, "$1\x00$_" if /title: (.*)/;
}

foreach (sort @lines)
{
    s/.*\x00//;

    print $_;
}
$ ./sortfile.pl file.txt
www.site.com/229022|Ali Cobrin title: American Reunion|tgkmktgkmtg
www.site.com/228264|Camilla Luddington title: Balifornication|5423234
www.site.com/228592|Sarah Power title: Californication|23423423
www.site.com/230207|Sophie Rundle title: Episodes|5irko3ke
www.site.com/190074|Eva Green title: The Dreamers|rfrrfrf

The concept is to copy the text you want to sort on to the front, sort, and remove the copied text. The key parts are:

while (<>)
{
    push @lines, "$1\x00$_" if /title: (.*)/;
}

This loops over every line of the files named on the command line (or of standard input if none are named), reading each line into $_. The if at the end of the third line both checks that the line looks like one we want to process and saves everything after "title: " in $1; lines without a title are skipped and do not appear in the output. The push then appends to @lines a string made up of the title (from $1), a separator that shouldn't occur in a title (ASCII NUL), and the original line (from $_). When the loop is done, all the lines are in @lines with the title copied to the front.

foreach (sort @lines)
{
    s/.*\x00//;

    print $_;
}

This loops over the lines accumulated in @lines after sorting them. Because the title has been copied to the beginning of each line, the lines are sorted by title. The s/.*\x00//; strips off the title and the ASCII NUL separator, restoring the line to its original form. The print then prints the entire (restored) line.
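
For comparison, the same copy-to-front, sort, strip idea can be sketched with standard shell tools. This is only an illustration, not a replacement for the Perl script: the sort_by_title and sample names are made up here, a tab is used as the separator instead of NUL (assuming no title contains a tab), and LC_ALL=C is added so the ordering does not depend on the locale.

```shell
#!/bin/sh
# Sample lines copied from the question; sample() is a helper for this sketch only.
sample() {
cat <<'EOF'
www.site.com/230207|Sophie Rundle title: Episodes|5irko3ke
www.site.com/228264|Camilla Luddington title: Balifornication|5423234
www.site.com/228592|Sarah Power title: Californication|23423423
www.site.com/229022|Ali Cobrin title: American Reunion|tgkmktgkmtg
www.site.com/190074|Eva Green title: The Dreamers|rfrrfrf
EOF
}

TAB=$(printf '\t')

# Hypothetical helper: reads the named files, or standard input if none are given.
# 1. awk copies the text after "title: " (7 characters) to the front, plus a tab;
#    lines without "title: " are dropped, as in the Perl version.
# 2. sort compares on that first tab-delimited field.
# 3. cut removes the copied field, restoring the original line.
sort_by_title() {
    awk '{ i = index($0, "title: "); if (i) print substr($0, i + 7) "\t" $0 }' "$@" |
    LC_ALL=C sort -t "$TAB" -k1,1 |
    cut -f2-
}

result=$(sample | sort_by_title)
printf '%s\n' "$result"
```

Usage would look like `sort_by_title file.txt > fileSorted.txt`. Unlike the Perl script, this version does not hold the whole file in an in-memory array of its own, since sort handles the buffering.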

blm

Posted 2015-12-20T17:06:29.480

Reputation: 620