Sorting plaintext by line-ending content

0

I have a plaintext list organized in the following format:

[file pathname] [track name] [artist name] [numerical value]

e.g.:

C:\Users\Somi\Music\Shaheedi.mp3    Shaheedi [By.NagRa] [Rp] Pasla Jatha Birmingham 140.01
C:\Users\Somi\Music\08 - Idgaf.mp3  Idgaf   Bohemia (www.nannu.info)    88.01   
C:\Users\Somi\Music\08 - Klasikhz - Hangower.mp3    Hangower    Klasikhz    101.06  
C:\Users\Somi\Music\4. Manni Sandhu  -  Pegg 2012[CrazyMasti.Com].mp3   4.  Manni Sandhu  -  Pegg 2012[CrazyMasti.Com]  Manni Sandhu[CrazyMasti.Com]    103.01  
C:\Users\Somi\Music\5. Manni Sandhu  -  Jaan Sadi[CrazyMasti.Com].mp3   5. Manni Sandhu  -  Jaan Sadi[CrazyMasti.Com]   Manni Sandhu[CrazyMasti.Com]    94.01   
C:\Users\Somi\Music\6. Manni Sandhu  -  Gidhian Di Rani[CrazyMasti.Com].mp3 6. Manni Sandhu  -  Gidhian Di Rani[CrazyMasti.Com] Manni Sandhu[CrazyMasti.Com]    95.00   
C:\Users\Somi\Music\7. Manni Sandhu  -  Door Ni Kulne[CrazyMasti.Com].mp3   7. Manni Sandhu  -  Door Ni Kulne[CrazyMasti.Com]   Manni Sandhu[CrazyMasti.Com]    94.00   
C:\Users\Somi\Music\8. Manni Sandhu  -  Bottle[CrazyMasti.Com].mp3  8. Manni Sandhu  -  Bottle[CrazyMasti.Com]  Manni Sandhu[CrazyMasti.Com]    123.99  

Each entry in the list is separated from the succeeding one by a hard return. It also seems as though the terminal numerical value is separated from the rest of the line by a tab. Is there some way to sort these entries by that terminal numerical value? That is, is there some way I can get them all arranged such that the terminal numerical values are either ascending or descending?

I am running OS X Lion, but if needed, I can just transfer the text file to a Windows machine.

Any help is appreciated.

voxanimus

Posted 2013-05-17T06:20:09.347

Reputation: 103

Give us more data and info. We need to see more records. It looks like all the lines are variable length, right? Are there separators in the line or is it just varying number of spaces? – Jan Doggen – 2013-05-17T06:50:33.900

Okay, I've added some more sample inputs. – voxanimus – 2013-05-18T07:59:15.710

if it helps you, i've uploaded the source file for download here: http://www.mediafire.com/view/?555jioewcto5w4y – voxanimus – 2013-05-18T09:00:13.487

Answers

3

If your input fields are not separated unambiguously, e.g. because titles contain spaces as well, you cannot assume a specific column index for your numerical value. Therefore, you need a tool that is able to extract the last column, regardless of its index. awk can do that:

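awk '{ printf "%s", $NF; $NF=""; print "", $0 }' input.txt | sort -rn

Explanation:

  • printf "%s", $NF prints the last field without a trailing newline, which a regular print would add. Passing the field through "%s" keeps a literal % in the data from being read as a format specifier.
  • $NF="" clears the last field, so we basically remove the last column. Assigning to a field makes awk rebuild the line with single spaces between fields, so the original tabs are not preserved.
  • print "", $0 prints the rest of the line prefixed with a single space.
  • The output is then sorted numerically (-n) and reversed (-r), i.e. descending. Drop -r for ascending order.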

This solution works with both the BSD commands native to OS X and the GNU tools that come with Linux.
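To see the effect on a small scale, here is a minimal sketch. The temporary file and its three records are made up for illustration and are not taken from your list.

```shell
# Hypothetical sample file: whitespace-separated records whose last field is numeric.
f=$(mktemp /tmp/sample.XXXXXX)
printf '%s\n' \
  'C:\Music\a.mp3 Song A Artist One 101.06' \
  'C:\Music\b.mp3 Song B Artist Two 88.01' \
  'C:\Music\c.mp3 Song C Artist Three 140.01' > "$f"

# Move the last field to the front of each line, then sort numerically, largest first.
sorted=$(awk '{ printf "%s", $NF; $NF=""; print "", $0 }' "$f" | sort -rn)
printf '%s\n' "$sorted"
rm -f "$f"
```

The record ending in 140.01 should come out first and the one ending in 88.01 last, each with its number moved to the front of the line.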


In your particular case, the file isn't well formatted. There is a Tab character before each newline, so the number is not what gets parsed as the last field. For your file, the awk command looks as follows; here, NF-1 refers to the second-to-last field.

awk '{ printf "%s", $(NF-1); $(NF-1)=""; print "", $0 }' test_sort_list.txt | sort -rn

Example:

% head -n 5 test_sort_list.txt
C:\Users\Somi\Music\(DJNagRa) Nachna Pawu - TeamPBN By NagRa.mp3    Nachna Pawu(DJNagRa)    TeamPBN(DJNagRa)    96.00
C:\Users\Somi\Music\(DJNagRa) Ni Goriyeh ft Billa Bakshi DJ Rags By NagRa.mp3   Ni Goriyeh (feat. Billa Bakshi)(DJNagRa)    DJ Rags(DJNagRa)    132.28
C:\Users\Somi\Music\(UMP) 08 Jassi J & Bhinda Jatt - Khushiya.mp3   Khushiya    (UMP) Jassi J & Bhinda Jatt 100.00
C:\Users\Somi\Music\(UMP) 09 Jassi J, Manjit Pappu & Cheshire Cat - Sadde Ton Piyara (Remix).mp3    Sadde Ton Piyara (Remix)    (UMP) Jassi J, Manjit Pappu & Cheshire Cat  85.99
C:\Users\Somi\Music\-  Baagi Ja Badshah - Bol Dehliye [www.Bhangracrew.com].mp3 Baagi Ja Badshah - Bol Dehliye  [BC] Santnam Singh Arshi Jatha  40.00

% awk '{ printf "%s", $(NF-1); $(NF-1)=""; print "", $0 }' test_sort_list.txt | sort -rn | head -n 5
250.00 C:\Users\Somi\Music\bilzkashif-bb06(www.songs.pk).mp3 Dil Nahin Lagda The Bilz and Kashif
250.00 C:\Users\Somi\Music\[WwD] Panjabi MC - Bari Barsi (12 Months) [iTunes-Rip].mp3 Bari Barsi (12 Months) Panjabi MC [www.worldwidedesis.com]
164.28 C:\Users\Somi\Music\Darh Tere Teh-VipJaTT.CoM.mp3 Darh Tere Teh-VipJaTT.CoM
164.07 C:\Users\Somi\Music\Jado Kade Tohar Shohar-VipJaTT.CoM.mp3 Jado Kade Tohar Shohar [VipJaTT.CoM] Gippy Grewal [VipJaTT.CoM]
164.04 C:\Users\Somi\Music\Dil Nachda.mp3 Dil Nachda Diljit VipJaTT.CoM
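If you would rather keep each line as it was (tabs intact, number still at the end), one possible variation is to strip the trailing whitespace, prepend the number as a temporary sort key, sort, and then cut the key off again. This is only a sketch: the records below are invented, and it assumes every line is tab-separated and ends in a stray tab plus a carriage return.

```shell
tab=$(printf '\t')
f=$(mktemp /tmp/tracks.XXXXXX)
# Hypothetical records: four tab-separated fields, then a stray tab and CRLF.
printf '%s\t%s\t%s\t%s\t\r\n' \
  'C:\Music\a.mp3' 'Song A' 'Artist One'   '101.06' \
  'C:\Music\b.mp3' 'Song B' 'Artist Two'   '88.01' \
  'C:\Music\c.mp3' 'Song C' 'Artist Three' '140.01' > "$f"

restored=$(awk -F'\t' '{
    sub(/[ \t\r]+$/, "")   # drop the trailing tab, spaces, and carriage return
    print $NF "\t" $0      # prepend the numeric value as a temporary key
}' "$f" | sort -n | cut -f2-)   # sort ascending on the key, then remove it
printf '%s\n' "$restored"
rm -f "$f"
```

Use sort -rn instead of sort -n for descending order.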

slhck

Posted 2013-05-17T06:20:09.347

Reputation: 182 472

thank you for your answer! i tried running your command in Terminal, replacing "input.txt" with the pathname of the text file, but i got the following error: "awk: can't open file Users/(my name)/Downloads/test_sort_list.txt source line number 1" do you know what might be going on? i edited the original question to include more inputs. – voxanimus – 2013-05-18T08:06:43.153

I assumed space as separators. Will try to fix that later when I'm back on my computer, but maybe you can supply your input file for download somewhere? I don't know if the editor here strips out anything. – slhck – 2013-05-18T08:39:04.207

sure thing. here you are: http://www.mediafire.com/view/?555jioewcto5w4y – voxanimus – 2013-05-18T08:57:14.503

Thanks for supplying the file. It has a Tab character before every newline, which wasn't visible in the original post. I updated my answer with a command for your file. – slhck – 2013-05-18T10:24:44.647

2

Convert the file to UTF-8 and LF first:

$ file test_sort_list.txt
test_sort_list.txt: ISO-8859 English text, with very long lines, with CRLF line terminators
$ iconv -f iso-8859-1 -t utf-8 test_sort_list.txt | tr -d '\r' > test_sort_list2.txt
$ file test_sort_list2.txt
test_sort_list2.txt: UTF-8 Unicode English text, with very long lines

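Then sort numerically on the fourth tab-separated field with sort -nk4 (adding -r reverses the order, so the largest values come first):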

sort -t$'\t' -rnk4 test_sort_list2.txt
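A small self-contained sketch of the same idea, using invented records in a temporary file. It builds the tab with printf instead of $'\t', which is a bash/zsh feature, and writes the key as -k4,4nr so that only the fourth field is compared.

```shell
tab=$(printf '\t')
f=$(mktemp /tmp/tracks.XXXXXX)
# Hypothetical records: exactly four tab-separated fields, LF line endings.
printf '%s\t%s\t%s\t%s\n' \
  'C:\Music\a.mp3' 'Song A' 'Artist One'   '94.01' \
  'C:\Music\b.mp3' 'Song B' 'Artist Two'   '123.99' \
  'C:\Music\c.mp3' 'Song C' 'Artist Three' '95.00' > "$f"

# -t sets the field separator; -k4,4nr sorts on field 4 only, numerically, descending.
by_value=$(sort -t "$tab" -k4,4nr "$f")
printf '%s\n' "$by_value"
rm -f "$f"
```

This only works when the number is always the fourth field, which is why titles or artist names must not contain tabs themselves.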

Lri

Posted 2013-05-17T06:20:09.347

Reputation: 34 501

Does sort not handle the CRLF or ISO-8859 encoding? cut -d$'\t' -f4 gives the correct column. – slhck – 2013-05-18T10:37:12.543

CRLF line endings wouldn't matter in this case, but sort gave an illegal character sequence error for the original ISO-8859-1 file when LC_CTYPE was set to en_US.UTF-8. – Lri – 2013-05-18T10:43:25.177

Ah, I see. With LC_CTYPE set to C it works without conversion. – slhck – 2013-05-18T10:49:44.000
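Following up on the locale point in these comments, here is a rough sketch of sorting a file that contains ISO-8859-1 bytes without converting it first. The records are invented, and it uses LC_ALL=C rather than LC_CTYPE=C on the assumption that LC_ALL might already be set in your environment, in which case it would override LC_CTYPE.

```shell
tab=$(printf '\t')
f=$(mktemp /tmp/latin1.XXXXXX)
# Hypothetical records; \351 is the ISO-8859-1 byte for "é", which is not valid UTF-8 on its own.
printf 'C:\\Music\\caf\351.mp3\tCaf\351\tArtist One\t94.00\n'  > "$f"
printf 'C:\\Music\\b.mp3\tSong B\tArtist Two\t123.99\n'       >> "$f"

# LC_ALL=C makes sort treat the input as plain bytes, so no character decoding is attempted.
latin1_sorted=$(LC_ALL=C sort -t "$tab" -k4,4nr "$f")
printf '%s\n' "$latin1_sorted"
rm -f "$f"
```

The output is still ISO-8859-1, so the iconv step is only needed if you want the result as UTF-8.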

1

With just one line of input it's hard to tell exactly what you need. If your whole file is in the same format, then:

sort -n -k 10 input

would be enough for your sample input (-n makes the comparison numeric rather than lexical). If your rows have a variable number of fields, you'll have to add an extra step to find the last column. This answer on SO is then probably what you're looking for.

SBI

Posted 2013-05-17T06:20:09.347

Reputation: 771

the linked question's author's file has comma delimited fields. my fields are tab-delimited. thanks, though! – voxanimus – 2013-05-18T08:11:00.863