Sort each standalone line alphabetically

I want to sort some items in alphabetical order, but in a very specific way.

I have, for example, the following list, with items separated by commas:

monkeys, big dogs, cats
pineapple, banana, orange
yellow, red, blue, green
silver, gold, platinum
delphi, java, c++, visual basic

An item here is defined as a piece of text that either: 1. starts at the beginning of a line and ends right before the first comma; 2. is surrounded by commas; or 3. starts right after the last comma and ends at the end of the line. Spaces are not separators, so "big dogs" forms a single item.
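
To make that definition concrete, here is a minimal Python sketch of how one line could be split into items. It assumes items never contain commas themselves (no quoting) and that the spaces next to each comma are not part of an item, which matches the desired result below; `split_items` is a hypothetical helper name.

```python
def split_items(line):
    """Split one line into items: the text between commas, with the
    spaces around each comma trimmed. Spaces inside an item are kept,
    so "big dogs" stays a single item (assumes no quoted commas)."""
    return [item.strip() for item in line.rstrip("\r\n").split(",")]

print(split_items("monkeys, big dogs, cats"))  # ['monkeys', 'big dogs', 'cats']
```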

I want to sort each line alphabetically, WITHOUT changing line order.

My desired result would be:

big dogs, cats, monkeys
banana, orange, pineapple
blue, green, red, yellow
gold, platinum, silver
c++, delphi, java, visual basic

My target list has got 3000+ lines, so it should be an automated process.

Thanks!

Daniel

Posted 2012-10-19T14:52:43.803

Reputation: 13

What operating system are you using? – Daniel Beck – 2012-10-19T17:06:20.903

I'm using Windows 7, but I have cygwin installed also. – Daniel – 2012-10-19T17:06:59.013

Can anybody help me? – Daniel – 2012-10-23T21:46:41.553

Do you know any programming languages? Perl or Python could do this pretty easily. Perl would be my first choice, this being a text processing task. Any programming language could do it, though. It shouldn't take more than half an hour at the very most. – Jack M – 2012-10-23T21:49:48.267

I code in Java and Delphi, but I feel Python or Perl would give me these results easily, wouldn't they? – Daniel – 2012-10-23T23:25:10.347

Answers

PowerShell one-liner:

$sep=","; gc infile.txt |% {$line=($_ -split $sep)|% {$_.trim()}|sort;$line -join $sep} >outfile.txt

Notes:
1. Uses PowerShell 2.0's -join syntax, which is more compact.
2. Splitting on , (as shown) leaves a leading space on every item except the first; the |% {$_.trim()} step removes all leading/trailing spaces from the words. I assume from context that this is what you want, but if I took your description literally, the spaces should be retained. If you do want that, remove |% {$_.trim()} (but then sort will not work 'as expected' for your example, because of the leading spaces).
3. You may use ,<space> (or anything else, for that matter) as the output separator (-join ", "). This normalizes any mixed input (with or without spaces after the comma) to the separator you selected.
4. The default output encoding in PowerShell is Unicode (UTF-16). If you need to control it, use | out-file -Encoding <encoding_type> instead of the > redirection. To see the available encodings, run help out-file -full
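
For comparison, the same per-line pipeline (split, optionally trim, sort, join) could be sketched in Python as follows. `sort_line` and its `trim`/`out_sep` parameters are hypothetical names chosen to mirror notes 2 and 3. Python's `sorted()` compares raw code points rather than using a culture-aware comparison. The second call shows why skipping the trim breaks the ordering: every item that had a space after its comma sorts ahead of the first item, which had none.

```python
def sort_line(line, sep=",", out_sep=", ", trim=True):
    """Sort the items of one line: split on sep, optionally trim each
    item, sort by code point, and join with out_sep. Line order in a
    file is unaffected because each line is handled on its own."""
    items = line.rstrip("\r\n").split(sep)
    if trim:
        items = [item.strip() for item in items]
    return out_sep.join(sorted(items))

print(sort_line("delphi, java, c++, visual basic"))              # c++, delphi, java, visual basic
print(repr(sort_line("delphi, java, c++, visual basic", trim=False)))  # ' c++,  java,  visual basic, delphi'
```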

wmz

Reputation: 6 132

Thanks for the answer. Yeah, you're right: I want to trim the input items, but not the output. So the real separator is ",<space>". I changed it to (-join ", ") and it worked. However, I ran into another problem: some of my words begin with characters such as "á", "é" and "ó". With this PowerShell command, such items are sorted before "a". The order should be "a, á, b, c, d, e, é, f, g", etc. – Daniel – 2012-10-24T14:21:00.660

@Daniel I do not know what regional settings you are running it under; it worked for me in every case I tried. Anyway, you could use sort -culture to change the collation (e.g. sort -culture en-us for US English). It could also be a problem with the input file encoding, but then you would probably see 'garbage' in the result. – wmz – 2012-10-24T16:09:19.510
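
Python has no direct counterpart to sort -culture; `locale.strxfrm` can do locale-aware sorting, but it depends on which locales the operating system has installed. The sketch below swaps in a simpler technique instead: compare the words with their accents stripped (via `unicodedata`), and use the original text only to break ties. That yields the "a, á, b, ..." order described in the comment above, but it only approximates a real culture collation. `accent_insensitive_key` is a hypothetical name.

```python
import unicodedata

def accent_insensitive_key(s):
    """Sort key: compare the text with accents removed and case folded
    first, then the original text to break ties (so 'a' comes just
    before 'á', and 'e' just before 'é')."""
    decomposed = unicodedata.normalize("NFD", s)  # 'á' -> 'a' + combining accent
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), s)

words = ["f", "é", "b", "á", "e", "a", "c"]
print(sorted(words))                              # ['a', 'b', 'c', 'e', 'f', 'á', 'é']
print(sorted(words, key=accent_insensitive_key))  # ['a', 'á', 'b', 'c', 'e', 'é', 'f']
```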

That's perfect! I set -culture en-us and it works flawlessly! Thank you very much! – Daniel – 2012-10-24T17:19:59.410

@Daniel Glad it worked! I added a point about encoding in case you need to set it to something other than the default. – wmz – 2012-10-24T18:24:17.777

Here's one that ought to do it in Python.

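import csv

# Python 2.7: the csv docs ask for the input file to be opened in binary mode
f = open("sortrows.csv", 'rb')
# skipinitialspace=True drops the space after each comma, so " cats" becomes "cats"
reader = csv.reader(f, skipinitialspace=True)

outf = open("sortedrows.csv", 'w')
for row in reader:
    row.sort()  # sort the items within this line only; line order is unchanged
    outf.write(", ".join(row) + "\n")

f.close()
outf.close()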

Ryan

Reputation: 3 179

Won't the print(row) slow it down quite a bit? – Jack M – 2012-10-24T11:17:24.627

Thanks for the answer. This gives me an empty (0-byte) "sortedrows.csv". – Daniel – 2012-10-24T14:14:00.497

What should I do? – Daniel – 2012-10-24T15:37:52.967

@JackM That's true. I left it in there from my debugging. – Ryan – 2012-10-24T19:08:30.640

It still gives an empty (0-byte) sortedrows.csv when I run this script. I have Python 3.3.0 installed. What's the problem? – Daniel – 2012-10-24T20:41:20.773

@Daniel This one was written using Python 2.7. I know the syntax of several things changed in Python 3.x, but I don't know exactly what's breaking it. You can look up the csv reader and file writing in Python 3 to figure out what's different. – Ryan – 2012-10-24T23:09:11.947
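
Opening the input with 'rb' is most likely the culprit under Python 3: there the csv reader expects text, not bytes, so the first iteration raises a csv.Error ("iterator should return strings, not bytes") after "sortedrows.csv" has already been created, which leaves it empty. Below is a minimal Python 3 sketch. It assumes the input is UTF-8 (pass a different `encoding`, e.g. "cp1252", if it is not). `sort_file` is a hypothetical helper name.

```python
import csv

def sort_file(in_path, out_path, encoding="utf-8"):
    """Python 3: sort the items within each line and keep the line order.
    The encoding default is an assumption; change it to match the file."""
    # Python 3's csv module wants text-mode files opened with newline=""
    with open(in_path, newline="", encoding=encoding) as f, \
         open(out_path, "w", newline="", encoding=encoding) as outf:
        # skipinitialspace=True drops the space that follows each comma
        reader = csv.reader(f, skipinitialspace=True)
        for row in reader:
            row.sort()  # per-line sort only
            outf.write(", ".join(row) + "\n")
```

Usage: sort_file("sortrows.csv", "sortedrows.csv")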