Set collation order in shell sort to not ignore special characters

1

I am looking for a way to control sort order in Postgres, but it seems to be a collation problem, so I am asking about the standard Unix sort command instead.

I have the following data:

A_A1
A\A2
A_A2
A\A1

after sort I get:

cat test.txt |sort

A_A1
A\A1
A_A2
A\A2

but I want:

A_A1
A_A2
A\A1
A\A2

I also tried LANG=C cat... but to no avail. So which collation rule would allow me to not ignore the special characters?

Fabian

Posted 2012-04-19T16:33:32.520

Reputation: 113

Answers

2

From man sort:

Set LC_ALL=C to get the traditional sort order that uses native byte values.

So:

$ LC_ALL=C sort test.txt
A\A1
A\A2
A_A1
A_A2

So the C locale sorts by byte value.
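To make the byte-value ordering concrete, here is a minimal sketch that recreates the sample data in a temporary file (the `tmpfile` and `sorted` names are just for illustration) and sorts it in the C locale. In ASCII, `\` is 0x5C and `_` is 0x5F, which is why the backslash lines come first.

```shell
# Recreate the sample data from the question. printf '%s\n' prints each
# argument literally, so the backslash is not treated as an escape.
tmpfile=$(mktemp)
printf '%s\n' 'A_A1' 'A\A2' 'A_A2' 'A\A1' > "$tmpfile"

# Sort by raw byte value by forcing the C locale for this one command.
sorted=$(LC_ALL=C sort "$tmpfile")
printf '%s\n' "$sorted"

# Show the hex byte values of the two special characters (5c and 5f).
printf '%s' '\_' | od -An -tx1

rm -f "$tmpfile"
```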


You need to do

$ cat test.txt | LC_ALL=C sort

if you want to pipe it like that (but always try to use the file name version directly if it's available).
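The reason the position matters is that a variable assignment placed in front of a command applies only to that one command, not to the rest of the pipeline. A small sketch of this, using a made-up variable `DEMO_LOCALE` (an assumption for illustration, not something sort reads):

```shell
# DEMO_LOCALE is a made-up variable used only for this illustration.
unset DEMO_LOCALE

# Assignment in front of the first command: the second command does not see it.
right_when_prefix_left=$(DEMO_LOCALE=C true | sh -c 'printf "%s" "${DEMO_LOCALE:-unset}"')

# Assignment in front of the second command: now that command sees the value.
right_when_prefix_right=$(true | DEMO_LOCALE=C sh -c 'printf "%s" "${DEMO_LOCALE:-unset}"')

printf 'prefix on left:  %s\n' "$right_when_prefix_left"
printf 'prefix on right: %s\n' "$right_when_prefix_right"
```

The same applies to `LC_ALL=C cat test.txt | sort`: only `cat` runs with the C locale, while `sort` keeps the locale of the surrounding shell.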


The primary environment variable affecting this is LC_COLLATE. If LC_ALL is set, though, it overrides all the specific LC_ values. If neither LC_ALL nor LC_COLLATE is set, sort falls back on LANG. If that is not set either, it defaults to the C locale.
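The lookup order described above can be sketched as a small shell function. `effective_collate` is a hypothetical helper written for this illustration, not a standard command, and it assumes that an empty variable counts as unset and that the final fallback is C, as stated above.

```shell
# effective_collate is a hypothetical helper, not a standard command.
# It prints the locale name that collation would use, following the
# order LC_ALL, then LC_COLLATE, then LANG, then the C default.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf '%s\n' "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'
    fi
}

# Report the collation locale for the current environment.
effective_collate
```

To check what the system itself has resolved, running `locale` and looking at its `LC_COLLATE` line should give the same answer.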

Daniel Andersson

Posted 2012-04-19T16:33:32.520

Reputation: 20 465

Arrgh stupid me.

LANG=C works but not as

`LANG=C cat test.txt|sort`

but only

`cat test.txt|LANG=C sort`

of course! – Fabian – 2012-04-19T16:42:35.783

@Fabian: LANG=C does not actually work for me, but that is because I have LC_COLLATE set (which is the primary environment variable that affects collation). I'll add info on this to my answer. – Daniel Andersson – 2012-04-19T16:46:04.907