How can I sort a document according to a substring in each line on Win7?

How can I sort a text according to hashtag on Windows-7?

I have a long text (.txt format) which looks something like this:

Blah blah #Test
123123 #Really
Blah bluh #Really
klfdmngl #Test

I would like to conveniently, quickly and automatically be able to sort the text so that it looks like this:

Blah blah #Test
klfdmngl #Test
123123 #Really
Blah bluh #Really

I have to do this on a daily basis so I would like to be able to do it in as few steps as possible.

Joey Hammer

Posted 2012-09-21T11:58:04.037

Reputation: 59

Apart from just suggesting which software to use, please also define the exact procedure. – Joey Hammer – 2012-09-21T11:58:51.413

Step 1: Replace # with ,# Step 2: Import as CSV into Excel or similar application. – Der Hochstapler – 2012-09-21T12:10:00.760

Your comment elsewhere says "so the script or whatnot has to be clever enough to work with long lines with line breaks". Does that mean {EOL} is not a reliable delimiter? Is the record delimiter #[sometext]{EOL}? – horatio – 2012-09-21T15:08:36.850

Please update your question with all additional information about the input file's format. – martineau – 2012-09-21T17:05:19.200

@Oliver Salzburg: I think the OP would also want to know what to do after importing it into Excel or whatever? – martineau – 2012-09-21T17:14:43.200

@martineau: I'm sorry, it wasn't really meant to be a solution even though it did work for me. I wouldn't want to have to recommend to someone to use a beast like Excel to solve a task like this. You guys have come up with much better examples :) – Der Hochstapler – 2012-09-21T18:24:12.757

Answers

Here's a final PowerShell solution that deals with line breaks inside a record. A record is assumed to end with a hashtag (# followed by word characters) immediately before {EOL}. A line that does not end in a hashtag is assumed to continue on the next line. The rest of my answer, below this section, does not handle the special case mentioned by the author where data crosses a newline boundary. This example assumes the file is called test.txt and is in the current directory.

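# Read the whole file up front so that it can be overwritten at the end.
[string[]]$fileContent = (get-content .\test.txt);
[string]$linebuffer = '';

# Join continuation lines: keep buffering until a line ends in a hashtag.
[object]$fixedFile = foreach($line in $fileContent) {
    if(-not ($line -match "#\w+$")) {
        $linebuffer += ($line + ' ');
        continue;
    }

    $linebuffer += $line;
    $linebuffer;          # emit the completed record
    $linebuffer = '';
}

# Move the hashtag to the front, sort, then move it back to the end.
($fixedFile -replace '^(.*)\ (#.*)$', '$2 $1' | Sort-Object) -replace '^(#\w+)\ (.*)$','$2 $1' | out-file test.txt -encoding ascii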
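For comparison, the same record-joining idea can be sketched in Python. This is only an illustration and assumes Python is installed, which nothing in this thread requires. The names join_records and sort_by_hashtag are made up for the example, and unlike the PowerShell script above it keeps trailing text that never reaches a hashtag instead of dropping it.

```python
import re

# A record ends on a line whose last token is a hashtag, e.g. "#Test".
TAG_AT_END = re.compile(r"#\w+$")


def join_records(lines):
    """Merge continuation lines until a line ends in a hashtag."""
    records, buffer = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        buffer.append(text)
        if TAG_AT_END.search(text):
            records.append(" ".join(buffer))
            buffer = []
    if buffer:
        # Assumption: trailing text without a hashtag is kept, not dropped.
        records.append(" ".join(buffer))
    return records


def sort_by_hashtag(records):
    """Sort records by their trailing hashtag; untagged records sort first."""
    def key(record):
        match = TAG_AT_END.search(record)
        return match.group(0) if match else ""
    return sorted(records, key=key)


sample = [
    "Blah blah #Test",
    "a long line that was",
    "broken in two #Really",
    "klfdmngl #Test",
]
for record in sort_by_hashtag(join_records(sample)):
    print(record)
```

The two middle lines of the sample are merged into one record before sorting, which is the case the asker described in the comments.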

Use gVim in Windows or MacVim on OS X.

NOTE: Vim is a modal editor. It starts in command mode; to type text as you would in a normal editor, you must first switch to insert mode by pressing a key such as a or i. While in command mode, type a colon to enter each of the commands below.

:%s/^\(.*\)\ \(\#\w\+\)$/\2\ \1/g
:sort
:%s/^\(\#\w\+\)\ \(.*\)$/\2\ \1/g

The first command swaps the hashtag at the end of the line to the beginning of the line. The second command sorts the data and the third command undoes the swap and moves the hashtag back to the end of the line.

I've tested this on your sample and it works.
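The swap, sort, swap-back sequence is not specific to Vim. As a rough sketch, here it is in Python, with the two regular expressions translated from the Vim commands above. Python itself and the function name swap_sort_swap are assumptions made for illustration, and lines are assumed to have no trailing whitespace after the hashtag.

```python
import re


def swap_sort_swap(lines):
    """Sort lines by trailing hashtag using the swap / sort / swap-back trick."""
    # Step 1: move the trailing hashtag to the front: "text #Tag" -> "#Tag text".
    swapped = [re.sub(r"^(.*) (#\w+)$", r"\2 \1", line) for line in lines]
    # Step 2: a plain lexical sort now orders by hashtag first, then by text.
    swapped.sort()
    # Step 3: move the leading hashtag back to the end of the line.
    return [re.sub(r"^(#\w+) (.*)$", r"\2 \1", line) for line in swapped]


lines = [
    "Blah blah #Test",
    "123123 #Really",
    "Blah bluh #Really",
    "klfdmngl #Test",
]
for line in swap_sort_swap(lines):
    print(line)
```

Because the whole swapped line is sorted, lines that share a hashtag also end up ordered by their text.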

@Oliver_Salzburg provided a much easier answer with Excel in comments. I didn't think outside the box and provided an answer with a text-editor.

Step 1: Replace # with ,# Step 2: Import as CSV into Excel or similar application. – Oliver Salzburg♦

Here's a solution using only PowerShell that can be done natively on Win7. I still haven't had a chance to read up on traversing line breaks, so this solution does not account for those.

This example assumes that the file you're working with is test.txt.

$tempstor = (get-content test.txt) -replace '^(.*)\ (#.*)$', '$2 $1' | Sort-Object
$tempstor -replace '^(#\w+)\ (.*)$','$2 $1' | out-file test.txt -encoding ASCII

The same thing as a one-liner, using parentheses to group the first pipeline:

((get-content test.txt) -replace '^(.*)\ (#\w+)$', '$2 $1' | Sort-Object) -replace '^(#\w+)\ (.*)$','$2 $1' | out-file test.txt -encoding ascii

Sean C.

Posted 2012-09-21T11:58:04.037

Reputation: 554

(OS: Win7). I was already aware of the Excel solution. However, as I do this quite often (up to ten times a day), it is inconvenient to keep converting the file to CSV, opening excel, clicking the sort buttons etc. Maybe there's a way to create a batch file... Ideally, I would just like to click one time, activate a script or something which takes care of everything and outputs into a new file. – Joey Hammer – 2012-09-21T12:16:10.113

One more thing: some of the lines are long and may look like this: blahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblah #hashtag - so the script or whatnot has to be clever enough to work with long lines with line breaks. – Joey Hammer – 2012-09-21T12:16:37.523

That's also possible with vim, but it requires an additional switch on the end of the command to traverse line breaks. I'll update the original answer when I come up with the solution, it eludes my memory. – Sean C. – 2012-09-21T12:22:59.450

Another point, if you're interested in learning Vim, you can create user-defined functions and macros. This would allow you to enter a command like :sorthash for example and it would execute the commands in the proper sequence. – Sean C. – 2012-09-21T12:24:19.183

I'm using Dreamweaver normally. I have a copy of Vim too but I don't use it that much (at all). If this problem can be solved in Vim that would be nice. Even nicer would be if I could auto execute a script so that I wouldn't actually have to open Vim and load the file every time. Like, just double clicking a desktop icon which executes a script or whatnot, sorts the document and saves it under a new name in a folder of choice. – Joey Hammer – 2012-09-21T12:50:07.217

If you install unxutils for Windows, it could be done easily with a script. sed and sort could do it. I'm at work at the moment and I've been called away. I'll update the answer with some scripted solutions when I've concluded my work. – Sean C. – 2012-09-21T13:01:01.667

Ok. I've got two folders on the desktop now, bin and usr. Thanks for helping. Please let me know when you've got the solution : ) – Joey Hammer – 2012-09-21T13:25:31.033

Eh, Rajesh, he hasn't provided the full solution yet. – Joey Hammer – 2012-09-21T14:09:16.067

I changed .\test.txt to c:\blah\blah\test.txt, which seems to be OK, but I get an error message for the last part of the script: PS C:\Users\Pongy> ($fixedFile -replace '^(.*)\ (#.*)$', '$2 $1' | Sort-Object) -replace '^(#\w+)\ (.*)$','$2 $1' | outfile test.txt -encoding ascii The term 'outfile' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again. – Joey Hammer – 2012-09-22T07:48:30.227

At line:1 char:105 + ($fixedFile -replace '^(.*)\ (#.*)$', '$2 $1' | Sort-Object) -replace '^(#\w+)\ (.*)$','$2 $1' | outfile <<<< test.txt -encoding ascii + CategoryInfo : ObjectNotFound: (outfile:String) [], CommandNotFoundException + FullyQualifiedErrorId : CommandNotFoundException – Joey Hammer – 2012-09-22T07:49:04.767

@JoeyHammer I made an error that I've corrected. outfile should have been out-file. – Sean C. – 2012-09-24T12:04:40.317

Here's a Windows batch (.bat) or command (.cmd) file that will do it. I wasn't sure what you wanted to do with the output, so this just displays one of the two temporary files it creates and then deletes both of them.

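@echo off
rem Require a filename argument.
if {%1} == {} (
    echo usage: %0 ^<filename^>
    goto :EOF
)
rem Pass 1: rewrite each "text #tag" line as "tag$text" so the tag comes first.
echo.>_temp1
for /F "tokens=1,2 delims=#" %%i in (%1) do echo %%j$%%i>>_temp1
echo.>_temp2
rem Sort on the leading tag.
sort _temp1 >_temp2
rem Pass 2: restore the original "text #tag" layout.
echo.>_temp1
for /F "tokens=1,2 delims=$" %%i in (_temp2) do @echo %%j#%%i>>_temp1
type _temp1
del _temp1
del _temp2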

martineau

Posted 2012-09-21T11:58:04.037

Reputation: 3 849

This is a nice solution but it doesn't handle line breaks as mentioned in the comments of the answer I provided (second comment). – Sean C. – 2012-09-21T15:59:01.787

@Sean C.: It wasn't obvious from what's in the comment that there was a line break in the line. It would be best if the OP updated their question and better described the input file's possible contents. I wonder if simply detecting that a line had no # would be sufficient to assume it was continued on the next one (or more). – martineau – 2012-09-21T17:01:42.463

The final PowerShell solution provided looks for a hashtag at the end of the line (/#\w+$/); if there isn't one, it assumes the data is continued on the next line. – Sean C. – 2012-09-21T17:33:06.023

@SeanC.: Oliver_Salzburg's answer/comment about importing into Excel likely wouldn't handle broken lines either (without some custom VBA coding). However, like you, I'm kind of running out of steam with regards to this underspecified question. – martineau – 2012-09-21T18:15:45.690

If you're on Windows, you can use this simple PowerShell script:

[io.file]::ReadAllLines("test.txt")|Sort-Object {$_.SubString($_.IndexOf('#'))}

I'm hardly a PowerShell expert, so, sorry if there is a more optimal solution :)

Example

Here's the content of my input file test.txt:

PS C:\Users\Oliver> type test.txt
Blah blah #Test
123123 #Really
Oliver #SuperUser
Blah bluh #Really
klfdmngl #Test

This is the output when running the above script:

PS C:\Users\Oliver> [io.file]::ReadAllLines("test.txt")|Sort-Object {$_.SubString($_.IndexOf('#'))}
Blah bluh #Really
123123 #Really
Oliver #SuperUser
klfdmngl #Test
Blah blah #Test

Analysis

[io.file]       # From the .NET class System.IO.File...
::ReadAllLines  # use the static method ReadAllLines to read all text lines into an array...
("test.txt")    # from the file test.txt

|               # Take that array and pipe it to...
Sort-Object     # the cmdlet Sort-Object (to sort objects)
{               # To sort the elements in the array...
$_.SubString(   # use the part of the text line...
$_.IndexOf('#') # that starts at the first occurrence of #
)}
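The sort-by-key idea analysed above carries over to other languages. Below is a minimal Python sketch of it, again assuming Python is available; hashtag_key and group_in_first_seen_order are illustrative names, not part of any library. The second function is based on one reading of the question: its expected output lists #Test before #Really, which suggests grouping in order of first appearance rather than alphabetically. That reading is an assumption.

```python
def hashtag_key(line):
    """Return the line from its first '#' onward, like SubString(IndexOf('#'))."""
    index = line.find("#")
    # Assumption: a line without '#' gets an empty key and therefore sorts first.
    return line[index:] if index >= 0 else ""


def group_in_first_seen_order(lines):
    """Group lines by hashtag, keeping hashtags in the order they first appear."""
    order = {}
    for line in lines:
        order.setdefault(hashtag_key(line), len(order))
    return sorted(lines, key=lambda line: order[hashtag_key(line)])


lines = [
    "Blah blah #Test",
    "123123 #Really",
    "Oliver #SuperUser",
    "Blah bluh #Really",
    "klfdmngl #Test",
]

print("Alphabetical by hashtag:")
for line in sorted(lines, key=hashtag_key):
    print(line)

print("Grouped in first-seen order:")
for line in group_in_first_seen_order(lines):
    print(line)
```

Python's sorted is stable, so lines that share a hashtag keep their original relative order. That differs from the PowerShell output shown above, where the two #Really lines and the two #Test lines come out in reverse order.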

Der Hochstapler

Posted 2012-09-21T11:58:04.037

Reputation: 77 228

@Oliver_Salzburg This solution does not handle the case where data spans multiple line breaks. See comment #2 in the answer I provided. – Sean C. – 2012-09-21T19:42:07.160

@Oliver_Salzburg Also, could you explain why you chose to use the .NET io.file class? Why not just use get-content? Your one-liner could easily have been get-content test.txt | sort-object {$_.SubString($_.IndexOf('#'))} – Sean C. – 2012-09-21T19:52:30.580

@SeanC.: My answer refers to the problem posed in the question. – Der Hochstapler – 2012-10-10T11:57:40.710

@SeanC.: Because I'm more familiar with .Net classes than I am with cmdlets. – Der Hochstapler – 2012-10-10T11:58:17.760