Here's a final PowerShell solution that handles records spanning multiple lines. The record delimiter is assumed to be a hashtag (# followed by word characters) at the end of a line, i.e. immediately before {EOL}. A line that does not end in a hashtag is assumed to continue on the next line. The other sections of this answer, below, do not handle the special case mentioned by the author where data crosses a newline boundary. This example assumes the file is called test.txt and is in the current directory.
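# Read the whole file first so it can be safely overwritten at the end.
[string[]]$fileContent = Get-Content .\test.txt
[string]$linebuffer = ''
[string[]]$fixedFile = @(
    foreach ($line in $fileContent) {
        if ($line -notmatch '#\w+$') {
            # No hashtag at the end: the record continues on the next line.
            $linebuffer += $line + ' '
            continue
        }
        # Hashtag found: emit the completed record and reset the buffer.
        $linebuffer += $line
        $linebuffer
        $linebuffer = ''
    }
    # Emit any trailing text that never reached a hashtag instead of dropping it.
    if ($linebuffer) { $linebuffer.TrimEnd() }
)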
($fixedFile -replace '^(.*)\ (#.*)$', '$2 $1' | Sort-Object) -replace '^(#\w+)\ (.*)$','$2 $1' | out-file test.txt -encoding ascii
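For readers without PowerShell, the same join-then-sort idea can be sketched in Python. This is a rough port under the same assumptions as the script above, not the original script: a record ends with a hashtag (# followed by word characters) at the end of a line, and any line without one continues on the next line. The function names join_records and sort_by_hashtag and the sample lines are made up for this illustration.

```python
import re

# A record ends when a line finishes with '#' followed by word characters.
TAG_AT_EOL = re.compile(r"#\w+$")


def join_records(lines):
    """Merge continuation lines until a line ending in a hashtag is seen."""
    records = []
    buffer = []
    for line in lines:
        buffer.append(line)
        if TAG_AT_EOL.search(line):
            records.append(" ".join(buffer))
            buffer = []
    # Assumption: trailing lines with no closing hashtag are kept
    # rather than silently dropped.
    if buffer:
        records.append(" ".join(buffer))
    return records


def sort_by_hashtag(records):
    """Sort records by trailing hashtag, then by the full record text."""
    def key(record):
        match = TAG_AT_EOL.search(record)
        tag = match.group(0) if match else ""
        return (tag, record)
    return sorted(records, key=key)


if __name__ == "__main__":
    # Hypothetical sample data; the second record spans two lines.
    sample = [
        "buy milk #shopping",
        "call the plumber about",
        "the leaking tap #home",
        "renew passport #admin",
    ]
    for record in sort_by_hashtag(join_records(sample)):
        print(record)
```

Note that Python's sorted compares by code point, whereas Sort-Object is case-insensitive by default, so the ordering of mixed-case hashtags may differ between the two.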
Use gVim in Windows or MacVim on OS X.
NOTE: Vim is a modal editor. The two modes that matter here are Insert mode and Command (Normal) mode. To type text as in an ordinary editor, you must be in Insert mode, which you enter by pressing a key such as a or i. The editor starts in Command mode, and from there you can type a colon to enter the following commands.
:%s/^\(.*\)\ \(\#\w\+\)$/\2\ \1/g
:sort
:%s/^\(\#\w\+\)\ \(.*\)$/\2\ \1/g
The first command swaps the hashtag at the end of the line to the beginning of the line. The second command sorts the data and the third command undoes the swap and moves the hashtag back to the end of the line.
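The swap, sort, swap-back sequence is not specific to Vim. As a minimal sketch, assuming each line ends with a single space followed by a hashtag, the same three steps look like this in Python (the function name sort_lines_by_trailing_tag is hypothetical):

```python
import re


def sort_lines_by_trailing_tag(lines):
    # Step 1: move the trailing hashtag to the front of each line.
    swapped = [re.sub(r"^(.*) (#\w+)$", r"\2 \1", line) for line in lines]
    # Step 2: a plain sort now orders the lines by hashtag first.
    swapped.sort()
    # Step 3: move the leading hashtag back to the end of each line.
    return [re.sub(r"^(#\w+) (.*)$", r"\2 \1", line) for line in swapped]
```

Lines without a trailing hashtag pass through step 1 unchanged. One caveat of this approach: such a line would still be rearranged in step 3 if it happens to start with a hashtag.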
I've tested this on your sample and it works.
@Oliver_Salzburg provided a much easier answer with Excel in comments. I didn't think outside the box and provided an answer with a text-editor.
Step 1: Replace # with ,#
Step 2: Import as CSV into Excel or similar application. – Oliver Salzburg♦
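If the text itself may contain commas, a literal find-and-replace could produce extra columns. A hedged alternative, sketched here in Python, is to let a CSV writer do the quoting. The function name lines_to_csv is made up for this example, and it assumes the hashtag is whatever follows the last # on each line.

```python
import csv
import io


def lines_to_csv(lines):
    """Return CSV text: column 1 is the text, column 2 is the hashtag."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        # Split at the last '#' so earlier '#' characters stay in the text.
        text, sep, tag = line.rpartition("#")
        if sep:
            writer.writerow([text.rstrip(), "#" + tag])
        else:
            # No hashtag on this line: keep the text, leave column 2 empty.
            writer.writerow([line, ""])
    return out.getvalue()
```

The resulting text can be saved with a .csv extension and opened in Excel or a similar application, where sorting by the second column groups the rows by hashtag.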
Here's a solution using only PowerShell, which ships with Windows 7. I still haven't had a chance to read up on handling records that span line breaks, so this solution does not account for them.
This example assumes that the file you're working with is test.txt.
$tempstor = (get-content test.txt) -replace '^(.*)\ (#.*)$', '$2 $1' | Sort-Object
$tempstor -replace '^(#\w+)\ (.*)$','$2 $1' | out-file test.txt -encoding ASCII
As a one-liner, using parenthesized expressions instead of a temporary variable:
((get-content test.txt) -replace '^(.*)\ (#\w+)$', '$2 $1' | Sort-Object) -replace '^(#\w+)\ (.*)$','$2 $1' | out-file test.txt -encoding ascii
Apart from just suggesting which software to use, please also define the exact procedure. – Joey Hammer – 2012-09-21T11:58:51.413
Step 1: Replace # with ,# Step 2: Import as CSV into Excel or similar application. – Der Hochstapler – 2012-09-21T12:10:00.760
Your comment elsewhere says "so the script or whatnot has to be clever enough to work with long lines with line breaks" that {EOL} is not a reliable delimiter? is the record delimiter a #[sometext]{eol}? – horatio – 2012-09-21T15:08:36.850
Please update your question with all additional information about the input files format. – martineau – 2012-09-21T17:05:19.200
@Oliver Salzburg: I think the OP would also want to know what to do after importing it into Excel or whatever? – martineau – 2012-09-21T17:14:43.200
@martineau: I'm sorry, it wasn't really meant to be a solution even though it did work for me. I wouldn't want to have to recommend to someone to use a beast like Excel to solve a task like this. You guys have come up with much better examples :) – Der Hochstapler – 2012-09-21T18:24:12.757