How can I sort the lines in a text file, by the length of each line, in Notepad++?

13

5

How can I sort a text file by line length in Notepad++? Is there a plugin available for this task?
If there is no plugin, what are the first one or two tutorials I should read in order to write the plugin myself?

hpaknia

Posted 2013-06-18T11:35:51.680

Reputation: 594

If you want to do this kind of sort from the command line instead of in your editor, and you have Unix tools like AWK or Perl available, check this question and its answers for ways to do this.

– Caleb – 2016-11-24T12:30:32.073

You know, sometimes it's best to just write some code and get it over with. – Daniel R Hicks – 2013-09-16T15:14:17.460

Are you dealing with small or large files? – ComFreek – 2013-09-19T19:58:23.463

A 50 MB file with long lines, about 250 KB in length. – hpaknia – 2013-09-20T11:35:09.217

Is the data sensitive? Or could you share it on Dropbox/Google-Drive/etc.? If Notepad++ can open and handle that file, I would imagine that my solution would work, but I'd love to try it out myself. – Dane – 2013-09-20T15:18:29.600

Hey @HPM, any chance of getting to work on your data? – Dane – 2013-09-24T20:17:42.203

Sure, I'll soon share it. – hpaknia – 2013-10-02T21:14:57.413

Answers

6

This answer is inspired by a YouTube video. Updated to maintain original sort order, if that is important.

Notepad++'s TextFX plugin has a tool that sorts selected lines alphabetically. This tool can be hijacked to sort by line length by padding each line with spaces on the left until all the lines are the same length.

"The Zoo" comes alphabetically before "Their House" because the space is treated as a character and comes before "i". __X (pretending the underscores are really spaces) will similarly come alphabetically before _XX. The idea in this answer is to add spaces and line numbers so that __________092dog will be sorted above _003alligator.

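The comparison rule behind this trick can be checked outside the editor. The snippet below is only a sketch in Python, which compares strings code point by code point; it assumes TextFX's case-sensitive sort orders a space before digits and letters in the same way.

```python
# Python compares strings code point by code point; a space (32) is lower
# than any digit or letter, so it decides the comparison at the first difference.
checks = [
    ("The Zoo", "Their House"),             # space vs "i" at the fourth character
    ("  X", " XX"),                         # space vs "X" at the second character
    ("          092dog", " 003alligator"),  # space vs "0" at the second character
]

for left, right in checks:
    print(repr(left), "sorts before", repr(right), ":", left < right)
```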
I'll use the following as example data:

Lorem
ipsum
dolor
sit
amet
consectetur
adipisicing

Step 1. Add line numbers.

(Note added by barlop: we will not be sorting by these line numbers; we are sorting by the length of the lines. The line numbers record the natural order, so that when two or more lines are of equal length, they can be sorted according to that natural order.)

Assuming your text file only has the data in it, place the text cursor (the vertical line) into the very first position of the file. Then in the Edit menu select Column Editor... (Alt+C). Choose "Number to Insert" and start with 1, increase by 1, and include leading zeros. Note that this will retain the original ordering when sorting from shortest string to longest string. Reverse all lines first if you want to sort longest to shortest.

1Lorem
2ipsum
3dolor
4sit
5amet
6consectetur
7adipisicing

Step 2. Pad all lines with leading spaces.

Place the text cursor (the vertical line) into the very first position of the file. Then in the Edit menu select Column Editor... (Alt+C). Insert enough spaces so that the shortest line of data will be padded out to the length of the longest line of data. If your shortest line has 4 characters, and your longest 44, then make sure you insert at least 40 spaces.

__________1Lorem
__________2ipsum
__________3dolor
__________4sit
__________5amet
__________6consectetur
__________7adipisicing

Step 3. Trim lines to a uniform length.

Use the following Regular Expression Find/Replace (Ctrl+H) to keep only the right-most characters of each line. The count in braces must equal or exceed the length of your longest data line, including its line number.

^.*(.{50})$

Replace all with $1. That will trim everything except the right-most 50 characters of every line. If your data is longer (or shorter) than 50, adjust the {50} in the Regular Expression.

(Note added by barlop- the idea here is the shortest lines have the most spaces at the beginning)

_______1Lorem
_______2ipsum
_______3dolor
_________4sit
________5amet
_6consectetur
_7adipisicing
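The trim in Step 3 can be tried on the example data with Python's re module, which handles this particular pattern the same way. This is a sketch only: the width of 13 is chosen to match the example output above (the answer's own regex uses 50), and underscores are printed in place of spaces for readability.

```python
import re

# The example lines after Step 2: ten spaces in front of each numbered line.
numbered = ["1Lorem", "2ipsum", "3dolor", "4sit", "5amet",
            "6consectetur", "7adipisicing"]
padded = [" " * 10 + line for line in numbered]

WIDTH = 13  # assumption: matches the example; must be >= the longest numbered line
trim = re.compile(rf"^.*(.{{{WIDTH}}})$")  # expands to ^.*(.{13})$

# Keep only the right-most WIDTH characters of every line.
trimmed = [trim.sub(r"\1", line) for line in padded]

for line in trimmed:
    print(line.replace(" ", "_"))  # show the padding as underscores
```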

Step 4. Sort the lines.

Select all of the text (Ctrl+A). In the TextFX menu, go to TextFX > TextFX Tools > Sort lines case sensitive (at column). Your data should now be in length order, from shortest to longest. If you want the lines ordered from longest to shortest, uncheck the TextFX > TextFX Tools > + Sort ascending option before sorting. Note that the line numbers are then reversed as well.

_________4sit
________5amet
_______1Lorem
_______2ipsum
_______3dolor
_6consectetur
_7adipisicing

Step 5. Remove leading spaces.

Use another Regular Expression Find/Replace (Ctrl+H) to match the leading spaces.

^ *\d{4}

That's a space between the caret and asterisk. Replace all with nothing. That will remove all leading spaces and the inserted line numbers, if you had 4-digit line numbers. Replace the {4} with the correct number of digits in your line numbers.

sit
amet
Lorem
ipsum
dolor
consectetur
adipisicing
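The cleanup regex from Step 5 can be checked the same way. The sample lines below are hypothetical and use 4-digit line numbers to match the \d{4} in the pattern (the walkthrough above uses single digits). The last sample shows that exactly four digits are removed, so data that itself starts with a digit survives.

```python
import re

# Hypothetical sorted lines with 4-digit line numbers, as \d{4} expects.
sorted_lines = [
    "     0004sit",
    "    0005amet",
    "   0001Lorem",
    "  00072 apples",  # line 0007 whose data begins with a digit
]

strip_prefix = re.compile(r"^ *\d{4}")  # leading spaces, then exactly four digits

cleaned = [strip_prefix.sub("", line) for line in sorted_lines]
print(cleaned)
```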

MACRO

I recorded the above steps using Notepad++'s macro feature, and it doesn't work. I'm not sure which step fails, and I haven't diagnosed why. You could probably use AutoHotKey to automate this if you do it repeatedly.
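If the aim is automation rather than staying inside the editor, the five steps can also be mirrored in a short script. The following is a sketch of the same number, pad, sort, strip idea in Python, not a Notepad++ macro; it assumes a plain code-point sort stands in for the TextFX sort, and it pads straight to the longest numbered line instead of over-padding and trimming.

```python
import re

lines = ["Lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipisicing"]

# Step 1: prefix every line with a zero-padded line number.
digits = len(str(len(lines)))
numbered = [f"{i:0{digits}d}{text}" for i, text in enumerate(lines, start=1)]

# Steps 2-3: left-pad with spaces so that all lines share one width.
width = max(len(line) for line in numbered)
padded = [line.rjust(width) for line in numbered]

# Step 4: an ordinary case-sensitive sort; more leading spaces sort first.
padded.sort()

# Step 5: remove the padding and the line numbers again.
strip_prefix = re.compile(rf"^ *\d{{{digits}}}")
result = [strip_prefix.sub("", line) for line in padded]

print("\n".join(result))
```

Because the line number sits directly after the padding, lines of equal length keep their original relative order.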

Dane

Posted 2013-06-18T11:35:51.680

Reputation: 1 682

https://stackoverflow.com/a/47040416/125507 has a simple way to right-justify – endolith – 2019-05-17T20:34:54.820

@endolith - how is that way simpler? Because it uses a regex instead of the Column Editor? Seems to be the same set of steps otherwise. – Dane – 2019-05-23T13:30:15.433

Warning: this is not a stable sort. In other words, lines of the same length will not necessarily appear in the same order after sorting - instead, they will be sorted lexicographically. – Bob – 2013-09-16T14:37:26.990

@Bob is correct, if you have lines of a given length, such as 33 characters, that have a particular order to them, that will not be reflected in the results. We can add the line numbers with Alt+C prior to step 1 (including leading 0s to ensure that lengths remain equal). Then, when cleaning up in step 4, use ^ *\d{5} or whatever number of digits were used for the line numbers. – Dane – 2013-09-16T15:12:10.047

The answer has been updated to retain the existing sort order, assuming that is important. – Dane – 2013-09-16T15:25:12.563

nice one dane for following what that guy in the youtube video was doing, where he also disabled comments. Can you include a link to text where you think it fails, on pastebin http://pastebin.com/ ? and did it fail only with the macro, or manually also?

– barlop – 2013-09-18T14:33:37.037

@barlop I'll see if I can do a little diagnosis on the N++ macro. The result from my simple attempt was terribly corrupted results with many unusual and non-printing characters added. Also, while I was inspired by the YouTube video, my steps are a bit different. – Dane – 2013-09-18T14:44:53.917

I must say, reading your answer, I only understood it when I tried it. I think one reason why you haven't got more votes might be that people haven't understood the logic. Would you permit me to add an explanation of the logic to your answer, at the beginning? – barlop – 2013-09-19T11:07:46.913

Sure. I'll add one too now, but feel free to improve it. – Dane – 2013-09-19T18:57:24.223

3

No, I don't think there is. The closest is the TextFX plugin, but that's a character-based sort, not a line-length-based one. Your best bet is to put the text into a spreadsheet and sort it there, using a separate computed column based on the LEN() function.

snowdude

Posted 2013-06-18T11:35:51.680

Reputation: 2 560

Thanks, the text file has long lines and a huge total size, so I ruled out spreadsheet editors. Let me update the question. – hpaknia – 2013-06-18T16:58:50.063

@HPM well, if you're willing to look outside Notepad++, then the command line would do it. For example, use some commands to put the length of each line at the end of that line; then you'd at least be nearer to doing it. – barlop – 2013-09-16T15:34:57.643

Thanks, that's good advice. What I am curious about is that Notepad++ has so many plugins; why doesn't this one exist? – hpaknia – 2013-09-17T09:52:28.703

1

You can use SQL in N++ on CSV files! For example, if you have:

col1;
hgfhfghfhg;
khjfhgfhfghfgh;
kjhfhgfhfhgfghfhf;
lkjgjghjhg;
lkjgjg;

then you can execute the command select * from data order by length(col1) desc to sort in descending order. "data" refers to the current file, and "col1" is the name of the first (and last) column.
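The query itself can be tried outside the editor. The sketch below uses Python's built-in sqlite3 module rather than the Notepad++ plugin, so the SQL dialect is SQLite's and may differ from what the plugin accepts. The in-memory table and the rowid tie-breaker (which keeps equal-length rows in input order) are additions made for this illustration.

```python
import sqlite3

rows = ["hgfhfghfhg", "khjfhgfhfghfgh", "kjhfhgfhfhgfghfhf", "lkjgjghjhg", "lkjgjg"]

# Load the single column into an in-memory table named "data".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (col1 TEXT)")
conn.executemany("INSERT INTO data (col1) VALUES (?)", [(row,) for row in rows])

# Longest first; rowid breaks ties so equal-length rows keep their input order.
query = "SELECT col1 FROM data ORDER BY length(col1) DESC, rowid"
result = [record[0] for record in conn.execute(query)]
conn.close()

for value in result:
    print(value)
```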

Unfortunately, there is probably a bug that prevents omitting the delimiter at the end of each line in one-column text.

Greck

Posted 2013-06-18T11:35:51.680

Reputation: 241

This is actually a great solution, if only SQL in N++ didn't mangle the data output. I just tested out your solution, and I added delimiters to the end of all lines with a quick regex replace, but the data output converts everything to lowercase, and replaced my dashes with question marks. – Dane – 2013-09-16T13:56:05.413

@Dane (I don't currently have access to Notepad++.) Perhaps try adding a single quote to the beginning and end of every line (and then the semicolon after that)? Maybe double quotes? – Bob – 2013-09-16T14:39:34.030

@Bob: no good. The lowercase thing is even mentioned in the release notes for the SQL in N++ plug-in. – Dane – 2013-09-16T15:10:35.323

0

Or, if you happen to have Linux and NEdit:

ctrl-a
alt-r
perl -e 'print sort { length($a) <=> length($b) } <>'
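For readers without Perl, roughly the same filter can be sketched in Python. The function name sort_by_length is made up for this example. It accepts any iterable of lines, so passing sys.stdin instead of the StringIO sample would make it behave as a pipe filter. Python's sorted() is documented to be stable, so lines of equal length keep their original order.

```python
import io

def sort_by_length(stream):
    """Return the stream's lines ordered shortest to longest (stable for ties)."""
    # Strip line endings first so a missing final newline does not skew lengths.
    lines = [line.rstrip("\r\n") for line in stream]
    return sorted(lines, key=len)

# A StringIO object stands in for a file or for standard input.
sample = io.StringIO("Lorem\nipsum\ndolor\nsit\namet\nconsectetur\nadipisicing\n")

for line in sort_by_length(sample):
    print(line)
```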

user254657

Posted 2013-06-18T11:35:51.680

Reputation: 11

Not only is this not what the question asked, it's not even applicable to the same OS platform. – Caleb – 2016-11-24T12:21:52.410

it's still a helpful answer. it was the only one that worked well for me. he did specify that you need linux and nedit, so there's no problem. – Anthony – 2017-07-14T13:30:38.167