Regarding whether this is a duplicate: there are similarly worded questions, such as https://unix.stackexchange.com/questions/76061/can-sed-remove-double-newline-characters and https://stackoverflow.com/questions/27510462/how-can-i-remove-double-line-breaks-with-sed. On the first, more popular one, the original question is arguably the same as mine, but its accepted and most upvoted answer removes all empty lines, not just those "when there are 2 or more together" as the question asked. Some comments complain that this answer and others behave that way, but no answer is given that leaves a single empty line alone. Some other answers turn runs of empty lines into a single empty line (squeezing) rather than removing them entirely.
I'm looking for a scriptable way to remove back-to-back empty lines while leaving single empty lines in place.
I'm looking to automatically clean up .srt (subtitle) files. The format requires an empty line between subtitle sections (each section says what to display at a particular time). Usually, if there are 2 lines to be displayed at once, the subtitle author just writes the 2 lines one after the other. Some authors use another style, placing 2 empty lines between the lines to be displayed. On my device, this has the effect of displaying the first line only, presumably rendering the second line off the bottom of the TV screen.
So, I'd like to change this:
1
00:00:01,800 --> 00:00:03,802
First line is here


Second line is here

2
...
Into this:
1
00:00:01,800 --> 00:00:03,802
First line is here
Second line is here

2
...
Not that it probably needs to be handled differently, but the file format requires there be an empty line at the bottom of the file, which must be left there.
I'd like this to work by first removing trailing whitespace, then removing only those empty lines that touch another empty line. I don't want it to be anchored to the rest of the .srt format, such as how many lines there are between numbered sections. (I've thought that all empty lines could be removed and newlines added back in on lines containing only numerical characters, but I'm hoping to keep it more generic than that, ignoring the actual .srt format.)
Also, if for some reason a .srt has more than 2 lines of text in a section, I'd like it left that way.
So, perhaps something along the lines of:
cat some.srt | sed 's/[ \t]*$//' | SOMETHING_ELSE
I'd prefer a bash, sed, or awk solution over a perl one. If I understand right, I think this will be easier to implement in awk than in sed, since the problem spans multiple lines.
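To make the intended behavior concrete, here is a rough sketch of the two steps described above (strip trailing whitespace, then drop every empty line that touches another empty line). It is only an illustration under assumptions, not a tested solution: the function name `strip_blank_runs` is made up, the input is assumed to use plain LF line endings (a trailing carriage return would make a line count as non-empty), and `[[:blank:]]` stands in for `[ \t]`.

```sh
#!/bin/sh
# strip_blank_runs: hypothetical helper name, reads stdin and writes stdout.
strip_blank_runs() {
  # Step 1: strip trailing spaces and tabs so whitespace-only lines become empty.
  sed 's/[[:blank:]]*$//' |
  # Step 2: count consecutive empty lines and decide what to do when the run ends.
  awk '
    # Non-empty line: re-emit a pending blank only if the run was exactly one line long.
    /./ { if (run == 1) print ""; run = 0; print; next }
    # Empty line: print nothing yet, just extend the current run.
    { run++ }
    # End of input: a single trailing empty line is kept, a longer run is dropped.
    END { if (run == 1) print "" }
  '
}
```

It would be used as `strip_blank_runs < some.srt > cleaned.srt`. One consequence of keeping it generic: if the file ends in 2 or more empty lines, all of them are removed, so the required final empty line only survives when there is exactly one.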
If I understood right, this sed script would work: sed -r ':a;N;${:b;s/\n[[:blank:]]+\n/\n\n/;tb;s/\n{3,}/\n/g;s/\n+$/\n/};ba'. – Paulo – 2019-01-24T13:59:33.847