19
8
What general tips do you have for golfing in sed? I'm looking for ideas which can be applied to code-golf problems and which are also at least somewhat specific to sed (e.g. "remove comments" is not an answer).
Please post one tip per answer.
11
If you need to use labels then for sure you'll want your label names to be as short as possible. In fact taken to the extreme, you may even use the empty string as a label name:
: # define label ""
p # print pattern space
b # infinite loop! - branch to label ""
4 As of GNU sed 4.3, this behavior was removed. : now requires a label. – Kevin – 2017-02-17T20:06:49.487
Indeed, here is also the actual git commit link. I guess for PPCG this won't change much, since we are allowed to post answers for GNU sed 4.2.x, but it's good to know, though regrettably, that this trick won't officially work anymore. – seshoumara – 2017-02-17T20:40:52.060
8
The GNU sed documentation describes the s command as "sed's Swiss Army Knife". But if all you want to do is replace all instances of one character with another, then the y command is what you need:
y/a/b/
is one char shorter than:
s/a/b/g
It's also way faster, and can swap chars in place: y/12/21/ – mikeserv – 2015-12-23T18:12:12.260
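To illustrate the in-place swap mentioned in the comment, here is a small sketch, assuming GNU sed and a POSIX shell. The function names are made up for this example.

```shell
# Hypothetical helper names, for illustration only.
swap_with_y() { sed 'y/12/21/'; }          # one pass: 1 becomes 2 and 2 becomes 1
swap_with_s() { sed 's/1/2/g;s/2/1/g'; }   # the second s undoes the first

printf '1122\n' | swap_with_y   # prints 2211
printf '1122\n' | swap_with_s   # prints 1111
```

The chained s commands cannot swap because the first one destroys the information the second one needs; y maps every character exactly once.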
6
When repeatedly replacing in a loop:
:loop
s/foo/bar/g
tloop
it's usually unnecessary to replace globally, as the loop will eventually replace all occurrences:
# GNU sed
:
s/foo/bar/
t
Note also the GNU extension above: a label can have an empty name, saving more precious bytes. In other implementations, a label cannot be empty, and jumping without a label transfers flow to the end of the script (i.e. the current cycle ends).
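A sketch of the loop-without-g idea using a named label, so that it should also run on sed versions that reject the empty label. It assumes GNU sed; the function name and the parenthesis example are mine, not from the answer.

```shell
# Hypothetical helper: delete "()" pairs until none are left.
# In basic regex syntax, ( and ) are literal characters.
strip_pairs() { sed -e ':a' -e 's/()//' -e 'ta'; }

printf '(()())x\n' | strip_pairs   # prints x
```

Each pass removes one pair and t jumps back while a substitution succeeded. A single s/()//g would stop at "()x", because removing the inner pairs creates a new match that the global flag never revisits.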
1 The empty label name is GNU-specific; POSIX requires branches with no argument to jump to the end of the script (this seems to be the behavior in the BSDs and Busybox, and also in GNU sed if you don't add an empty :). – ninjalj – 2015-11-03T01:30:08.767
2 The nameless label was always a bug in GNU sed, not an extension, and in version 4.3 and higher this bug was, regrettably, fixed. See here. – seshoumara – 2017-02-17T20:45:23.047
6
Consider using extended regex syntax (in GNU sed). The -r option costs one byte in scoring, but using it just once to eliminate the backslashes from a pair of \(...\) has already paid for itself.
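For comparison, the same substitution in both syntaxes. This is only a sketch assuming GNU sed; the function names are invented for the example.

```shell
# Hypothetical helpers: move the leading run of a's behind the run of b's.
swap_bre() { sed 's/\(a*\)\(b*\)/\2\1/'; }   # basic syntax: groups need backslashes
swap_ere() { sed -r 's/(a*)(b*)/\2\1/'; }    # extended syntax: script is 4 bytes shorter

printf 'aabbb\n' | swap_bre   # prints bbbaa
printf 'aabbb\n' | swap_ere   # prints bbbaa
```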
2 With the additional note that -r seems to be GNU sed specific. – manatwork – 2015-06-05T11:28:19.823
@manat - added (but it's a Community Wiki answer, so you could have edited yourself). – Toby Speight – 2015-06-05T11:51:57.517
Of course. I just didn't consider it part of the tip, only an additional note. – manatwork – 2015-06-05T11:55:17.033
And it keeps paying for itself when using +, ?, {} and | in regex matches, since no backslashes are needed either. – seshoumara – 2016-08-29T17:52:41.440
-E works as an alias to -r in many sed implementations if I remember correctly. – phk – 2019-02-28T15:08:39.557
5
There's no built-in arithmetic, but calculations can be done in unary or in unary-coded decimal. The following code converts decimal to UCD, with x as the unit and 0 as the digits separator:
s/[1-9]/0&/g
s/[5-9]/4&/g
y/8/4/
s/9/4&/g
s/4/22/g
s/[37]/2x/g
s/[26]/xx/g
s/[1-9]/x/g
and here's the conversion back to decimal:
s/0x/-x/g
s/xx/2/g
y/x/1/
s/22/4/g
s/44/8/g
s/81/9/g
s/42/6/g
s/21/3/g
s/61/7/g
s/41/5/g
s/-//g
These are both taken from an answer to "Multiply two numbers without using any numbers".
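A quick way to try the two UCD snippets above is to wrap them in shell functions. The scripts are copied unchanged; the function names are made up, and GNU sed is assumed.

```shell
# Hypothetical wrappers around the two scripts quoted above.
dec2ucd() {
  sed 's/[1-9]/0&/g
s/[5-9]/4&/g
y/8/4/
s/9/4&/g
s/4/22/g
s/[37]/2x/g
s/[26]/xx/g
s/[1-9]/x/g'
}

ucd2dec() {
  sed 's/0x/-x/g
s/xx/2/g
y/x/1/
s/22/4/g
s/44/8/g
s/81/9/g
s/42/6/g
s/21/3/g
s/61/7/g
s/41/5/g
s/-//g'
}

printf '12\n'  | dec2ucd             # prints 0x0xx
printf '907\n' | dec2ucd             # prints 0xxxxxxxxx00xxxxxxx
printf '907\n' | dec2ucd | ucd2dec   # prints 907
```

Each decimal digit d becomes a 0 followed by d copies of x, so a digit 0 is just the bare separator.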
Plain old unary can be converted using this pair of loops from this answer to "{Curly Numbers};", where the unit is ;. I've used v and x to match Roman for 5 and 10; b comes from "bis".
# unary to decimal
:d
/;/{
s/;;;;;/v/g
s/vv/x/g
/[;v]/!s/x\+/&0/
s/;;/b/g
s/bb/4/
s/b;/3/
s/v;/6/
s/vb/7/
s/v3/8/
s/v4/9/
y/;bvx/125;/
td
}
# Decimal to unary
:u
s/\b9/;8/
s/\b8/;7/
s/\b7/;6/
s/\b6/;5/
s/\b5/;4/
s/\b4/;3/
s/\b3/;2/
s/\b2/;1/
s/\b1/;0/
s/\b0//
/[^;]/s/;/&&&&&&&&&&/g
tu
1 ...and if you have to use either of these, you've almost certainly already lost the code golf, though you might still be competitive with Java answers ;-) Still fun to use though. – Digital Trauma – 2015-06-05T17:36:30.153
The conversion from plain unary to decimal gives wrong answers for unary input equivalent of decimal form X0X, for example 108. The line responsible for this is /[;v]/!s/\b/0/2, which needs to be changed to /[;v]/!s:x\+:&0: for it to work. See here.
@seshoumara, your link seems to be an empty page. But it's entirely plausible that I made an error when extracting that code from the referenced answer, so I'll just apply your fix. – Toby Speight – 2017-04-06T14:48:52.963
The link loads correctly, but I was expecting something other than a grey page with "TIO" and something that looks like the Ubuntu logo - is that what's intended? And I was referring to the second of the answers I referenced (58007), as that's where the plain-unary sample originated. – Toby Speight – 2017-04-06T15:30:33.950
The TIO link should have contained the corrected code, plus an example input, 108 in unary. On running the code you should have seen the correct result 108, and not 180, as previously generated by that now fixed line of code. Updating the referenced answer is entirely up to you. This is a community wiki. – seshoumara – 2017-04-06T15:52:49.420
4
Expanding upon this tip answer, regarding the conversions between decimal and plain unary number formats, I present the following alternative methods, with their advantages and disadvantages.
Decimal to plain unary: 102 + 1(r flag) = 103 bytes. I counted \t as a literal tab, as 1 byte.
h
:
s:\w::2g
y:9876543210:87654321\t :
/ /!s:$:@:
/\s/!t
x;s:-?.::;x
G;s:\s::g
/\w/{s:@:&&&&&&&&&&:g;t}
Advantage: it is 22 bytes shorter and, as a bonus, it works with negative integers as input.
Disadvantage: it overwrites the hold space. However, since it's more likely that you'd need to convert the input integer right at the start of the program, this limitation is rarely felt.
Plain unary to decimal: 102 + 1(r flag) = 103 bytes
s:-?:&0:
/@/{:
s:\b9+:0&:
s:.9*@:/&:
h;s:.*/::
y:0123456789:1234567890:
x;s:/.*::
G;s:\n::
s:@::
/@/t}
Advantage: it is 14 bytes shorter. This time both tip versions work for negative integers as input.
Disadvantage: it overwrites the hold space
For a complicated challenge, you'll have to adapt these snippets to work with other information that may exist in the pattern space or hold space, besides the number to convert. The code can be golfed more, if you know you only work with positive numbers or that zero alone is not going to be a valid input / output.
An example of such challenge answer, where I created and used these snippets, is the Reciprocal of a number (1/x).
For unary-to-decimal you can save two bytes by combining the last two substitutions: s:\n|@$::g
. https://tio.run/##K05N@f@/2ErX3krNwIpL30G/2oqr2ComyVLbykANxNSz1HKw0gcyM6yBHC19KyuuSisDQyNjE1MzcwtLKzgLqL0CqERfTwuoxB3IismrcVCxskoHmVpS@/@/Awj8yy8oyczPK/6vWwQA
I had my own try at the decimal to unary converter. Here's 97 bytes :) Try it online! (also doesn't require -r, but with new consensus, flags do not count towards the bytecount anyways, and it doesn't mess up the hold space)
Actually if you change the last line from /\n/ta to /\n/t, you save 1 byte to get 96 – user41805 – 2018-05-22T10:22:57.090
@Cowsquack Thanks, 96 is great! Don't have time now, will look on it this weekend. – seshoumara – 2018-05-22T15:05:11.613
Sure, do send me a ping on chat then :) – user41805 – 2018-05-22T16:37:57.037
4
If not explicitly banned by the question, the consensus for this meta question is that numerical input may be in unary. This saves you the 86 bytes of decimal to unary as per this answer.
Isn't that meta consensus for sed referring to plain old unary format? I have several answers where an input in UCD would help me, in case it's either way. – seshoumara – 2017-02-15T08:32:18.120
@seshoumara I meant unary, not UCD – Digital Trauma – 2017-02-15T16:07:10.600
Then the conversion from decimal to plain old unary saves you 126 bytes as per that answer you linked. The 86 bytes is for the conversion to UCD. – seshoumara – 2017-02-17T02:32:38.863
4
As mentioned in man sed (GNU), you can use any character as a delimiter for regular expressions by using the syntax
\%regexp%
where % is a placeholder for any character.
This is useful for commands like
/^http:\/\//
which are shorter as
\%^http://%
What is mentioned in the GNU sed manual but not in man sed is that you can change the delimiters of s/// and y/// as well.
For example, the command
ss/ssg
removes all slashes from the pattern space.
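Both delimiter tricks can be tried like this. It is a minimal sketch assuming GNU sed, with function names invented for the example.

```shell
# Hypothetical helpers for the two commands discussed above.
only_http()  { sed -n '\%^http://%p'; }   # address with % as the regex delimiter
no_slashes() { sed 'ss/ssg'; }            # s command with the letter s as delimiter

printf 'http://a\nftp://b\n' | only_http   # prints http://a
printf 'a/b/c\n' | no_slashes              # prints abc
```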
3
Let's talk about the t and T commands: although they are explained in the man page, it's easy to forget how they work and introduce bugs accidentally, especially when the code gets complicated.
Man page statement for t:
If a s/// has done a successful substitution since the last input line was read and since the last t or T command, then branch to label.
Example showing what I mean: Let's say you have a list of numbers and you want to count how many negatives there are. Partial code below:
1{x;s/.*/0/;x} # initialize the counter to 0 in hold space
s/-/&/ # check if number is negative
t increment_counter # if so, jump to 'increment_counter' code block
b # else, do nothing (start a next cycle)
:increment_counter
#function code here
Looks ok, but it's not. If the first number is positive, that code will still think it was negative, because the jump done via t for the first line of input is performed regardless, since there was a successful s substitution when we initialized the counter! Correct is: /-/b increment_counter.
If this seemed easy, you could still be fooled when doing multiple jumps back and forth to simulate functions. In our example the increment_counter block of code for sure would use a lot of s commands. Returning back with b main might cause another check in "main" to fall in the same trap. That is why I usually return from code blocks with s/.*/&/;t label. It's ugly, but useful.
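The trap can be reproduced with a cut-down version of the example. This is a sketch assuming GNU sed; the function names, the short label neg and the "negative:" marker are mine and only stand in for the counter code.

```shell
# Hypothetical demo: mark negative numbers instead of counting them.
mark_buggy() {
  sed -n '1{x;s/.*/0/;x}
s/-/&/
t neg
b
:neg
s/.*/negative: &/p'
}

mark_fixed() {
  sed -n '1{x;s/.*/0/;x}
/-/b neg
b
:neg
s/.*/negative: &/p'
}

printf '5\n-3\n' | mark_buggy   # prints "negative: 5" and "negative: -3"
printf '5\n-3\n' | mark_fixed   # prints only "negative: -3"
```

On the first line the initialization s/.*/0/ succeeds, so the later t branches even though s/-/&/ did not match.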
2
I know this is an old thread, but I just found those clumsy decimal to UCD converters, with almost a hundred bytes, some even messing the hold space or requiring special faulty sed versions.
For decimal to UCD I use (68 bytes; former best posted here 87 bytes)
s/$/\n9876543210/
:a
s/\([1-9]\)\(.*\n.*\)\1\(.\)/\3x\2\1\3/
ta
P;d
UCD to decimal is (also 66 bytes; former best posted here 96)
s/$/\n0123456789/
:a
s/\([0-8]\)x\(.*\n.*\)\1\(.\)/\3\2\1\3/
ta
P;d
\n in the replacement is not portable. You can use a different character instead and save two bytes, but you'll need more bytes to remove the appendix instead of P;d; see next remark. Or, if your hold space is empty, do G;s/$/9876543210/ without byte penalty.
s/\n.*// instead of P;d.
sed versions
There are no decimal to UCD and back converters posted in this thread that mess the hold space or require faulty sed versions. – seshoumara – 2017-11-11T08:49:21.130
Your own answer from April 6th uses the hold space and will only run with old sed versions that violate the POSIX standard. – Philippos – 2017-11-11T09:08:17.433
I'm not doing decimal to UCD conversions! Read the thread again carefully. UCD means that 12 is converted to 0x0xx (what your answer calculates), while plain unary (what my answer calculates) means that 12 is converted to xxxxxxxxxxxx. I chose @ as symbol, but you get the idea. And further, on PPCG one doesn't need to adhere to the POSIX standard. – seshoumara – 2017-11-11T09:15:11.327
If it pleases you, sheriff – Philippos – 2017-11-11T09:55:32.207
2
-z
Often you need to operate on the whole input at once instead of one line at a time. The N command is useful for that:
:
$!{N;b}
...but usually you can skip it and use the -z flag instead.
The -z flag makes sed use NUL (\0) as its input line separator instead of \n, so if you know your input won’t contain \0, it will read all of the input at once as a single “line”:
$ echo 'foo
> bar
> baz' | sed -z '1y/ao/eu/'
fuu
ber
bez
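A side-by-side sketch of the two approaches, assuming GNU sed. The function names are made up, and the loop uses a named label so it does not depend on the empty-label behavior.

```shell
# Hypothetical helpers: join all input lines with commas.
join_loop() { sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/,/g'; }   # slurp with N in a loop
join_z()    { sed -z 's/\n/,/g'; }                             # slurp via NUL separator

printf 'a\nb\nc\n' | join_loop   # prints a,b,c
printf 'a\nb\nc\n' | join_z      # prints a,b,c,
```

One difference to keep in mind: with -z the final newline is part of the pattern space, so it is replaced too.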
2
The G command appends a newline and the contents of the hold space to the pattern space, so if your hold space is empty, instead of this:
s/$/\n/
You can do this:
G
The H command appends a newline and the contents of the pattern space to the hold space, and x swaps the two, so if your hold space is empty, instead of this:
s/^/\n/
You can do this:
H;x
This will pollute your hold space, so it only works once. For two more bytes, though, you could clear your pattern space before swapping, which is still a savings of two bytes:
H;z;x
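The l command makes the added newline visible, which helps to check these variants. A small sketch assuming GNU sed, with invented function names:

```shell
# Hypothetical helpers; l prints the pattern space with \n shown literally.
append_nl()     { sed -n 'G;l'; }       # newline at the end
prepend_nl()    { sed -n 'H;x;l'; }     # newline at the start, hold space polluted
prepend_clean() { sed -n 'H;z;x;l'; }   # newline at the start, hold space left empty

printf 'a\n' | append_nl          # prints a\n$
printf 'a\nb\n' | prepend_nl      # prints \na$ then a\nb$
printf 'a\nb\n' | prepend_clean   # prints \na$ then \nb$
```

The second line of output from prepend_nl shows the pollution: the first input line is still sitting in the hold space.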
2
Instead of clearing the pattern space with s/.*//, use the z command (lowercase) if you go with GNU sed. Besides the lower byte count, it has the advantage that it won't start the next cycle as the command d does, which can be useful in certain situations.
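The difference from d shows up as soon as another command follows. A minimal sketch assuming GNU sed (z is a GNU extension); the function names are made up.

```shell
# Hypothetical helpers: clear the line, then try to insert an x.
with_z() { sed 'z;s/^/x/'; }   # z empties the pattern space, the script continues
with_d() { sed 'd;s/^/x/'; }   # d starts the next cycle, s is never reached

printf 'a\nb\n' | with_z   # prints x on two lines
printf 'a\nb\n' | with_d   # prints nothing
```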
1 May also be of benefit if you have invalid multi-byte sequences (which aren't matched by .). – Toby Speight – 2016-08-30T17:08:10.393
1
(thanks to Riley for discovering this from an anagol submission)
Here is an example where we are tasked with creating 100 @s in an empty buffer.
s/$/@@@@@@@@@@/;s/.*/&&&&&&&&&&/ # 31 bytes
s/.*/@@@@@@@@@@/;s//&&&&&&&&&&/ # 30 bytes
The second solution is 1 byte shorter and uses the fact that empty regexes are filled in with the last encountered regex. Here, for the second substitution, the last regex was .*, so the empty regex here will be filled with .*. This also works with regexes in /conditionals/.
Note that it is the previously encountered regex, so the following would also work.
s/.*/@@@@@@@@@@/;/@*/!s/$/@/;s//&&&&&&&&&&/
The empty regex gets filled with @* instead of $ because s/$/@/ is never reached.
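Both claims can be checked by counting the characters produced from an empty line. This is a sketch assuming GNU sed; the function names are invented.

```shell
# Hypothetical helpers wrapping the two scripts from this answer.
hundred()   { sed 's/.*/@@@@@@@@@@/;s//&&&&&&&&&&/'; }
last_used() { sed 's/.*/@@@@@@@@@@/;/@*/!s/$/@/;s//&&&&&&&&&&/'; }

out=$(printf '\n' | hundred);   echo "${#out}"   # prints 100
out=$(printf '\n' | last_used); echo "${#out}"   # prints 100
```

If the empty regex in last_used were filled with $ instead of @*, the final substitution would replace an empty match with nothing and the result would only be 10 characters long.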
Yes, good answer. I've even made regexes longer so that they can be re-matched like this (thus making the program shorter). – Toby Speight – 2018-06-22T11:53:03.487
1
In sed, the closest thing to a function that you can have is a label. A function is useful because you can execute its code multiple times, thus saving a lot of bytes. In sed however you would need to specify the return label and as such you can't simply call this "function" multiple times throughout your code the way you would do it in other languages.
The workaround I use is to add in one of the two memories a flag, which is used to select the return label. This works best when the function code only needs a single memory space (the other one).
Example showing what I mean: taken from a project of mine to write a small game in sed
# after applying the player's move, I overwrite the pattern space with the flag "P"
s/.*/P/
b check_game_status
:continue_turn_from_player
#code
b calculate_bot_move
:return_bot_move
# here I call the same function 'check_game_status', but with a different flag: "B"
s/.*/B/
b check_game_status
:continue_turn_from_bot
#code (like say 'b update_screen')
:check_game_status # this needs just the hold space to run
#code
/^P$/b continue_turn_from_player
/^B$/b continue_turn_from_bot
The labels should be golfed of course to just one letter, I used full names for a better explanation.
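Here is a tiny runnable variant of the same pattern, only as a sketch: it assumes GNU sed, and the "function" f, the labels p and q and the function name call_twice are hypothetical stand-ins for the game code above. The function appends one x to the hold space, and the flag in the pattern space selects where to return.

```shell
# Hypothetical demo of a sed "function" called from two places.
call_twice() {
  sed -n '
# first call: flag P selects the return label p
s/.*/P/
b f
:p
# second call: flag B selects the return label q
s/.*/B/
b f
:q
# show what the function accumulated in the hold space, then stop
x
p
q
# the function body works on the hold space only
:f
x
s/$/x/
x
/^P$/b p
/^B$/b q
'
}

printf 'ignored\n' | call_twice   # prints xx
```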
0
Mostly useless step:
y|A-y|B-z|
This will only translate A to B and y to z (... and - to - ;), but nothing else, so
sed -e 'y|A-y|B-z|' <<<'Hello world!'
will just return:
Hello world!
You could ensure this will be useless, for example by using this on lower-case hexadecimal values (containing only 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e or f.)
2 Is this something you found out the hard way?! ;-) – Toby Speight – 2015-09-15T19:06:22.290
I like useless scripts: sed '; ;/s/b;y|A-y|B-z|;s ;s/ //; ; ;' <<<'Hello world' (Why does this not suppress the space?) – F. Hauri – 2015-09-15T19:19:00.167
4 Not really a golfing tip (but still a tip for golfing): linefeeds consume just as many bytes as semicolons, so you can keep your code short and readable. – Dennis – 2015-07-22T17:48:31.577
Not a tip either, but a problem: I have GNU sed, yet the F command never worked. Does anyone know why? – seshoumara – 2016-08-30T16:15:44.630
@seshoumara F works on my GNU sed (Debian testing). It just prints - if reading from stdin, of course, but that's expected. What do you get from sed -e 'F;Q' /etc/hostname? – Toby Speight – 2016-08-30T16:28:37.640
@TobySpeight That gives this error: char 1: unknown command: F. I have to update sed maybe; what version do you have? The L command also doesn't work, but it's useless anyway since -l n exists. Everything else mentioned on GNU sed's site works. – seshoumara – 2016-08-30T16:44:20.453
@seshoumara, my results are on sed (GNU sed) 4.2.2. Just to check, you don't have POSIXLY_CORRECT set in your environment, do you? That would turn off most GNU extensions. – Toby Speight – 2016-08-30T16:49:29.130
@TobySpeight I have sed 4.2.1 so this could be why (updating). And no, I didn't have that set in my environment. Thanks for all the help today. – seshoumara – 2016-08-30T17:05:19.603
1 I opened the chat room bash, sed and dc for all who want to talk and ask about these languages. Let's make a community! – seshoumara – 2016-08-30T17:10:40.947