Splitting strings into chunks of equal length n
As in most "normal" languages, TMTOWTDI (there's more than one way to do it). I'm assuming here that the input doesn't contain linefeeds, and that "splitting" means splitting it into lines. But there are two quite different goals: if the length of the string isn't a multiple of the chunk length, do you want to keep the incomplete trailing chunk or do you want to discard it?
Keeping an incomplete trailing chunk
In general, there are three ways to go about the splitting in Retina. I'm presenting all three approaches here, because they might make a bigger difference when you try to adapt them to a related problem. You can use a replacement and append a linefeed to each match:
.{n}
$&¶
That's 8 bytes (or a bit less if n = 2 or n = 3, because then you can use .. or ... respectively). This has one issue though: it appends an additional linefeed if the string length is a multiple of the chunk length.
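To see the trailing-linefeed issue concretely, here is a rough Python emulation of the replacement stage. It is only a sketch: Python's re module is not the .NET engine Retina uses, and the function name chunk_by_replacement is made up for this illustration.

```python
import re

def chunk_by_replacement(s, n):
    """Emulate the replacement stage: append a linefeed to every match of .{n}."""
    # Hypothetical helper; the lambda plays the role of the $&¶ substitution.
    return re.sub(r".{%d}" % n, lambda m: m.group(0) + "\n", s)

# Length 8 is not a multiple of 3: the incomplete chunk "gh" is kept as-is.
print(repr(chunk_by_replacement("abcdefgh", 3)))  # 'abc\ndef\ngh'

# Length 6 is a multiple of 3: note the extra linefeed at the end.
print(repr(chunk_by_replacement("abcdef", 3)))    # 'abc\ndef\n'
```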
You can also use a split stage, and make use of the fact that captures are retained in the split:
S_`(.{n})
The _ option removes the empty lines that would otherwise result from covering the entire string with matches. This is 9 bytes, but it doesn't add a trailing linefeed. For n = 3 it's 8 bytes and for n = 2 it's 7 bytes. Note that you can save one byte overall if the empty lines don't matter (e.g. because you'll only be processing non-empty lines and getting rid of linefeeds later anyway): then you can remove the _ option.
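The effect of retained captures and of the _ option can be sketched in Python, whose re.split also keeps captured groups in the result. This is an approximation of the split stage, not Retina itself; chunk_by_split and its drop_empty flag are names invented for this example.

```python
import re

def chunk_by_split(s, n, drop_empty=True):
    """Emulate the split stage with a capturing group around .{n}."""
    # Splitting on (.{n}) keeps each captured chunk in the result list.
    parts = re.split(r"(.{%d})" % n, s)
    # drop_empty=True stands in for the _ option (remove empty pieces).
    return [p for p in parts if p] if drop_empty else parts

# Without filtering, the empty pieces between adjacent matches show up.
print(chunk_by_split("abcdefgh", 3, drop_empty=False))  # ['', 'abc', '', 'def', 'gh']

# With filtering, only the chunks remain and there is no trailing empty entry.
print(chunk_by_split("abcdefgh", 3))  # ['abc', 'def', 'gh']
print(chunk_by_split("abcdef", 3))    # ['abc', 'def']
```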
The third option is to use a match. With the ! option we can print all the matches. However, to include the trailing chunk, we need to allow for a variable match length:
M!`.{1,n}
This is also 9 bytes, and also won't include a trailing linefeed. It also becomes 8 bytes for n = 3 by using ..?.? as the regex. However, note that it reduces to 6 bytes for n = 2, because now we only need ..? as the regex. Also note that the M can be dropped if this is the last stage in your program, saving one byte in any case.
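As a quick check of the variable-length match, the same regex can be tried in Python with re.findall, which returns all non-overlapping matches much like printing them with !. Again this is only an emulation under the assumption that the input has no linefeeds; chunk_by_match is a hypothetical name.

```python
import re

def chunk_by_match(s, n):
    """Emulate the match stage: list every greedy match of .{1,n}."""
    # Greedy matching takes n characters whenever that many are left,
    # so only the last chunk can be shorter.
    return re.findall(r".{1,%d}" % n, s)

print(chunk_by_match("abcdefgh", 3))  # ['abc', 'def', 'gh']
print(chunk_by_match("abcdef", 3))    # ['abc', 'def']

# For n = 2 the golfed form ..? gives the same chunks as .{1,2}.
print(re.findall(r"..?", "abcde"))    # ['ab', 'cd', 'e']
```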
Discarding an incomplete trailing chunk
This gets really long if you try to do it with a replacement (because you also need to replace the trailing chunk, if it exists, with nothing), and the same goes for a split. So we can safely ignore those. Interestingly, for the match approach it's the opposite: it gets shorter:
M!`.{n}
That's 7 bytes, or less for n = 2 and n = 3. Again, note that you can omit the M if this is the last stage in the code.
If you do want a trailing linefeed here, you can get that by appending |$ to the regex.
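Here is a small Python sketch of the fixed-length match, with and without the |$ alternative. It assumes the matches are printed joined by linefeeds, which is how I'm modelling the ! output here; full_chunks and trailing_linefeed are invented names.

```python
import re

def full_chunks(s, n, trailing_linefeed=False):
    """Emulate matching .{n} (optionally .{n}|$) and joining matches with linefeeds."""
    pattern = r".{%d}" % n
    if trailing_linefeed:
        # The extra alternative matches the empty string at the end of the
        # input, which adds one empty final "line" to the output.
        pattern += "|$"
    return "\n".join(re.findall(pattern, s))

# The incomplete chunk "gh" is simply never matched, so it is discarded.
print(repr(full_chunks("abcdefgh", 3)))                          # 'abc\ndef'
print(repr(full_chunks("abcdefgh", 3, trailing_linefeed=True)))  # 'abc\ndef\n'
```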
Bonus: overlapping chunks
Remember that M has the & option which returns overlapping matches (which is normally not possible with regex). This allows you to get all overlapping chunks (substrings) of a string of a given length:
M!&`.{n}
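Python has no direct counterpart to the & option, but for this particular pattern the result can be imitated with a capturing group inside a lookahead, which lets a match start at every position. This is a swapped-in technique for illustration only, not how Retina implements overlapping matches; overlapping_chunks is a hypothetical name.

```python
import re

def overlapping_chunks(s, n):
    """List every substring of length n, one per possible start position."""
    # The lookahead consumes nothing, so the engine advances one character
    # at a time and the group captures the n characters ahead of it.
    return re.findall(r"(?=(.{%d}))" % n, s)

print(overlapping_chunks("abcde", 3))  # ['abc', 'bcd', 'cde']
print(overlapping_chunks("ab", 3))     # []
```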
Related: Tips for regex golf – Sp3000 – 2016-02-09T21:29:53.110
Hmmm, I've been holding off posting this because Retina is still much in development and I was afraid most answers would end up being plain regex golfing tips, not very specific to Retina. But we might as well give it a go, I guess... :) – Martin Ender – 2016-02-09T21:32:16.013
@MartinBüttner You and some others have given me a lot of good tips and hints since I started looking at Retina, so I think it's probably about time for this. I added a clarification that general regex tips should go to the linked question. – Digital Trauma – 2016-02-09T21:35:30.700
@MartinBüttner Here is as good a place as any to ask - I've been wondering for a while - out of curiosity what is the inspiration for the name "Retina"? I assume the "Re" part is for Regular Expression, but what about the "tina"? – Digital Trauma – 2016-02-09T21:38:03.680
@DigitalTrauma I was trying to come up with a nice word that would work as an acronym, but failed. The word "retina" was quite close to some of the attempts, and I liked the word. I never managed to retcon it into an acronym though and have since given up on that. So yeah the "re" is sort of for "regular expressions" and maybe the "n" for ".NET", but ultimately it's just a word that sounded nice. – Martin Ender – 2016-02-09T21:41:42.910