Compressing String Arrays
UPDATE: The tools featured in this tip have since be rewritten, improved and integrated into my Japt interpreter. For the best results it is recommended that you use that compressor over any of those linked below. I'll revisit this tip when I have some more time and rewrite it with the new compressor in mind.
Introduction
If you have an array of strings in your code, the most obvious way to compress it would be to run each string through Oc
individually. For the purposes of this tip, we'll be working with the array ["lollipop","marshmallow","nougat","oreo"]
, which weighs in at 42 bytes initially. Running each string through Oc
gives us:
[`lo¥ipop`,`Ú\hÚaow`,`Í`,`eo`]
That's now 33 bytes, a decent saving.
Step 1
But, we can do better. If we join the array to a newline separated string, we can get rid of the brackets, commas, and extraneous backticks and split on newline to get our array. Applying that to our example array gives us the following:
`lo¥ipop
Ú\hÚaow
Í
eo`·
Down to 26 bytes now.
Step 2
But, we can do better still! We could use a lowercase letter to delimit the strings instead of a newline, which might get included in the compression. z
isn't used in any of our strings so let's drop that in, and see how we get on.
`lo¥ipopzÚ\hÚaowzÍzeo`qz
Ah, nuts - no improvement there; our byte count has gone up by one! There might be another letter you could use but, depending on your strings, there could be quite a few to try - in our example there are 11: b,c,d,f,j,k,q,v,x,y,z
. Trying each would be pretty tedious, which is where this handy tool comes in; feed it your newline separated strings and it will try to delimit the strings with each letter that's not contained in any of them and output:
- the shortest compressed string,
- the delimiter it uses, and
- its length.
Running our sample strings through it shows that b
gives the best results:
`lo¥ipáæqrÚaowbÍÞo`qb
And there you have it, we're down to just 24 bytes.
Step 3
But, we can do even better! If the order of strings in your array doesn't matter, maybe there's a different permutation combined with a different delimiter that could work out even shorter. Trying each possibility is going to be much more tedious, though. With our 4 strings, there are 24 different permutations to try. With each of the 11 possible letters that becomes 264! That's where this tool comes into play. Again, feed it your newline separated strings and it will try every combination of every permutation and every delimiting letter, outputting:
- the order of the strings in the shortest compressed string,
- the compressed string,
- the delimiter it uses, and,
- its length.
Running our sample strings through it shows that "nougat","oreo","lollipop","marshmallow"
with b
as a delimiter gives the best results, with a final byte count of just 23:
`ÍÞo½o¥ipáæqrÚaow`qb
Bonus Tip: Integer Array Compression
You can apply the same principle to arrays of integers by first converting each to a higher base. Using this sample, 36 byte array:
[588181,156859,595676,475330,680474]
We can get that down to 29 bytes by first converting it to an array of base 32 strings and then running it through the first compression programme:
`huclt4p5r5ÛÊg62tkogq`qt mnH
Or as low as 27 bytes using the second programme:
`4p5Ïcl5ÛÊg62tkogq`qt mnH
You might be able to save another byte or 2 on top of that by moving the integer conversion into a method you're already running on the array.
Notes
- Don't forget to factor in the 1 or 2 extra bytes
q<letter>(<space>)
costs over ·
. Although, you may be able to use one of the Unicode shortcuts to get a byte back, depending on your delimiter (qÊ
is the same as ql<space>
, for example).
- A word of caution when using the last tool: the more strings you have, the more permutations there will be and the slower the programme will run, until it eventually craps out. As detailed above, with our 4 sample strings and 11 possible letters to try, there are 264 possible combinations, increase the number of strings by just 1 with the same 11 letters and we already have 1320 combinations to try. (You can use this tool to count the number of combinations, if you want).
Credits
- Oliver for the inspiration to create the tools found in this tip.
- ETHproductions for proofreading.
Heh, thanks for posting this. I'd been holding off on doing this because I'd like to redesign Japt at some point, but that won't be happening anytime soon, and it probably won't mess up a lot of the tips anyway. Tip for myself: write a tutorial :P – ETHproductions – 2017-05-17T21:31:38.893
Don't forget to visit the Japt chatroom :)
– Oliver – 2017-05-18T14:53:05.877