Compress my Malbolge!

-1

Background

Malbolge is truly intriguing language. But programs written in it are also extraordinairly huge (and in some cases, unpostable on PPCG).

Your challenge (shall you accept it) is, to compress and decompress given normalized Malbolge program. Devise your own lossless algorithm. You may compress your data further using an existing algorithm afterwards, but you must cite the algorithm used. You may use dictionaries.

Input

On the input, there will be given (in most cases very long) string, containing only following characters: i</*jpov. Example data:

iooooooooooooooooooooovoooioooipooooo<ooooooooooivojo<<<<<oio*ioooooooooooooooopiooooooooooooooooooo/jpooooooooooo

Example bigger chunk of data, 86 kilobytes big.

Output

Well, not very much to say about it, output depends on the algorithm.

Rules:

  • Input taken by your program, compressed, and then decompressed, shall be identical.
  • Submissions are scored by the following formula: \$ \frac{Compressed}{Original} \times 100\% \$, applied to the example 86 kilobytes big chunk.
  • If a submission abuses the fact that only one example chunk is provided (eg. by hardcoding it), I reserve right to add more example chunks.
  • Standard loopholes are forbidden, I/O rules apply.

Note: 7zip with PPMd algorithm manages to compress the code with 2,95% score.

Krzysztof Szewczyk

Posted 2019-08-30T14:18:02.253

Reputation: 3 819

1Is our submission supposed to include both a compressor and decompressor code? – AdmBorkBork – 2019-08-30T14:19:48.890

@AdmBorkBork yes. You may submit this as two programs, functions, whatever. – Krzysztof Szewczyk – 2019-08-30T14:20:33.273

4This could be a great challenge, but it's currently very flawed. The scoring method strongly encourages hardcoding, and "I reserve the right to retroactively make existing solutions invalid" isn't a good solution at all. Compression challenges typically solve this by counting the size of (compressed data + decompressor). – Grimmy – 2019-08-30T15:46:54.520

1@Grimy Well, you may be right. But I'd like to check what are the most efficient compressors community can make. I made myself one that has 1,1% score. If there was golfing need, the submissions would balance code size and the ratio, and I don't want THAT. – Krzysztof Szewczyk – 2019-08-30T16:14:15.787

2You think you don't want that, but without that, the score can be driven arbitrarily low by using larger and larger dictionaries, which is honestly quite boring. Or straight up 0% by hardcoding (you say you may add more test cases, but nothing prevents one from hardcoding those too). – Grimmy – 2019-08-30T16:25:53.553

@Grimy you're not right. People who put effort into their answers will downvote boring solutions (because well, they may better, but technically not, AND because they put a lot of time into their solution, and somebody has JUST came up to the party with simple and boring solution), punishing fun destroyers and effectively making them remove crappy answers. – Krzysztof Szewczyk – 2019-08-30T16:40:38.260

Related – Shaggy – 2019-08-30T17:15:00.350

5CGCC is not a popularity contest. Most answerers are not trying to gather upvotes, but to optimize for the question’s objective winning criterion. If that criterion makes it so that only boring solutions can win, that’s an issue with the question. – Grimmy – 2019-08-30T17:49:17.160

@KrzysztofSzewczyk, no I didn't, my phone did! :p – Shaggy – 2019-08-30T18:23:36.273

@Grimy why the popularity-contest tag tho? Who cares about being boring, I'm reaching half a hundred answers, every single one made in either Brainfuck or Malbolge – Krzysztof Szewczyk – 2019-08-30T18:35:43.993

Challenges on CGCC should be self-contained. You should add a description of what exactly each command does, and how normalised Malbolge interprets a program – Jo King – 2019-09-02T12:12:48.553

1@JoKing umm, what? Why should I. Does the information alter the way the code can be compressed? – Krzysztof Szewczyk – 2019-09-02T12:29:51.917

oh nvm, i was thinking this was a [tag:meta-golf] challenge, where the output had to still be runnable. Isn't this just a straight compression challenge that has nothing to do with Malbolge then? – Jo King – 2019-09-02T12:34:15.860

@JoKing sure! You're right, partially. – Krzysztof Szewczyk – 2019-09-02T13:52:00.343

Answers

2

Score with test 1: 38.6%, Score with test 2: 4.9%

Compressor

Jelly, 97 bytes

ṡ2ṢŒɠżQƊ1ị>ɗƇ3NÞḣ8Ṫ€,‘ɼ$€Ẏœṣ;@W}ɗjƭƒ
7©ṛ⁸“i</*jpov”iⱮ’Ç31¡
“i</*jpov”iⱮḅ8b256,ÇĖF€LÞḢL;L};÷/$ʋ,ḷɗ

Try it online!

Decompressor

Jelly, 44 bytes

ḅ256ḃ8
ḣ3s2ṚṣḢ}©¥jƭƒṫ5µṛ®n8Ɗ¿‘
ḢĿị“i</*jpov”

Try it online!

Explanation

Uses two different methods and picks the one with the shortest output.

Method 1: Convert the input from base 8 to base 256 Method 2: Repeatedly finds the 8 most common sublists of length 2 and replaces them with an integer between 8 and 255. The dictionary is stored at the beginning of the output.

The first byte of the output indicates the method used.

Input to compressor is a string of Malbolge. Output is a list of integers between 0 and 255. This could easily be encoded in Jelly’s code page if preferred. Input and output to/from decompressor mirror the compressor.

Nick Kennedy

Posted 2019-08-30T14:18:02.253

Reputation: 11 829

1

Retina, 47.25%

I know this is currently uninteresting, but this is still a WIP.

Cheap algorithm to compress Malbolge programs.

ooooooooo
a
oooooooo
b
ooooooo
c
oooooo
d
ooooo
e
oooo
f
ooo
g
j\*
h
jah
k

Decompressor:

a
ooooooooo
b
oooooooo
c
ooooooo
d
oooooo
e
ooooo
f
oooo
g
ooo
h
j\*
k
jah

Explanation

This simply shortens the source by replacing the o's, after noticing that there are too many o's in the Malbolge program. Try compressor online!

user85052

Posted 2019-08-30T14:18:02.253

Reputation:

I think you can possibly do better by replacing first oo with a, then aa with b, then bb with c and so on (or, possibly but very unlikely, the same but for runs of three) – my pronoun is monicareinstate – 2019-09-01T04:27:28.213

For some reason I actually got the compression ratio really high. – None – 2019-09-02T11:45:10.497

1

This code using standard Python libraries serves as a baseline for other advanced approaches.

Here is the outcomes for common compression algorithms.

Test case 0
Library   Score
gzip      50.000%
bz2       64.912%
lzma      87.719%

Test case 1
Library   Score
gzip      3.908%
bz2       2.968%
lzma      3.826%

PPMd      3.161%

NOTE: The PPMd algorithm score is obtained manually using 7zip to compress Test Case 1 into a zip file using a dictionary size of 16M and word size of 16 on Windows.

Joel

Posted 2019-08-30T14:18:02.253

Reputation: 1 691

You can try PPMd aswell, as it's really close to 1%. – Krzysztof Szewczyk – 2019-09-02T09:15:07.730

1

///, 0.00116%

Compressor (large test case)

Decompressor (large test case)

Compressor (small test case)

Decompressor (small test case)

Explanation: simple dictionary-based approach. The decompressor performs the same substitions as the compressor, but in reverse order. Note that both these programs take input by concatenating it into the source, as is usual for ///.

Grimmy

Posted 2019-08-30T14:18:02.253

Reputation: 12 521