Double run-length encoding

9

1

Everyone knows what run-length encoding is. It has been the subject of many code-golf challenges already. We'll be looking at a certain variation.

Example

Normal: 11222222222222222222233333111111111112333322
Run-length: 112(19)3(5)1(11)2333322

The number in parentheses specifies the number of times the previous symbol occurred. In the example, only runs of 5 or more characters were encoded. This is because encoding runs of 4 or less doesn't improve the character count.

Challenge

Write a function/program that implements this variation of run-length encoding, but can also encode runs of two symbols. The runs of two symbols must also be enclosed in parentheses. A group will also be enclosed in parentheses. Your program must accept a string as input, and output the modified string with modifications that shorten the string.

Example

Normal: 111244411144411144411167676767222222277777222222277777123123123123
Double run-length: 1112((444111)(3))67676767((2(7)7(5))(2))123123123123

Notes

  • 111 was not encoded because encoding it (1(3)) is not shorter.
  • The string 444111 occurs 3 times so it is encoded.
  • 676767 was not encoded because ((67)(4)) is longer than before.
  • 222222277777222222277777 was not encoded as ((222222277777)(2)). Why? Because 222222277777 itself can be reduced to 2(7)7(5).
  • 123123123123 isn't encoded because your program is supposed to handle runs of two symbols, not three.

This is so shortest code wins. Tie-breaker is early submission.


If I missed anything, or if you are unsure of anything please notify me in the comments.

ericw31415

Posted 2016-05-06T18:43:57.970

Reputation: 2 229

But there are 4 67s. – Leaky Nun – 2016-05-06T18:58:00.303

Will we have to handle 441444144414 -> ((4414)(3))? – Leaky Nun – 2016-05-06T18:58:58.887

I have fixed it. – ericw31415 – 2016-05-06T18:59:08.547

@KennyLau No, you will not. 4414 is technically a series of 4. My wording is just bad. – ericw31415 – 2016-05-06T18:59:56.880

Can 111111111 be encoded as (1)(9)? – CalculatorFeline – 2016-05-06T21:44:53.190

It should be encoded as 1(9), because it's shorter than (1)(9). – ericw31415 – 2016-05-06T22:14:28.573

((4(9)1)(2)) or 4(9)14(9)1? – Leaky Nun – 2016-05-07T02:26:18.983

The second one. Remember, pick whichever is shorter. – ericw31415 – 2016-05-07T10:22:24.487

You might want to add more testcases (hint: they are in my solution) – Leaky Nun – 2016-05-07T10:30:44.880

Um, (67(4)) [7 characters] is definitely shorter than 67676767 [8 characters]. – Dúthomhas – 2016-05-07T10:48:01.313

(67(4)) implies 67777. – ericw31415 – 2016-05-07T11:10:25.497

Yoinks! (I need sleep, apparently.) – Dúthomhas – 2016-05-07T11:15:21.883

@ppperry I believe that should be (4(1111)). – ericw31415 – 2017-08-22T21:26:14.473

Answers

2

Retina, 162 bytes

+{`((\d)\2*(?!\2)(\d)\3*|\d)(?<1>\1)+
<<$1><$#1>>
<<([^<>]{1,7})><2>>
$1$1
<<([^<>]{1,3})><3>>
$1$1$1
<<([^<>]{1,2})><4>>
$1$1$1$1
}`<<(.)><(\d+)>>
$1($2)
T`<>`()

Try it online!

Leaky Nun

Posted 2016-05-06T18:43:57.970

Reputation: 45 011

If you input 10101010100100100100100, the output is ((10)(5))0((100)(4)), yet ((10)(4))((100)(5)) would be one character shorter. – Marv – 2016-05-08T01:38:35.337

Do you really have to use such marginal testcases.... – Leaky Nun – 2016-05-08T01:41:31.777

Yes, thats all the fun! :^) – Marv – 2016-05-08T01:46:49.090

It's funny how this is the only answer currently here. – ericw31415 – 2016-05-23T23:50:10.520