Compressing RLE data to draw ASCII art

11

2

This question is based on what I came up with to answer another question.

Sometimes the questions here ask to draw some ASCII art. One simple way to store the data for the art is RLE (run-length encoding). So:

qqqwwwwweeerrrrrtttyyyy

becomes:

3q5w3e5r3t4y

Now to draw a big ASCII art you maybe be getting data like this (ignoring the new line characters):

19,20 3(4)11@1$20 11@19,15"4:20 4)19,4:20 11@
   ^^^
   Note that this is "20 whitespaces"

(Character count: 45)

The characters used for the ASCII art are never going to be lowercase or uppercase letters or numbers, only signs, marks and symbols but always in the printable ASCII character set.

You want to save some space in that string, so you replace the numbers with the uppercase character set (being 'A' equals to 1, 'B' equals to 2 until 'Z' equals to 26), because you are never going to get more than 26 repetitions of a character. So you get:

S,T C(D)K@A$T K@S,O"D:T D)S,D:T K@

(Character count: 34)

And finally you notice that some groups of (letter+symbol) are repeating, so you substitute the groups that appear 3 times or more in the string by the lowercase character set, in order or appearance in the string, but storing in a buffer the substitutions made (in the format "group+substitution char" for each substitution), and leaving the rest of the string as is. So the following groups:

S, (3 times) 
T  (4 times)
K@ (3 times)

gets substituted by 'a', 'b' and 'c', respectively, because there are never going to be more than 26 groups repeating. So finally you get:

S,aT bK@c
abC(D)cA$bcaO"D:bD)aD:bc

(Character count: 9+24=33)

[The last step only saves 1 byte because the groups that actually save characters after being substituted are the ones appearing 4 times or more.]

The challenge

Given a string containing the RLE data to draw an ASCII art (with the restrictions proposed), write the shortest program/function/method you can in order to compress it as described. The algorithm must print/return two strings: the first one containing the dictionary used for the compression, and the second one being the resulting compressed string. You may return the strings as a Tuple, an array, List or whatever, in the given order.

Note that if the string cannot be compressed in the step 2, the algorithm must return an empty string as first return value and the result of the step 1 as second return value.

You do not need to include the result of step 1 in the output values, I just include them in the examples for clarification purposes.

This is , so may the shortest answer for each language win!

Another test case

Input:                   15,15/10$15,15/10"10$10"10$10"10$10"15,15/

Output of step 1:        O,O/J$O,O/J"J$J"J$J"J$J"O,O/

Final algorithm output:  O,aO/bJ$cJ"d
                         abcabdcdcdcdab

---

Input:                   15,15/10$15,15/10"

Output of step 1:        O,O/J$O,O/J"

Final algorithm output:  <empty string>
                         O,O/J$O,O/J"

Charlie

Posted 2017-06-21T10:29:04.117

Reputation: 11 448

1because you are never going to get more than 26 repetitions of a character Nope. aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa – Okx – 2017-06-21T10:50:10.537

@Okx That can never be the case. – Erik the Outgolfer – 2017-06-21T10:57:08.593

@Okx yes, in the real world. The rules are made up for a restricted set of ASCII art, though. – Charlie – 2017-06-21T10:57:21.867

2In a real implementation, S,aT bK@c would probably be stored as just S,T K@ without explicitly naming the substitution characters which can be trivially deduced from that. – Arnauld – 2017-06-21T11:29:03.780

@Arnauld you are totally right, I missed that, but I am going to leave the question as is, just in case anybody has started to write his/her answer. – Charlie – 2017-06-21T11:31:21.953

when I was going to hit the "add answer" button, you added the empty string rule @.@ – Rod – 2017-06-21T12:29:13.013

Answers

3

JavaScript (ES6), 168 167 bytes

Returns an array of two strings: [dictionary, compressed_string].

s=>[(a=(s=s.replace(/\d+/g,n=>C(n|64),C=String.fromCharCode)).match(/../g)).map(v=>s.split(v)[a[v]||3]>=''?D+=v+(a[v]=C(i++)):0,i=97,D='')&&D,a.map(v=>a[v]||v).join``]

Test cases

let f =

s=>[(a=(s=s.replace(/\d+/g,n=>C(n|64),C=String.fromCharCode)).match(/../g)).map(v=>s.split(v)[a[v]||3]>=''?D+=v+(a[v]=C(i++)):0,i=97,D='')&&D,a.map(v=>a[v]||v).join``]

console.log(f('19,20 3(4)11@1$20 11@19,15"4:20 4)19,4:20 11@'))
console.log(f('15,15/10$15,15/10"10$10"10$10"10$10"15,15/'))

Arnauld

Posted 2017-06-21T10:29:04.117

Reputation: 111 334

3

Python 2, 269 280 268 266 bytes

Nothing fancy going on here. Good opportunity to use some simple regular expressions.

The first version failed for strings containing special characters that were interpreted within the regex. Second version (using re.escape) works with all the test cases. That correction cost 11 bytes.

Second version did not assign substitution characters in order, as required in the problem specification, and as pointed out by @CarlosAlejo. So, back to the drawing board.

Corrected version, further golfed

  • -6 bytes saved by not printing output on two lines
  • +3 bytes: Switching to code substitutions via a string to allow meeting the challenge as specified.
  • -4 bytes: Since I'm no longer calling re.findall twice, I don't need to rename it
  • -5 bytes: switching from for loops to while loops.
  • -2 bytes thanks to @Comrade Sparkle Pony
import re
S=re.sub
b=a=input()
for i in re.findall('\d{1,2}',a):
 b=S(i, chr(64+int(i)),b)
n,s,p=96,'',0
while p<len(b):
 c=b[p:p+2];f=b.count(c)
 if f>2and not c in s:n+=1;s+=c+chr(n)
 p+=2
p=0
while p<len(s):k=s[p:p+2];v=s[p+2];b=S(re.escape(k),v,b);p+=3
print s,b

Try it online!

CCB60

Posted 2017-06-21T10:29:04.117

Reputation: 159

You're almost there, note that the groups in second step are not created in the proper order (see example). Groups must be created in order of appearance, so the first one should be O,a. – Charlie – 2017-06-22T05:07:48.503

@CarlosAlejo I had not noted that as a requirement, since the substitutions are arbitrary, from a functional point of view. Python's default dictionaries, a natural way to implement this, are unordered. Will have to consider other possible data structures.... – CCB60 – 2017-06-22T10:34:37.627

Couldn't you save some bytes by using b=a=input() and n,s,p=96,'',0? – Comrade SparklePony – 2017-06-22T20:48:48.830

\d+ would be a shorter regex to use. You're never going to go over 26 anyways so there's no reason to ensure that it's specifically 1-2 digits. Also, using re.escape means that a basic string replace ends up slightly shorter: 253 bytes – Value Ink – 2019-05-22T03:36:13.500

0

Lua, 215 bytes

Just a good bit of pattern matching.

I think Lua's underrated when it comes to golfing... look at all those statements squished together!

g,c=string.gsub,string.char
u=g(arg[1],"%d%d?",function(n)return c(n+64)end)l,d=97,""g(u,"..",function(m)n,e=0,g(m,".", "%%%0")g(u,e,function()n=n+1 end)if n>2 then
l,s=l+1,c(l)d,u=d..m..s,g(u,e,s)end
end)print(u,d)

Trebuchette

Posted 2017-06-21T10:29:04.117

Reputation: 1 692

0

Python 2, 186 bytes

from re import*
S=sub('\d+',lambda m:chr(int(m.group(0))+64),input())
Q=[]
for p in findall('[A-Z].',S):
 if S.count(p)>2:a=chr(len(Q)+97);Q+=[p+a];S=sub(escape(p),a,S)
print''.join(Q),S

I was hoping to finally find use for re.subn :C

# first step - convert all numbers to uppercase letters
S=sub('\d+',lambda m:chr(int(m.group(0))+64),input())
# empty list to hold encoding of second step
Q=[]
# find every encoded pair (uppercase letter and some char)
for p in findall('[A-Z].',S):
 # if it occures 3 or move times
 if S.count(p)>2:
  # get lowercase letter to substitute with
  a=chr(len(Q)+97)
  # store encoding into list
  Q+=[p+a]
  # update string - substitute pair with lowercase letter
  S=sub(escape(p),a,S)
# output
# encodings of second step, space, result
# if nothing was compressed at step 2, space would prepend result (of step 1)
print''.join(Q),S

Compressed at step 2

Not compressed at step 2


Python 2, 246 bytes

Whole second step done in repl lambda of re.sub. Just for fun.

from re import*
Q=[]
S=sub('\d+',lambda m:chr(int(m.group(0))+64),input())
S=sub('[A-Z].',lambda m:(lambda m:S.count(m)>2and(m in Q or not Q.append(m))and chr(Q.index(m)+97)or m)(m.group(0)),S)
print''.join(Q[i]+chr(i+97)for i in range(len(Q))),S

Try it online!

Dead Possum

Posted 2017-06-21T10:29:04.117

Reputation: 3 256

0

Perl 5 -pl, 81 bytes

s/\d+/chr$&+64/ge;$b=a;for$a(/([A-Z].)(?=.*\1.*\1)/g){s/\Q$a/$b/g&&($\.=$a.$b++)}

Try it online!

Prints the encoded string on the first line, the triples on the second line

Xcali

Posted 2017-06-21T10:29:04.117

Reputation: 7 671

0

Ruby -p, 133 bytes

gsub(/(\d+)(.)/){(64+$1.to_i).chr+$2}
c=?`;s=''
$_.scan(/(..)(?=.*\1.*\1)/){s+=$&+c.succ!if !s[$&]}
puts s.scan(/(..)(.)/){gsub$1,$2}

Try it online!

Value Ink

Posted 2017-06-21T10:29:04.117

Reputation: 10 608