Rosetta Stone Challenge: Run-Length Encoding v2.0

8

1

The goal of a Rosetta Stone Challenge is to write solutions in as many languages as possible. Show off your programming multilingualism!

The Challenge

We've done run-length encoding before but only considered runs of a single character. Of course, we can sometimes make even greater savings if we consider runs of multiple characters.

Take aaaxyzxyzxyzdddd for instance. This can be compressed to 3a3{xyz}4d. Your task is to write a program of function which, given a case-sensitive string of letters and spaces, compresses it optimally using run-length encoding for multi-character runs.

You may take input via function argument, STDIN or ARGV and return the result or print it to STDOUT.

Runs must not be nested, so aaabbbaaabbb must be 3a3b3a3b, not 2{3a3b}. I.e. the string should be encoded (or decoded) in a single pass.

Consequences of Optimal Compression

Some examples where naive application of run-length encoding might lead to suboptimal results:

  • abab must not be "compressed" to 2{ab}
  • aaaabcdeabcde must not be compressed to 4abcdeabcde but 3a2{abcde} instead.

If there are two optimal versions (e.g. aa and 2a or abcabc and 2{abc}) either result is fine.

Examples

Input              Output

aa                 aa -or- 2a
aaaaaAAAAA         5a5A
ababa              ababa
abababa            a3{ba} -or- 3{ab}a
foo foo bar        2{foo }bar
aaaabcdeabcde      3a2{abcde}
xYzxYz             xYzxYz -or- 2{xYz}
abcabcdefcdef      abcab2{cdef}
pppqqqpppqqq       3p3q3p3q
pppqqqpppqqqpppqqq 3{pppqqq}

Scoring

Each language is a separate competition as to who can write the shortest entry, but the overall winner would be the person who wins the most of these sub-competitions. This means that a person who answers in many uncommon languages can gain an advantage. Code golf is mostly a tiebreaker for when there is more than one solution in a language: the person with the shortest program gets credit for that language.

If there is a tie, the winner would be the person with the most second-place submissions (and so on).

Rules, Restrictions, and Notes

Please keep all of your different submissions contained within a single answer.

Also, no shenanigans with basically the same answer in slightly different language dialects. I will be the judge as to what submissions are different enough.

Current Leaderboard

This section will be periodically updated to show the number of languages and who is leading in each.

  • C# (265) — edc65
  • JavaScript (206) — edc65
  • Python (214) — Will
  • VB.NET (346) — edc65

Current User Rankings

  1. edc65 (3)
  2. Will (1)

Martin Ender

Posted 2014-09-19T16:27:01.727

Reputation: 184 808

Answers

1

JavaScript (E6) 206

Almost sure it is optimal. I start trying to encode the longest string with the max gain, and recursively encode what remains at left and right. After each try I keep the shortest result.

Golf notes (JS specific)

  • Pseudo arguments instead of locals (for recursion)
  • Compare lengths using out of bounds index that gives 'undefined'
R=(s,n=s,d=s.length>>1,w,i,j,c)=>{
  for(;d;--d){
    for(i=-1;s[d+d+i++];){
      w=s.slice(i,j=i+d);
      for(r=1;w==s.substr(j,d);j+=d)++r;
      c=d>1?r+'{'+w+'}':r+w,
      // c[d*r-1]|| /* uncomment this line (+10 bytes) to have faster response
      n[(c=R(s.slice(0,i))+c+R(s.slice(j))).length]&&(n=c)
    }
  }
  return n
}

Test In FireFox/FireBug console

;['aa','aaaaaAAAAA','ababa','abababa','foo foo bar', 'aaaabcdeabcde',
  'xYzxYz', 'abcabcdefcdef', 'pppqqqpppqqq','pppqqqpppqqqpppqqq']
.forEach(x=>console.log(x+' -> '+R(x)))

Output

aa -> aa
aaaaaAAAAA -> 5a5A
ababa -> ababa
abababa -> 3{ab}a
foo foo bar -> 2{foo }bar
aaaabcdeabcde -> 3a2{abcde}
xYzxYz -> xYzxYz
abcabcdefcdef -> abcab2{cdef}
pppqqqpppqqq -> 3p3q3p3q
pppqqqpppqqqpppqqq -> 3{pppqqq}

C# (.NET 4.0) 265

Exactly the same algorithm

string R(string s)
{
  string n=s,c,w;
  int l=s.Length,d=l/2,i,j,r;
  for(;d>0;--d)
  {
    for(i=0;d+d+i<=l;i++)
    {
      w=s.Substring(i,d);
      for(j=i+d,r=1;j+d<=l&&w==s.Substring(j,d);j+=d)++r;
      c=d>1?r+"{"+w+"}":r+w;
      // if(c.Length<d*r){ // uncomment this line for faster execution
        c=R(s.Substring(0,i))+c+R(s.Substring(j));
        n=c.Length<n.Length?c:n;
      //} // and this one of course
    }
  }
  return n;
}

Test in LinqPad, mode 'C# program', add a main after the R function

void Main()
{
  string[] test = {"aa","aaaaaAAAAA","ababa","abababa","foo foo bar","aaaabcdeabcde",
  "xYzxYz","abcabcdefcdef","pppqqqpppqqq","pppqqqpppqqqpppqqq"};
  test.Select(x => new { x, r=R(x) }).Dump("Result");
}

Same Output

VB.Net (.NET 4) 346 (probably)

Same algorithm and same library, but the language is different enough

I can't be sure about byte count, as Visual Studio keep reformatting the code and adding spaces.

Golf notes: better use something else

function R(s as string)

  dim n=s,c,w
  dim l=s.Length,i,j,d
  for d=l\2 to 1 step-1
    for i=0 to l-d-d
      w=s.Substring(i,d)
      j=i+d
      r=1
      do while (j+d<=l andalso w=s.Substring(j,d))
        r=r+1
        j=j+d
      loop
      c=if(d>1,r & "{" & w & "}",r & w)
      if(c.Length<d*r) then
        c=R(s.Substring(0,i))+c+R(s.Substring(j))
        if c.Length<n.Length then n=c
      end if
    next
  next
  return n

end function

edc65

Posted 2014-09-19T16:27:01.727

Reputation: 31 086

You can undo the adding of spaces. If you type a character that makes it reformat hit ctrl-z to just undo the formatting. It means two characters instead of one every so often but it's better than removing all the spaces and linefeeds afterward. – Jerry Jeremiah – 2014-09-23T03:45:21.957

3

Python 214

def t(s):
 b=s;l=len;r=range(1,l(s))
 for i in r:
  p=s[:i];b=min(b,p+t(s[i:]),key=l);x=1
  for j in r:k=i*j+i;x*=p==s[i*j:k];b=min(b,(b,''.join((`j+1`,"{"[i<2:],p,"}"[i<2:],t(s[k:]))))[x],key=l)           
 return b

(second level indent is tab)

As this is , this is the naive recursive approach without early-exit, so its really really slow.

Will

Posted 2014-09-19T16:27:01.727

Reputation: 1 143