Shorten text with Run Length Encoding

8

1

Shorten (or not) text using run length encoding

Input:

heeeello
woooorld

Output:

1h4e2l1o
1w4o1r1l1d
  • Read lines from stdin.
  • Print to stdout.
  • Stderr is of course discarded.
  • Assume there are hidden testcases (no embedding of the output)
  • Input/Output as ASCII
  • Any language is accepted

mroman

Posted 2012-09-14T09:01:18.217

Reputation: 1 382

3

You can (typically) save quite a bit if you ignore all ones, e.g. w4orld instead of 1w4o1r1l1d (you'd need to escape numerics, e.g. `f111 -> f3\1´). But then it would be a near-duplicate of this: http://codegolf.stackexchange.com/questions/6774

– primo – 2012-09-14T09:54:39.130

1

As it is it's close enough to Run-Length Encoding that I vote to close as dupe. It's not going to provide any new challenge or points of interest.

– Peter Taylor – 2012-09-14T12:15:25.910

Bonus points for whoever manages to find a fixpoint. – FUZxxl – 2012-09-15T18:11:03.840

Run-Length Encoding may be the same at its core but the input format and the required output format are very different. – mroman – 2012-09-15T18:16:57.603

4@FUZxxl, 22 is a trivial fixpoint. – Peter Taylor – 2012-09-17T18:15:22.293

2@PeterTaylor And the only nonempty one. We know it must begin with a digit. 11 is impossible. 22 must end there or be followed by another fixed point not beginning with 2. 333nnn is an impossible pattern, for you'll never find the same character at consecutive odd indices. 4444 and up fail for the same reason. – Khuldraeseth na'Barya – 2019-08-05T21:26:12.303

DIfferent input and output formats are not enough to render questions distinct. – pppery – 2019-08-08T03:27:30.900

Possible duplicate of Run-Length Encoding

– pppery – 2019-08-08T03:27:56.753

@pppery I prefer to close that one as a duplicate of this one, since this one is for text, not just numbers. – mbomb007 – 2019-08-08T14:40:05.420

Answers

2

Stax, 7 bytes

ûèB☼å°╤

Run and debug it st staxlang.xyz!

Unpacked (8 bytes) and explanation:

m|RFNppz
m           For each line of input:
 |R           Run-length encode: "heeeello" -> [[104,1],[101,4],[108,2],[111,1]]
   F          For each pair:
    N           Uncons-left: [104,1] -> push [104]; push 1
     ppz        Pop and print. Pop and print. Push "".
              Implicit print (always an empty string) with a newline

5 bytes, works only on a single line:

∩l↨me
|RmEp]    Unpacked
|R        Run-length encode: "heeeello" -> [[104,1],[101,4],[108,2],[111,1]]
  m       Map block over input:
   E        Explode array: [104,1] -> push 104, push 1
    p       Pop and print with no newline
     ]      Make a one-element list: 104 -> [104] (which is "h")
            Implicit print with newline

Run and debug it at staxlang.xyz!

Perhaps not legal. This program prints each pair on a line of its own. A bit sketchy.

If that output format is illegal, I give you 6 bytes:

╡δôZ→╬
|RFEp]p    Unpacked
  F        For each item in array, execute block:
      p      Pop and print with no newline
             No implicit print in for-each block, so no extra newlines

Run and debug it at staxlang.xyz!

The language's creator recursive points out that uncons-right (N) can shorten this to six bytes unpacked, as it handles the E and the ] on its own. Programs this short, however, often get no shorter when packed, and this is am example. Still six bytes: |RFNpp Edit: Had to update my main answer; this form is what I used.

Khuldraeseth na'Barya

Posted 2012-09-14T09:01:18.217

Reputation: 2 608

1NIcely done. |RFNpp can give the specified output in 6 bytes unpacked, but unfortunately, doesn't pack. – recursive – 2019-08-06T03:17:30.830

1@KevinCruijssen Yep. Whoops. – Khuldraeseth na'Barya – 2019-08-07T10:59:39.063

2

Perl: 46 → 36 or 27 characters

perl -pe's|((.)\2*)|@x=split//,$1;@x.$x[0]|eg'

All hail @ardnew for coming up with the idea of using the tr///c operator to count the number of characters in the matched string instead of splitting:

perl -pe's|((.)\2*)|$1=~y///c.$2|eg'

Degolfed:

while(defined($_ = <>)) {
  $_ =~ s{((.)\2*)}           # match 1 or more consecutive identical non-newlines
         {
           ($1 =~ y///c )     # count the number of characters in $1
           .                  # and concatenate it
           $2                 # with the first matched character
         }eg;                 # execute substitution, match "global"
  print $_;                   # print the modified line
}

Usage:

$ perl -pe's|((.)\2*)|$1=~y///c.$2|eg' infile

or via STDIN

$ perl -pe's|((.)\2*)|$1=~y///c.$2|eg'
heeeello

prints

1h4e2l1o

amon

Posted 2012-09-14T09:01:18.217

Reputation: 309

You're short-changing yourself on your character count - I count 37 characters including 1 for the p option. – Gareth – 2012-09-14T22:30:15.720

You can save 10 chars by using s|((.)\2*)|$1=~y///c.$2|eg, which sums to 27 total chars (using the same character counting rules as @Gareth) – ardnew – 2012-09-15T03:29:25.577

1

Can shorten even further to 25 bytes (including -p) by eliminating the outer parens: Try it online!

– Xcali – 2019-08-05T22:02:50.207

1

Brachylog, 11 bytes

ḅ⟨{lṫ}ch⟩ᵐc

Try it online!

(If output really has to be on stdout, add one byte for w at the end.)

          c    The output is the concatenation of
 ⟨    c ⟩ᵐ     the concatenated pairs of
  {lṫ}         length converted to a string
       h       and first element
ḅ        ᵐ     for every run in the input.

Unrelated String

Posted 2012-09-14T09:01:18.217

Reputation: 5 300

1

K (oK), 28 bytes

{,/($#:'c),'*:'c:(&~=':x)_x}

Try it online!

On mobile, explanation to follow...

streetster

Posted 2012-09-14T09:01:18.217

Reputation: 3 635

1

Python 3 iterative, 115 99 97 bytes

while 1:
 a=b='';k=0
 for c in input():e=a!=c;b+=(str(k)+a)*e;k+=1-k*e;a=c
 print(b[1:]+str(k)+a)

Try it online!

Python 3 recursive, 136 130 129 bytes

f=lambda r,c,s,k=1:s and(c==s[0]and f(r,c,s[1:],k+1)or f(r+str(k)+c,s[0],s[1:]))or r[1:]+str(k)+c
while 1:print(f('','',input()))

Try it online!

The iterative approach seems quite successful, while the recursive version has a lot of room for improvement.

movatica

Posted 2012-09-14T09:01:18.217

Reputation: 635

Nice approach! It looks like the while loop in your code is only there to demonstrate the input. Without it, your code is still valid. In that case, the loop does not need to be part of the code and you can reduce the first example to 85 bytes like so: Try it online!

– Jitse – 2019-08-07T09:09:48.667

Your second example can be reduced to 121 bytes like this: Try it online!

– Jitse – 2019-08-07T09:12:46.303

Yea, the while loop is only for linewise input. But the question requires to read all lines, not just one, so externalising the loop would be against the rules. – movatica – 2019-08-07T17:12:56.577

1

05AB1E, 9 bytes

|εÅγs.ιJ,

Try it online.

Or alternatively:

|εÅγøí˜J,

Try it online.

Explanation:

|          # Read all lines of input as list
 ε         # For-each over the lines:
  Åγ       #  Run-length encode, pushing the list of characters and lengths separately
    s      #  Swap so the characters at at the top and lengths below it
     .ι    #  Interleave the two lists
       J   #  Join the list of characters and lengths together to a single string
        ,  #  And output it with trailing newline

|εÅγ       # Same as above
    ø      #  Zip/transpose; creating pairs of [character, length]
     í     #  Reverse each pair to [length, character]
      ˜    #  Deep flatten the pairs to a single list
       J,  #  Join them together to a single string, and output it with trailing newline

Kevin Cruijssen

Posted 2012-09-14T09:01:18.217

Reputation: 67 575

1

Wolfram Language (Mathematica), 98 bytes

Print[""<>StringCases[#,s:x_..:>ToString@StringLength@s<>x]]&/@StringSplit[$ScriptInputString,"
"]

Try it online!

A more flexible I/O format reduces this solution to 54 bytes:

""<>StringCases[#,s:x_..:>ToString@StringLength@s<>x]&

Try it online!

Roman

Posted 2012-09-14T09:01:18.217

Reputation: 1 190

1

J, 35 31 characters

,(](":@#,{.);.1~1,2~:/\])1!:1[1

Usage:

   ,(](":@#,{.);.1~1,2~:/\])1!:1[1
heeeello
1h4e2l1o
   ,(](":@#,{.);.1~1,2~:/\])1!:1[1
woooorld
1w4o1r1l1d

Gareth

Posted 2012-09-14T09:01:18.217

Reputation: 11 678

Using modern site rules and a function, [:,(#,&":{.)/.~ for 15: Try it online!

– Jonah – 2019-08-07T23:32:55.297

0

Julia 1.1, 94 84 81 bytes

foldl(((n,l),c)->(c==l||print(n,l);((c==l&&n)+1,c)),readline()*'\n',init=("",""))

Try it online!

user3263164

Posted 2012-09-14T09:01:18.217

Reputation: 381

Here is 80 bytes. I think it can go shorter though

– H.PWiz – 2019-08-07T21:19:56.950

Unfortunately, your code didn't work on Julia 1.1 for me. I still managed to get 81 by adding an extra '\n' to readline() instead of printing the last tuple manually – user3263164 – 2019-08-08T14:51:20.963

0

Python 3, 84 bytes

def f(s,c=1):i,*j=s;b=j[:1]==[i];print(end='%s%s'%(c,i)*(b^1));f(j,1+b*c)
f(input())

Try it online!

Explanation

Checks if the first and second characters of the string are equal. If they are, increase the counter by 1. If they are not, print the counter and the first item and reset the counter to 1. In both cases, the function is called recursively with the first character removed.

Raises an error when end of string is reached.


Without I/O restrictions, but with minimal byte count:

Python 3.8 (pre-release), 70 bytes

f=lambda s:'%s%s'%(len(s)-len(t:=s.lstrip(p:=s[0])),p)+f(t)if s else''

Try it online!

Python 3 equivalent (77 bytes)

Explanation

Strips all repeating characters off the start of the string. Then it returns a string containing (1.) the difference in lengths between the original string and the stripped string; (2.) the first character of the original string; (3.) the result of the recursive function applied to the stripped string. Recursion ends when an empty string is encountered.

Jitse

Posted 2012-09-14T09:01:18.217

Reputation: 3 566

0

Retina, 12 bytes

(.)\1*
$.0$1

Try it online.

Explanation:

Get a part of 1 or more of the same character, capturing the character in capture group 1.

(.)\1*

Replace it with the length of the total match, concatted with the character from capture group 1:

$.0$1

Kevin Cruijssen

Posted 2012-09-14T09:01:18.217

Reputation: 67 575

0

Ruby, 42 bytes

->s{s.gsub(/(.)\1*/){|m|"#{m.size}"+m[0]}}

Try it online!

G B

Posted 2012-09-14T09:01:18.217

Reputation: 11 099

0

[Scala (shell)], 150 bytes

scala.io.Source.stdin.getLines.foreach(s=>println{val(x,y,z)=s.tail.foldLeft(("",s.head,1)){case((a,b,c),d)=>if(b==d)(a,b,c+1)else(a+c+b,d,1)};x+z+y})

Try it online!

Here the pure Lambda de-golfed (103 bytes):

  s => {
    val (x,y,z) = s.tail.foldLeft(("", s.head, 1)) {
      case ((a, b, c), d) =>
        if (b == d)
          (a, b, c + 1)
        else
          (a + c + b, d, 1)
    }
    x+z+y
  }

cubic lettuce

Posted 2012-09-14T09:01:18.217

Reputation: 181

0

Zsh, 70 bytes

try it online!

for ((n=1;i++<#w;n++))[[ $w[i] != $w[i+1] ]]&&printf $n$w[i]&&n=0
echo

This is a much golfier version of the earlier zsh answer (tio link). Could probably be golfed more using string=>array conversion instead of iteration.

roblogic

Posted 2012-09-14T09:01:18.217

Reputation: 554

0

CJam, 2 bytes

e`

e` is a built-in for run-length encoding. CJam's implicit output ignores array brackets, and so turns [[1 'h] [2 'e]] into "1h2e"

Try it online!

lolad

Posted 2012-09-14T09:01:18.217

Reputation: 754

0

rs, 19 chars

This doesn't really count because I created rs way after this was posted...but it was fun anyway!

(.)(\1*)/(^^\1\2)\1

Try it here!

kirbyfan64sos

Posted 2012-09-14T09:01:18.217

Reputation: 8 730

0

Zsh, 117

while read s;do n=1;for i in {1..$#s};do if [[ $s[i] != $s[i+1] ]];then echo -n $n$s[i];n=0;fi;((n++));done;echo;done

Run it like this:

zsh script.zsh < infile

De-golfed

while read s; do                      # while stdin has more
  n=1                                 # repeat counter
  for i in {1..$#s}; do               # for each character
    if [[ $s[i] != $s[i+1] ]]; then   # same as next one?
      echo -n $n$s[i]                 # print if no
      n=0
    fi
    ((n++))
  done
  echo                                # newline between words
done

Thor

Posted 2012-09-14T09:01:18.217

Reputation: 2 526

Are those white spaces necessary or can you shorten `if [' to 'if[' etc? – mroman – 2012-09-14T11:19:11.983

The [[ construct is a command on it's own (like [) and has to be separated from other commands. As to using [ over [[, it requires the arguments to be quoted so four " need to be added. – Thor – 2012-09-14T12:00:28.860

0

Bash: 104 characters

while read s;do e=;while [[ $s ]];do c=${s:0:1};n=${s##+($c)};e+=$[${#s}-${#n}]$c;s=$n;done;echo $e;done

Sample run:

bash-4.2$ while read s;do e=;while [[ $s ]];do c=${s:0:1};n=${s##+($c)};e+=$[${#s}-${#n}]$c;s=$n;done;echo $e;done <<END
heeeello
woooorld
END
1h4e2l1o
1w4o1r1l1d

manatwork

Posted 2012-09-14T09:01:18.217

Reputation: 17 865

0

Burlesque (17B)

{=[{J[-jL[Q}\m}WL

{=[{^^[~\/L[Sh}\m}WL

Older/Alternative and longer versions:

{=[{^^L[Sh\/-]Sh.+}m[\[}WL
{=[{^^L[Sh\/-][-.+}m[\[}WL
{=[{^^L[Sh\/-~.+}m[\[}WL
{=[{^^L[Sh\/-].+}\m}WL
{=[{^^L[Sh\/[~.+}\m}WL
{=[{^^L[Sh\/[~_+}\m}WL
{=[{^^L[Sh\/fc.+}\m}WL
{=[{^^L[Sh\/-~.+}\m}WL
{=[{^^L[Sh\/-]\/}\m}WL

mroman

Posted 2012-09-14T09:01:18.217

Reputation: 1 382

0

APL (24)

,↑{(⍕⍴⍵),⊃⍵}¨B⊂⍨B≠¯1⌽B←⍞

marinus

Posted 2012-09-14T09:01:18.217

Reputation: 30 224