Base85 Encoding

10

2

The Challenge

Write a program that can take an input of a single-line string containing any ASCII printable characters, and output the same string encoded in Base85 (using a big-endian convention). You can assume that the input will always be ≤ 100 characters.


A Guide to Base85

  • Four octets are encoded into (usually) five Base85 characters.

  • Base85 characters range from ! to u (ASCII 33 - 117) and z (ASCII 122).

  • To encode, treat the four octets as a single 32-bit number, repeatedly divide it by 85, and add 33 to the remainder of each division to get the ASCII code of an encoded character. The remainders come out least-significant first: the first division produces the rightmost character of the encoded block.

  • If a set of four octets contains only null bytes, they are encoded as a z instead of !!!!!.

  • If the last block is shorter than four octets, it's padded with null bytes. After encoding, as many characters as there were padding bytes are removed from the end of the output.

  • The encoded value should be preceded by <~ and followed by ~>.

  • The encoded value should contain no whitespace (for this challenge).
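The rules above can be sketched as a small reference encoder. This is only an illustrative Python 3 sketch, not part of the challenge; the function name `ascii85_encode` is made up for the example.

```python
def ascii85_encode(text: str) -> str:
    """Encode printable-ASCII text as Base85 following the guide above (sketch)."""
    data = text.encode("ascii")
    out = []
    for i in range(0, len(data), 4):
        chunk = data[i:i + 4]
        pad = 4 - len(chunk)                  # null bytes needed to fill the block
        # Big-endian: the first octet is the most significant byte.
        value = int.from_bytes(chunk + b"\0" * pad, "big")
        digits = []
        for _ in range(5):
            value, rem = divmod(value, 85)
            digits.append(chr(rem + 33))      # first remainder = rightmost character
        block = "".join(reversed(digits))
        if pad == 0 and block == "!!!!!":
            block = "z"                       # a full block of null bytes
        else:
            block = block[:5 - pad]           # drop one character per padding byte
        out.append(block)
    return "<~" + "".join(out) + "~>"


print(ascii85_encode("easy"))  # <~ARTY*~>
```

For the block "easy" the 32-bit number is 1700885369, and the successive remainders 9, 56, 51, 49, 32 give the characters `*`, `Y`, `T`, `R`, `A` from right to left.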


Examples

In: easy
Out: <~ARTY*~>

In: test
Out: <~FCfN8~>

In: code golf
Out: <~@rGmh+D5V/Ac~>

In: Programming Puzzles
Out: <~:i^JeEa`g%Bl7Q+:j%)1Ch7Y~>

The following snippet will encode a given input to Base85.

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script><script>String.prototype.toAscii85=function(){if(""==this)return"<~~>";for(var r=[],t=0;t<this.length;t+=4){for(var i=(this.substr(t,4)+"\x00\x00\x00").substr(0,4),o=0,n=0;4>n;n++)o=256*o+i.charCodeAt(n);var s=[];for(n=0;5>n;n++){var e=o%85;o=(o-e)/85,s.unshift(String.fromCharCode(e+33))}r=r.concat(s)}var a=4-this.length%4;return 4!=a&&r.splice(-a,a),"<~"+r.join("").replace(/!!!!!/g,"z")+"~>"};</script><style>#in,#out{margin:20px;width:400px;resize:none}</style><input id="in" type="text" value="Base85"><button onclick="$('#out').text($('#in').val().toAscii85())">Submit</button><br><textarea id="out" rows=5 disabled></textarea>

Zach Gates

Posted 2015-09-23T03:11:46.040

Reputation: 6 152

3I'm confused as to why, given that you restrict the input to printable ASCII, you then use byte as a synonym of octet and don't allow 7-bit bytes. – Peter Taylor – 2015-09-23T06:24:17.097

Endianness should be specified. A block [0,1,2,3] is converted to a 32 bit number as 0x0123 or 0x3210? – edc65 – 2015-09-23T07:40:36.607

@edc65 big endian according to the wikipedia link – Level River St – 2015-09-23T09:55:00.280

3@steveverrill thank you. That should be in the challenge text, and not in an external link. At least it's in a comment now – edc65 – 2015-09-23T10:57:34.330

If the input can only contain printable characters, how could it contain four null bytes? – Luis Mendo – 2015-09-23T23:16:31.403

I was giving a guide to Base85 in general. Not every point is necessarily applicable to the challenge. @LuisMendo – Zach Gates – 2015-09-23T23:20:22.203

Thanks. Like Dennis says, that input can't happen in this case – Luis Mendo – 2015-09-23T23:24:39.957

Answers

9

CJam, 43 39 35 bytes

"<~"q4/{:N4Ue]256b85b'!f+}/N,)<"~>"

Try it online in the CJam interpreter.

How it works

"<~"      e# Push that string.
q4/       e# Read all input from STDIN and split it into chunks of length 4.
{         e# For each chunk:
  :N      e#   Save it in N.
  4Ue]    e#   Right-pad it with 0's to a length of 4.
  256b85b e#   Convert from base 256 to base 85.
  '!f+    e#   Add '!' to each base-85 digit.
}/        e#
N,)       e# Push the length of the last unpadded chunk, plus 1.
<         e# Keep that many chars of the last encoded chunk.
"~>"      e# Push that string.

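If the input is empty, the loop never runs, so N,) and < apply to the string "<~" instead of an encoded chunk. Since N initially holds a single character (a linefeed), N,) is 2, both characters of "<~" are kept, and the output is still correct.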

We don't have to deal with z or pad the encoded chunks to length 5, since the input will contain only printable ASCII characters.
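For readers who don't speak CJam, here is a rough Python 3 rendering of the same steps. It is only a sketch of the idea, not a translation the author provided; `to_base` and `encode_like_cjam` are names invented for this example, and `last = "\n"` stands in for CJam's initial value of N.

```python
def to_base(n, base):
    """Digits of n in the given base, most significant first (no leading zeros)."""
    digits = []
    while n:
        n, r = divmod(n, base)
        digits.append(r)
    return digits[::-1]


def encode_like_cjam(text):
    pieces = ["<~"]
    last = "\n"                                # stand-in for CJam's initial N
    for i in range(0, len(text), 4):           # q4/  split into chunks of 4
        last = text[i:i + 4]                   # :N   remember the unpadded chunk
        padded = last.ljust(4, "\0")           # 4Ue] right-pad with zeros
        n = int.from_bytes(padded.encode("ascii"), "big")               # 256b
        pieces.append("".join(chr(d + 33) for d in to_base(n, 85)))     # 85b '!f+
    pieces[-1] = pieces[-1][:len(last) + 1]    # N,)< truncate the last piece only
    pieces.append("~>")
    return "".join(pieces)
```

No fixed-width padding of the base-85 digits is needed here because the first byte of every chunk is printable (at least 32), so each chunk's value is at least 32·2^24, which already exceeds 85^4 and therefore always has five base-85 digits.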

Dennis

Posted 2015-09-23T03:11:46.040

Reputation: 196 637

3This solution looks suspiciously like the Base85 version of an ASCII string (cf. last example in question). Wait... – ojdo – 2015-09-23T07:55:39.533

1@ojdo: There are some invalid characters in the CJam code, the closest I got is this CJam interpreter link – schnaader – 2015-09-23T12:19:14.710

@ojdo because the challenge is just this: a program that can take an input of a single-line string containing any ASCII printable characters,... – edc65 – 2015-09-23T18:41:19.460

5

Python 3, 71 bytes

from base64 import*
print(a85encode(input().encode(),adobe=1).decode())

I've never golfed in Python, so this is probably sub-optimal.

Thanks to @ZachGates for golfing off 3 bytes!
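As a quick illustration of what the `adobe` flag does, here is an ungolfed sketch using the same standard-library call; the wrapper name `encode` is just for this example.

```python
from base64 import a85encode


def encode(text):
    # adobe=True adds the <~ and ~> framing that the challenge asks for.
    return a85encode(text.encode("ascii"), adobe=True).decode("ascii")


print(a85encode(b"easy"))   # b'ARTY*'  (no framing by default)
print(encode("easy"))       # <~ARTY*~>
```

The `.encode()`/`.decode()` calls are needed because `a85encode` takes and returns bytes, not str.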

Dennis

Posted 2015-09-23T03:11:46.040

Reputation: 196 637

1You can use input().encode() instead of str.encode(input()) to save 3 bytes. – Zach Gates – 2015-09-23T03:47:40.823

@ZachGates Thanks! All that en-/decoding is still killing me though. – Dennis – 2015-09-23T03:52:54.977

2

Python 2, 193 162 bytes

from struct import*
i=raw_input()
k=4-len(i)%4&3
i+='\0'*k
o=''
while i:
 b,=unpack('>I',i[-4:]);i=i[:-4]
 while b:o+=chr(b%85+33);b/=85
print'<~%s~>'%o[k:][::-1]

This is my first code golf, so I'm sure there's something wrong with my approach. I also wanted to actually implement base85 rather than just call the library function. :)
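For anyone who wants to follow the logic without the golfing, here is a hedged Python 3 sketch of the same back-to-front approach. It is an ungolfed port written for illustration (the function name `encode_reversed` is invented), and like the golfed version it assumes printable-ASCII input so that every block yields exactly five digits.

```python
from struct import unpack


def encode_reversed(text):
    data = text.encode("ascii")
    k = -len(data) % 4                  # same value as 4-len(i)%4&3 above
    data += b"\0" * k                   # pad the last block with null bytes
    out = ""
    while data:
        (b,) = unpack(">I", data[-4:])  # last 4 bytes as a big-endian uint32
        data = data[:-4]
        while b:                        # emit digits least significant first
            out += chr(b % 85 + 33)
            b //= 85
    # out is the whole encoding reversed, so the k padding characters sit at
    # the front: drop them, then reverse.
    return "<~%s~>" % out[k:][::-1]
```

Working from the last block backwards is what lets the padding characters be removed with a simple `out[k:]` slice before the final reversal.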

David

Posted 2015-09-23T03:11:46.040

Reputation: 121

This is 181 bytes. Don't forget to remove the newline that IDLE adds to your code when you save (if you're using IDLE). You also never call the function, or get the user's input, so it doesn't do anything when you run it. – Zach Gates – 2015-09-23T04:10:33.193

Wasn't sure if it should be a function or read I/O or what... should it read stdin and print stdout? (Again, never done code golf before...) – David – 2015-09-23T04:29:26.313

Welcome to Programming Puzzles & Code Golf! There seems to be a problem with input lengths that are not divisible by 4 (last 2 test cases). Line 3 should read [:4+len(s)/4*4] and no characters are removed from the end of the output. – Dennis – 2015-09-23T04:31:20.473

I believe I've fixed the issues (and unfortunately made it longer). Trying to optimize more... – David – 2015-09-23T04:57:15.510

You can turn your second while loop into one like this: while b:d=chr(b%85+33)+d;b/=85. You can also remove the space between your print statement and the string. Additionally, remove the space between the arguments passed to s.unpack. – Zach Gates – 2015-09-23T04:57:51.173

2

Octave, 133 131 bytes

Thanks to @ojdo for suggesting I take input from argv rather than stdin, saving me 2 bytes.

function g(s) p=mod(-numel(s),4);s(end+1:end+p)=0;disp(['<~' dec2base(swapbytes(typecast(s,'uint32')),'!':'u')'(:)'(1:end-p) '~>'])

Ungolfed:

function g(s)             %// function header
p=mod(-numel(s),4);       %// number of missing chars until next multiple of 4
s(end+1:end+p)=0;         %// append p null characters to s
t=typecast(s,'uint32');   %// cast each 4 char block to uint32
u=swapbytes(t);           %// change endian-ness of uint32's
v=dec2base(u,'!':'u');    %// convert to base85
w=v'(:)'(1:end-p);        %// flatten and truncate resulting string
disp(['<~' w '~>']);      %// format and display final result
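To show why the swapbytes step is there, here is a small Python 3 sketch of the typecast / swapbytes / dec2base chain for a single block. It assumes a little-endian host (so the native cast reads the first character as the least significant byte); `swap32`, `to_alphabet` and `ALPHABET` are names made up for this illustration and are not Octave functions.

```python
# The 85 digit symbols '!'..'u', i.e. what '!':'u' expands to in Octave.
ALPHABET = "".join(chr(c) for c in range(ord("!"), ord("u") + 1))


def swap32(n):
    """Reverse the byte order of a 32-bit unsigned integer."""
    return int.from_bytes(n.to_bytes(4, "little"), "big")


def to_alphabet(n, alphabet=ALPHABET):
    """Write n in base len(alphabet) using the given digit symbols."""
    digits = ""
    while n:
        n, r = divmod(n, len(alphabet))
        digits = alphabet[r] + digits
    return digits or alphabet[0]


# Assumption: little-endian host, so the cast sees "easy" with 'e' as the low byte.
native = int.from_bytes(b"easy", "little")
big = swap32(native)        # now 'e' is the most significant byte
print(to_alphabet(big))     # ARTY*
```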

I've posted the code on ideone. The standalone function doesn't require an end statement, but because ideone has the function and the calling script in the same file, it requires a separator.

I still haven't been able to figure out how to get stdin to work on ideone. If anyone knows, I'm still interested, so please drop me a comment.

Sample output from ideone:

easy
<~ARTY*~>
test
<~FCfN8~>
code golf
<~@rGmh+D5V/Ac~>
Programming Puzzles
<~:i^JeEa`g%Bl7Q+:j%)1Ch7Y~>

beaker

Posted 2015-09-23T03:11:46.040

Reputation: 2 349

Why not just use argv()? The task description does not seem to require reading input from stdin. – ojdo – 2015-09-24T07:33:36.903

Very nice! So does dec2base in Octave allow bases above 36? – Luis Mendo – 2015-09-24T08:53:54.890

As the doc (and the error message) say: argument BASE must be a number between 2 and 36, or a string of symbols. Here, the expression '!':'u' expands to the 85-character string !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstu that serves as the base. – ojdo – 2015-09-24T11:01:31.843

@ojdo If that's the case then I should make it a function and maybe save a couple of bytes. – beaker – 2015-09-24T14:08:13.947

@LuisMendo As ojdo says, dec2base will take a base of arbitrary size as long as you give it an alphabet to pull from. For bases b less than or equal to 36, it defaults to the first b characters of [0..9,A..Z]. Any base over 36 requires that you give it the full alphabet (which you can optionally do for smaller bases as well). – beaker – 2015-09-24T14:16:15.380

@beaker In Matlab you can't choose the "digits", only the base; and it's limited to 36 maximum :-( That's why I used bsxfun – Luis Mendo – 2015-09-24T21:29:56.980

@LuisMendo Wow, that's interesting. I didn't even look at the Matlab documentation. That kinda stinks. – beaker – 2015-09-24T21:34:06.997

1@beaker It does. Not only the limitation to 36, but the fact that digits are necessarily 0...9ABC, so there's a jump in ASCII codes – Luis Mendo – 2015-09-24T21:38:38.833

1

PHP, 181 Bytes

foreach(str_split(bin2hex($argn),8)as$v){for($t="",$d=hexdec(str_pad($v,8,0));$d;$d=$d/85^0)$t=chr($d%85+33).$t;$r.=str_replace("!!!!!",z,substr($t,0,1+strlen($v)/2));}echo"<~$r~>";

Online Version

Expanded

foreach(str_split(bin2hex($argn),8)as$v){
    for($t="",$d=hexdec(str_pad($v,8,0));$d;$d=$d/85^0)
      $t=chr($d%85+33).$t;
    $r.=str_replace("!!!!!",z,substr($t,0,1+strlen($v)/2));
}
echo"<~$r~>";

Jörg Hülsermann

Posted 2015-09-23T03:11:46.040

Reputation: 13 026

1

Pure bash, ~738

Encoder first (somewhat golfed):

#!/bin/bash
# Ascii 85 encoder bash script
LANG=C

printf -v n \\%o {32..126};printf -v n "$n";printf -v m %-20sE abtnvfr;p=\<~;l()
{ q=$(($1<<24|$2<<16|$3<<8|$4));q="${n:1+(q/64#378iN)%85:1}${n:1+(q/614125)%85:1
}${n:1+(q/7225)%85:1}${n:1+(q/85)%85:1}${n:1+q%85:1}";};k() { ((${#p}>74))&&ech\
o "${p:0:75}" && p=${p:75};};while IFS= read -rd '' -n 1 q;do [ "$q" ]&&{ print\
f -v q "%q" "$q";case ${#q} in 1|2)q=${n%$q*};o+=($((${#q}+32)));;7)q=${q#*\'\\}
o+=($((8#${q%\'})));;5)q=${q#*\'\\};q=${m%${q%\'}*};o+=($((${#q}+07)));;esac;}||
o+=(0);((${#o[@]}>3))&&{ [ "${o[*]}" = "0 0 0 0" ]&& q=z|| l ${o[@]};p+="${q}";k
o=(); };done;[ "$o" ]&&{ f=0;for((;${#o[@]}<4;)){ o+=(0);((f++));};((f==0))&&[ \
"${o[*]}" = "0 0 0 0" ]&&q=z||l ${o[@]};p+="${q:0:5-f}";};p+="~>";k;[ "$p" ]&&e\
cho "$p"

Tests:

for word in easy test code\ golf Programming\ Puzzles ;do
    printf "%-24s" "$word:"
    ./enc85.sh < <(printf "$word")
  done
easy:                   <~ARTY*~>
test:                   <~FCfN8~>
code golf:              <~@rGmh+D5V/Ac~>
Programming Puzzles:    <~:i^JeEa`g%Bl7Q+:j%)1Ch7Y~>

And now the decoder:

#!/bin/bash
# Ascii 85 decoder bash script
LANG=C

printf -v n "\%o" {33..117};printf -v n "$n";o=1 k=1;j(){ read -r q||o=;[ "$q" \
]&&[ -z "${q//*<~*}" ]&&((k))&&k= q="${q#*<~}";m+="$q";m="${m%~>*}";};l(){ r=;f\
or((i=0;i<${#1};i++)){ s="${1:i:1}";case "$s" in "*"|\\|\?)s=\\${s};;esac;s="${\
n%${s}*}";((r+=${#s}*(85**(4-i))));};printf -v p "\%03o" $((r>>24)) $((r>>16&255
)) $((r>>8&255)) $((r&255));};for((;(o+${#m})>0;)){ [ "$m" ] || j;while [ "${m:0
:1}" = "z" ];do m=${m:1};printf "\0\0\0\0";done;if [ ${#m} -ge 5 ];then q="${m:0
:5}";m=${m:5};l "$q";printf "$p";elif ((o));then j;elif [ "${m##z*}" ];then pri\
ntf -v t %$((5-${#m}))s;l "$m${t// /u}";printf "${p:0:16-4*${#t}}";m=;fi;}

Copy these into enc85.sh and dec85.sh, chmod +x {enc,dec}85.sh, then:

./enc85.sh <<<'Hello world!'
<~87cURD]j7BEbo80$3~>
./dec85.sh <<<'<~87cURD]j7BEbo80$3~>'
Hello world!

But you could run a stronger test:

ls -ltr --color $HOME/* | gzip | ./enc85.sh | ./dec85.sh | gunzip
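The same kind of round-trip check on compressed binary data can be sketched with Python's standard library, which is handy for comparing against the scripts' output. This is only an illustration: the sample payload is made up, and `wrapcol=75` is chosen to mimic the 75-column wrapping the encoder script does.

```python
import zlib
from base64 import a85decode, a85encode

original = b"Hello world!\n" * 50
payload = zlib.compress(original)              # arbitrary binary data

# Encode with <~ ~> framing, wrapped into lines of at most 75 characters.
encoded = a85encode(payload, adobe=True, wrapcol=75)

# a85decode skips the line breaks by default, so the wrapped text decodes directly.
decoded = a85decode(encoded, adobe=True)

print(zlib.decompress(decoded) == original)    # True
```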

Reduced to 724 chars:

printf -v n \\%o {32..126};printf -v n "$n";printf -v m %-20sE abtnvfr;p=\<~
l(){ q=$(($1<<24|$2<<16|$3<<8|$4))
q="${n:1+(q/64#378iN)%85:1}${n:1+(q/614125)%85:1}${n:1+(q/7225)%85:1}${n:1+(q/85)%85:1}${n:1+q%85:1}"
};k() { ((${#p}>74))&&echo "${p:0:75}" && p=${p:75};};while IFS= read -rd '' -n 1 q;do [ "$q" ]&&{
printf -v q "%q" "$q";case ${#q} in 1|2)q=${n%$q*};o+=($((${#q}+32)));;7)q=${q#*\'\\}
o+=($((8#${q%\'})));;5)q=${q#*\'\\};q=${m%${q%\'}*};o+=($((${#q}+07)));;esac;}||o+=(0)
((${#o[@]}>3))&&{ [ "${o[*]}" = "0 0 0 0" ]&&q=z||l ${o[@]};p+="${q}";k
o=();};done;[ "$o" ]&&{ f=0;for((;${#o[@]}<4;)){ o+=(0);((f++));}
((f==0))&&[ "${o[*]}" = "0 0 0 0" ]&&q=z||l ${o[@]};p+="${q:0:5-f}";};p+="~>";k;[ "$p" ]&&echo "$p"

F. Hauri

Posted 2015-09-23T03:11:46.040

Reputation: 2 654

1

Matlab, 175 bytes

s=input('','s');m=3-mod(numel(s)-1,4);s=reshape([s zeros(1,m)]',4,[])';t=char(mod(floor(bsxfun(@rdivide,s*256.^[3:-1:0]',85.^[4:-1:0])),85)+33)';t=t(:)';['<~' t(1:end-m) '~>']

Example:

>> s=input('','s');m=3-mod(numel(s)-1,4);s=reshape([s zeros(1,m)]',4,[])';t=char(mod(floor(bsxfun(@rdivide,s*256.^[3:-1:0]',85.^[4:-1:0])),85)+33)';t=t(:)';['<~' t(1:end-m) '~>']
code golf
ans =
<~@rGmh+D5V/Ac~>
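The one-liner avoids a division loop by extracting each base-85 digit directly as floor(n / 85^k) mod 85 for k = 4..0. A hedged Python 3 sketch of that idea, written without the matrix operations (the name `encode_by_powers` is invented for this example):

```python
def encode_by_powers(text):
    m = 3 - (len(text) - 1) % 4        # number of padding bytes, as in the answer
    data = text.encode("ascii") + b"\0" * m
    out = ""
    for i in range(0, len(data), 4):
        # s*256.^[3:-1:0]' : weight the four bytes by 256^3 .. 256^0
        n = sum(byte * 256 ** p for byte, p in zip(data[i:i + 4], (3, 2, 1, 0)))
        # floor(n ./ 85.^[4:-1:0]) mod 85 : all five digits, most significant first
        out += "".join(chr(n // 85 ** k % 85 + 33) for k in (4, 3, 2, 1, 0))
    return "<~" + out[:len(out) - m] + "~>"
```

Because each digit is computed independently, the Matlab version can evaluate all digits of all blocks in a single bsxfun call.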

Luis Mendo

Posted 2015-09-23T03:11:46.040

Reputation: 87 464