Convert a bytes array to base64

10

2

Your mission is to write a function or program that converts an array of bytes (i.e., an array of integers from 0 to 255) to base64.

Using built-in base64 encoders is not allowed.

The required base64 variant is the one defined in RFC 2045 (using "+", "/", and mandatory padding with "=").

Shortest code (in bytes) wins!

Example:

Input (int array): [99, 97, 102, 195, 169]

Output (string): Y2Fmw6k=
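
For reference, a minimal, ungolfed Python 3 sketch of the encoding the challenge asks for. It is not taken from any answer in this thread. It takes three bytes at a time, splits the 24 bits into four 6-bit values, looks each one up in the alphabet, and pads with "=". The name b64 is made up for this illustration, the input is assumed to be a plain list of ints, and line wrapping is left out.

```python
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def b64(data):
    """Encode a list of ints (0..255) as unwrapped base64 text."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        missing = 3 - len(chunk)              # 0, 1 or 2 bytes short
        word = 0
        for byte in chunk + [0] * missing:
            word = (word << 8) | byte         # build one 24-bit group
        chars = [ALPHABET[(word >> shift) & 63] for shift in (18, 12, 6, 0)]
        if missing:
            chars[-missing:] = "=" * missing  # padding replaces the filler
        out.extend(chars)
    return "".join(out)

print(b64([99, 97, 102, 195, 169]))  # Y2Fmw6k=
```

The padding count follows directly from how many bytes the last group is short, which is why the example with five input bytes ends in a single "=".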

xem

Posted 2014-05-04T21:53:12.447

Reputation: 5 523

What type of competition is this? – Cilan – 2014-05-04T22:26:53.207

Do built-in base64 encoders cover only binary-to-text encoders or functions manipulating integers as well? – Dennis – 2014-05-05T04:11:35.240

To clarify: Can I use a function that returns 1 2 for the argument 66? – Dennis – 2014-05-05T05:21:52.803

1

There are 9 standardised or 4 non-standardised versions of base64. Your reference to = for padding narrows it down to 4. Which one do you want? Or do you want a non-standard variant which doesn't have maximum line lengths?

– Peter Taylor – 2014-05-05T06:43:47.273

I'm guessing he/she referred to either the "standard" one specified by RFC 4648 or the version used by MIME-types, RFC 2045. These are different, so clarification would be very useful. – semi-extrinsic – 2014-05-05T07:38:31.227

sorry for the lack of precision, I didn't know there were different kinds of base64. The one I'm looking for is the one used in dataURI's (so, yes, RFC 2045) – xem – 2014-05-05T08:16:05.577

arg, please don't change requirements when you already have a significant number of answers, and 80% of them will require nontrivial code changes to accommodate the new spec. – skibrianski – 2014-05-05T12:18:35.733

RFC 2045 is not what's used in a data URI. When was the last time you saw a data uri with \r\n in it? Also, in addition to wrapping at 76 chars, there is special newline handling in RFC 2045. – skibrianski – 2014-05-05T12:57:39.630

@skibrianski damn, I'm really sorry. According to http://en.wikipedia.org/wiki/Base64, RFC2045 is the one corresponding to "Base64 transfer encoding for MIME". But you're right, URIs don't have \r\n. So... I don't know. Which one is used in URIs / JavaScript btoa() ? RFC 1642?

– xem – 2014-05-05T18:12:41.867

Answers

3

JavaScript, 177 187 198 characters

function(d){c="";for(a=e=b=0;a<4*d.length/3;f=b>>2*(++a&3)&63,c+=String.fromCharCode(f+71-(f<26?6:f<52?0:f<62?75:f^63?90:87)))a&3^3&&(b=b<<8^d[e++]);for(;a++&3;)c+="=";return c}
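
The golfed function above avoids storing the 64-character alphabet by adding a range-dependent offset to each 6-bit value f. A Python 3 sketch of that same mapping, spelled out with explicit branches (sextet_to_char is a name chosen here for illustration; the offsets are the values 71 minus 6, 0, 75, 90 and 87 from the JavaScript expression):

```python
def sextet_to_char(f):
    """Map a 6-bit value to its base64 character using offsets only."""
    if f < 26:
        offset = 65      # 'A'..'Z'  (71 - 6)
    elif f < 52:
        offset = 71      # 'a'..'z'  (71 - 0)
    elif f < 62:
        offset = -4      # '0'..'9'  (71 - 75)
    elif f == 62:
        offset = -19     # '+'       (71 - 90)
    else:
        offset = -16     # '/'       (71 - 87)
    return chr(f + offset)

print("".join(sextet_to_char(f) for f in range(64)))
```

Running it prints the full base64 alphabet in index order, which is a quick way to check the offsets.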

To add line breaks (\r\n) after every 76th character, add 23 characters to the code:

function(d){c="";for(a=e=b=0;a<4*d.length/3;f=b>>2*(++a&3)&63,c+=String.fromCharCode(f+71-(f<26?6:f<52?0:f<62?75:f^63?90:87))+(75==(a-1)%76?"\r\n":""))a&3^3&&(b=b<<8^d[e++]);for(;a++&3;)c+="=";return c}

Demo code:

var encode = function(d,a,e,b,c,f){c="";for(a=e=b=0;a<4*d.length/3;f=b>>2*(++a&3)&63,c+=String.fromCharCode(f+71-(f<26?6:f<52?0:f<62?75:f^63?90:87))+(75==(a-1)%76?"\r\n":""))a&3^3&&(b=b<<8^d[e++]);for(;a++&3;)c+="=";return c};

//OP test case
console.log(encode([99, 97, 102, 195, 169])); // outputs "Y2Fmw6k=".

//Quote from Hobbes' Leviathan:
console.log(
 encode(
  ("Man is distinguished, not only by his reason, but by this singular passion from " +
   "other animals, which is a lust of the mind, that by a perseverance of delight " +
   "in the continued and indefatigable generation of knowledge, exceeds the short " +
   "vehemence of any carnal pleasure.")
  .split('').map(function(i){return i.charCodeAt(0)})
 )
);

Tomas Langkaas

Posted 2014-05-04T21:53:12.447

Reputation: 324

Nice solution! You can shave off some bytes using some ES6 features and removing some duplication: Shortened code with comments

– Craig Ayre – 2017-06-11T14:02:32.403

@CraigAyre, thanks for constructive input. ES6 was not finalized and available at the time this challenge was originally posted. As suggested at codegolf.meta, you could post the shortened ES6 version and mark it as non-competing.

– Tomas Langkaas – 2017-06-11T21:35:56.363

No worries, my fault for not double checking the original post date! I'm a fan of your solution so I'm not going to post another, but thanks for the link. The template literal logic that removed the alphabet duplication can be converted to ES5 in the same number of bytes, doesn't save many but every little counts! – Craig Ayre – 2017-06-11T21:56:57.563

@CraigAyre, thanks again for the tip, found another way to compress the base64 symbols even more (which made it even more backwards compatible--should now work in old IE as well). – Tomas Langkaas – 2017-06-11T23:39:21.043

3

32-bit x86 assembly, 59 bytes

Byte-code:

66 B8 0D 0A 66 AB 6A 14 5A 4A 74 F4 AD 4E 45 0F C8 6A 04 59 C1 C0 06 24 3F 3C 3E 72 05 C0
E0 02 2C 0E 2C 04 3C 30 7D 08 04 45 3C 5A 76 02 04 06 AA 4D E0 E0 75 D3 B0 3D F3 AA C3

Disassembly:

b64_newline:
    mov     ax, 0a0dh
    stosw
b64encode:
    push    (76 shr 2) + 1
    pop     edx
b64_outer:
    dec     edx
    je      b64_newline
    lodsd
    dec     esi
    inc     ebp
    bswap   eax
    push    4
    pop     ecx
b64_inner:
    rol     eax, 6
    and     al, 3fh
    cmp     al, 3eh
    jb      b64_testchar
    shl     al, 2     ;'+' and '/' differ by only 1 bit
    sub     al, ((3eh shl 2) + 'A' - '+') and 0ffh
b64_testchar:
    sub     al, 4
    cmp     al, '0'
    jnl     b64_store ;l not b because '/' is still < 0 here
    add     al, 'A' + 4
    cmp     al, 'Z'
    jbe     b64_store
    add     al, 'a' - 'Z' - 1
b64_store:
    stosb
    dec     ebp
    loopne  b64_inner
    jne     b64_outer
    mov     al, '='
    rep     stosb
    ret

Call b64encode with esi pointing to input buffer, edi pointing to output buffer.

It could be made even smaller if line-wrapping is not used.
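
The character mapping in b64_inner leans on 8-bit wraparound plus one signed and one unsigned comparison. A Python 3 sketch that emulates just those steps on a single 6-bit value may make the trick easier to follow. It is an emulation written for this explanation, not a translation of the whole routine, and asm_sextet_to_char is a made-up name.

```python
def asm_sextet_to_char(v):
    """Emulate the AL arithmetic in b64_inner for a 6-bit value v."""
    al = v & 0x3F                                   # and al, 3fh
    if al >= 0x3E:                                  # cmp al, 3eh / jb skips this
        al = (al << 2) & 0xFF                       # shl al, 2
        al = (al - ((0x3E << 2) + ord("A") - ord("+"))) & 0xFF
    al = (al - 4) & 0xFF                            # sub al, 4
    signed = al - 256 if al >= 128 else al          # jnl compares as signed
    if signed >= ord("0"):
        return chr(al)                              # '0'..'9'
    al = (al + ord("A") + 4) & 0xFF                 # add al, 'A' + 4
    if al <= ord("Z"):                              # jbe compares as unsigned
        return chr(al)                              # 'A'..'Z', '+', '/'
    return chr((al + ord("a") - ord("Z") - 1) & 0xFF)   # 'a'..'z'

print("".join(asm_sextet_to_char(v) for v in range(64)))
```

The values 62 and 63 are first moved to bytes that are negative when read as signed, so they fall through the digit test and land on '+' and '/' after the 'A' + 4 adjustment.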

peter ferrie

Posted 2014-05-04T21:53:12.447

Reputation: 804

1

PHP, 200 bytes

<?foreach($g=$_GET as$k=>$v)$b[$k/3^0]+=256**(2-$k%3)*$v;for(;$i<62;)$s.=chr($i%26+[65,97,48][$i++/26]);foreach($b as$k=>$v)for($i=4;$i--;$p++)$r.=("$s+/=")[count($g)*4/3<$p?64:($v/64**$i)%64];echo$r;

Try it online!

You could replace the string ("$s+/=") with an array array_merge(range(A,Z),range(a,z),range(0,9),["+","/","="])
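
Two ideas carry the PHP answer: the alphanumeric part of the table is generated rather than written out, and each 24-bit group is read as four base-64 digits by repeated division. A Python 3 sketch of both, under the assumption that this reading of the golfed code is right (table and digits_base64 are names invented here):

```python
# Position i picks a starting code point (65, 97 or 48) by i // 26
# and an offset inside that range by i % 26, as in the PHP loop.
symbols = "".join(chr(i % 26 + [65, 97, 48][i // 26]) for i in range(62))
table = symbols + "+/="           # index 64 is the padding character

def digits_base64(word):
    """Four base-64 digits of a 24-bit integer, most significant first."""
    return [(word // 64 ** i) % 64 for i in (3, 2, 1, 0)]

word = 99 * 256 ** 2 + 97 * 256 + 102     # the bytes 99, 97, 102
print("".join(table[d] for d in digits_base64(word)))  # Y2Fm
```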

For comparison only, here is the byte count that can be reached with a disallowed built-in:

PHP, 45 bytes

<?=base64_encode(join(array_map(chr,$_GET)));

Try it online!

Jörg Hülsermann

Posted 2014-05-04T21:53:12.447

Reputation: 13 026

1

perl, 126 bytes

Reads from stdin, writes to stdout.

$/=$\;print map{$l=y///c/2%3;[A..Z,a..z,0..9,"+","/"]->[oct"0b".substr$_.0 x4,0,6],$l?"="x(3-$l):""}unpack("B*",<>)=~/.{1,6}/g

ungolfed:

my @x = ('A'..'Z','a'..'z',0..9,'+','/');
my $in = join '', <>;
my $bits = unpack 'B*', $in;
my @six_bit_groups = $bits =~ /.{1,6}/g;
for my $sixbits (@six_bit_groups) {
  next unless defined $sixbits;
  $l=length($sixbits)/2%3;
  my $zero_padded = $sixbits . ( "0" x 4 );
  my $padded_bits = substr( $zero_padded, 0, 6 );
  my $six_bit_int = oct "0b" . $padded_bits;
  print $x[$six_bit_int];
  print "=" x (3 - $l)  if  $l;
}
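
The same bit-string approach, sketched in Python 3 for readers who do not read Perl. This is an illustration rather than a port. It takes a list of ints instead of reading stdin, and b64_bits is a made-up name. As in the Perl, the padding count is derived from the length of the last bit group.

```python
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def b64_bits(data):
    """Encode via an intermediate string of '0' and '1' characters."""
    bits = "".join(format(byte, "08b") for byte in data)
    out = []
    for i in range(0, len(bits), 6):
        group = bits[i:i + 6]
        short = len(group) // 2 % 3            # 0 for a full group, else 1 or 2
        out.append(ALPHABET[int(group.ljust(6, "0"), 2)])
        if short:
            out.append("=" * (3 - short))      # a 2-bit tail needs "==", a 4-bit tail "="
    return "".join(out)

print(b64_bits([99, 97, 102, 195, 169]))  # Y2Fmw6k=
```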

skibrianski

Posted 2014-05-04T21:53:12.447

Reputation: 1 197

The question has been clarified to require RFC 2045, so you need to add a bit of code to split the output into 76-char chunks and join with \r\n. – Peter Taylor – 2014-05-05T09:34:44.317

1

Python, 234 chars

def F(s):
 R=range;A=R(65,91)+R(97,123)+R(48,58)+[43,47];n=len(s);s+=[0,0];r='';i=0
 while i<n:
  if i%57<1:r+='\r\n'
  for j in R(4):r+=chr(A[s[i]*65536+s[i+1]*256+s[i+2]>>18-6*j&63])
  i+=3
 k=-n%3
 if k:r=r[:-k]+'='*k
 return r[2:]
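
The function above relies on Python 2, where range returns a list that can be concatenated with +. A hedged Python 3 adaptation that keeps the original logic (a CRLF before every 57 input bytes, i.e. every 76 output characters, with the leading one sliced off at the end) could look like this. It is ungolfed, and it copies the input so the caller's list is not modified.

```python
def F(s):
    # Code points of the base64 alphabet: A-Z, a-z, 0-9, '+', '/'.
    A = list(range(65, 91)) + list(range(97, 123)) + list(range(48, 58)) + [43, 47]
    n = len(s)
    s = list(s) + [0, 0]          # filler so s[i+1] and s[i+2] always exist
    r = ''
    i = 0
    while i < n:
        if i % 57 < 1:            # 57 input bytes produce 76 output chars
            r += '\r\n'
        word = s[i] * 65536 + s[i + 1] * 256 + s[i + 2]
        for j in range(4):
            r += chr(A[word >> 18 - 6 * j & 63])
        i += 3
    k = -n % 3                    # number of '=' characters needed
    if k:
        r = r[:-k] + '=' * k
    return r[2:]                  # drop the CRLF emitted before the first line

print(F([99, 97, 102, 195, 169]))  # Y2Fmw6k=
```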

Keith Randall

Posted 2014-05-04T21:53:12.447

Reputation: 19 865

The question has been clarified to require RFC 2045, so you need to add a bit of code to split the output into 76-char chunks and join with \r\n. – Peter Taylor – 2014-05-05T09:38:01.670

@PeterTaylor: fixed. – Keith Randall – 2014-05-05T15:27:35.817

1

Perl, 147 bytes

sub b{$f=(3-($#_+1)%3)%3;$_=unpack'B*',pack'C*',@_;@r=map{(A..Z,a..z,0..9,'+','/')[oct"0b$_"]}/.{1,6}/g;$"='';join"\r\n",("@r".'='x$f)=~/.{1,76}/g}

The function takes a list of integers as input and outputs the string, base64 encoded.

Example:

print b(99, 97, 102, 195, 169)

prints

Y2Fmw6J=

Ungolfed:

Version that also visualizes the intermediate steps:

sub b {
    # input array: @_
    # number of elements: $#_ + 1 ($#_ is zero-based index of last element in 
    $fillbytes = (3 - ($#_ + 1) % 3) % 3;
      # calculate the number for the needed fill bytes
      print "fillbytes:       $fillbytes\n";
    $byte_string = pack 'C*', @_;
      # the numbers are packed as octets to a binary string
      # (binary string not printed)
    $bit_string = unpack 'B*', $byte_string;
      # the binary string is converted to its bit representation, a string wit
      print "bit string:      \"$bit_string\"\n";
    @six_bit_strings = $bit_string =~ /.{1,6}/g;
      # group in blocks of 6 bit
      print "6-bit strings:   [@six_bit_strings]\n";
    @index_positions = map { oct"0b$_" } @six_bit_strings;
      # convert bit string to number
      print "index positions: [@index_positions]\n";
    @alphabet = (A..Z,a..z,0..9,'+','/');
      # the alphabet for base64
    @output_chars = map { $alphabet[$_] } @index_positions;
      # output characters with wrong last characters that entirely derived fro
      print "output chars:    [@output_chars]\n";
    local $" = ''; #"
    $output_string = "@output_chars";
      # array to string without space between elements ($")
      print "output string:   \"$output_string\"\n";
    $result = $output_string .= '=' x $fillbytes;
      # add padding with trailing '=' characters
      print "result:          \"$result\"\n";
    $formatted_result = join "\r\n", $result =~ /.{1,76}/g;
      # maximum line length is 76 and line ends are "\r\n" according to RFC 2045
      print "formatted result:\n$formatted_result\n";
    return $formatted_result;
}

Output:

fillbytes:       1
bit string:      "0110001101100001011001101100001110101001"
6-bit strings:   [011000 110110 000101 100110 110000 111010 1001]
index positions: [24 54 5 38 48 58 9]
output chars:    [Y 2 F m w 6 J]
output string:   "Y2Fmw6J"
result:          "Y2Fmw6J="
formatted result:
Y2Fmw6J=
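
In the trace above, the final group (1001) holds only four bits. RFC 2045 fills a short final group with zero bits on the right before the table lookup, which is how the expected output in the question ends in k= rather than J=. A small Python 3 sketch of that one step (group_to_char is a name made up here):

```python
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def group_to_char(bits):
    """Look up one group of up to six bits, zero-filling on the right."""
    return ALPHABET[int(bits.ljust(6, "0"), 2)]

print(group_to_char("1001"))      # k  (100100 is 36)
print(ALPHABET[int("1001", 2)])   # J  (1001 read as-is is 9)
```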

Tests:

The test strings come from the example in the question and the examples in the Wikipedia article for Base64.

sub b{$f=(3-($#_+1)%3)%3;$_=unpack'B*',pack'C*',@_;@r=map{(A..Z,a..z,0..9,'+','/')[oct"0b$_"]}/.{1,6}/g;$"='';join"\r\n",("@r".'='x$f)=~/.{1,76}/g}

sub test ($) {
   print b(map {ord($_)} $_[0] =~ /./sg), "\n\n";
}

my $str = <<'END_STR';
Man is distinguished, not only by his reason, but by this singular passion from
other animals, which is a lust of the mind, that by a perseverance of delight
in the continued and indefatigable generation of knowledge, exceeds the short
vehemence of any carnal pleasure.
END_STR
chomp $str;

test "\143\141\146\303\251";
test $str;
test "any carnal pleasure.";
test "any carnal pleasure";
test "any carnal pleasur";
test "any carnal pleasu";
test "any carnal pleas";
test "pleasure.";
test "leasure.";
test "easure.";
test "asure.";
test "sure.";

Test output:

TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
IHNpbmd1bGFyIHBhc3Npb24gZnJvbQpvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodAppbiB0aGUgY29udGlu
dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
ZSBzaG9ydAp2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZSO=

YW55IGNhcm5hbCBwbGVhc3VyZSO=

YW55IGNhcm5hbCBwbGVhc3VyZB==

YW55IGNhcm5hbCBwbGVhc3Vy

YW55IGNhcm5hbCBwbGVhc3F=

YW55IGNhcm5hbCBwbGVhcD==

cGxlYXN1cmUu

bGVhc3VyZSO=

ZWFzdXJlLC==

YXN1cmUu

c3VyZSO=

Heiko Oberdiek

Posted 2014-05-04T21:53:12.447

Reputation: 3 841

The question has been clarified to require RFC 2045, so you need to add a bit of code to split the output into 76-char chunks and join with \r\n. – Peter Taylor – 2014-05-05T09:37:26.107

@PeterTaylor: Thanks, I have updated the answer for RFC 2045. – Heiko Oberdiek – 2014-05-05T11:17:40.257

bravo for this very complete answer. Including mandatory line-breaks (by specifying "RFC 2045" in the OP) was actually an error, you can in fact ignore that part. Sorry :) – xem – 2014-05-05T18:15:31.793

1

GolfScript, 80 (77) bytes

~.,~)3%:P[0]*+[4]3*\+256base 64base{'+/''A[a{:0'{,^}/=}/{;}P*'='P*]4>76/"\r
":n*

The above will fit exactly 76 characters in a line, except for the last line. All lines are terminated by CRLF.
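
The wrapping step on its own is small. A Python 3 sketch, assuming the unwrapped base64 text is already available as a string (wrap76 is a hypothetical helper name). Unlike the GolfScript program, it joins lines with CRLF and does not add a terminator after the last line.

```python
def wrap76(text):
    """Split text into lines of at most 76 characters joined by CRLF."""
    return "\r\n".join(text[i:i + 76] for i in range(0, len(text), 76))

print(repr(wrap76("A" * 80)))
```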

Note that RFC 2045 specifies a variable, maximum line length of 76 characters, so at the cost of pretty output, we can save 3 additional bytes.

~.,~)3%:P[0]*+[4]3*\+256base 64base{'+/''A[a{:0'{,^}/=}/{;}P*'='P*]4>{13]n+}/

The above will print one character per line, except for the last line, which can contain 0, 1 or 2 = chars. GolfScript will also append a final LF, which, according to RFC 2045, must be ignored by decoding software.

Example

$ echo '[99 97 102 195 169]' | golfscript base64.gs | cat -A
Y2Fmw6k=^M$
$ echo [ {0..142} ] | golfscript base64.gs | cat -A
AAECAwQFBgcICQoLDA0ODxAREhMUFRYXGBkaGxwdHh8gISIjJCUmJygpKissLS4vMDEyMzQ1Njc4^M$
OTo7PD0+P0BBQkNERUZHSElKS0xNTk9QUVJTVFVWV1hZWltcXV5fYGFiY2RlZmdoaWprbG1ub3Bx^M$
cnN0dXZ3eHl6e3x9fn+AgYKDhIWGh4iJiouMjY4=^M$
$ echo '[99 97 102 195 169]' | golfscript base64-sneaky.gs | cat -A
Y^M$
2^M$
F^M$
m^M$
w^M$
6^M$
k^M$
=^M$
$

How it works

~          # Interpret the input string.
.,~)3%:P   # Calculate the number of bytes missing to yield a multiple of 3 and save in “P”.
[0]*+      # Append that many zero bytes to the input array.
[4]3*\+    # Prepend 3 bytes to the input array to avoid issues with leading zeros.
256base    # Convert the input array into an integer.
64base     # Convert that integer to base 64.
{          # For each digit:
  '+/'     # Push '+/'.
  'A[a{:0' # Push 'A[a{:0'.
  {        # For each byte in 'A[a{:0':
    ,      # Push the array of all bytes up to that byte.
    ^      # Take the symmetric difference with the array below it.
  }/       # Result: 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
  =        # Retrieve the character corresponding to the digit.
}/         #
{;}P*'='P* # Replace the last “P” characters with a string containing that many “=” chars.
]          # Collect all bytes on the stack into an array.
4>         # Remove the first four, which correspond to the 3 prepended bytes.
76/        # Split the array into 76-byte chunks.
"\r\n":n*  # Join the chunks with separator CRLF and save CRLF as the new line terminator.

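The alphabet construction in the walkthrough above (repeated symmetric differences against the ranges below 'A', '[', 'a', '{', ':' and '0') can be replayed in Python 3. The sketch assumes an order-preserving symmetric difference, with elements only in the left operand first and then elements only in the right operand. That ordering is an assumption about GolfScript's behaviour, chosen because it reproduces the result stated in the walkthrough.

```python
def sym_diff(a, b):
    """Order-preserving symmetric difference of two lists."""
    return [x for x in a if x not in b] + [x for x in b if x not in a]

acc = [ord(c) for c in "+/"]
for c in "A[a{:0":
    # GolfScript's "," on a byte yields all smaller values: 0 .. byte-1.
    acc = sym_diff(acc, list(range(ord(c))))

alphabet = "".join(chr(x) for x in acc)
print(alphabet)
```

Each pair of boundaries toggles one range on and off, so after six steps only A-Z, a-z, 0-9 and the two starting symbols remain, already in base64 order.
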
Dennis

Posted 2014-05-04T21:53:12.447

Reputation: 196 637

0

Jelly, 38 bytes

s3z0Zµḅ⁹b64‘ịØb)FṖ³LN%3¤¡s4z”=Z;€“ƽ‘Ọ

Try it online!

As (almost) every other answer covers the RFC2045 requirement of "at most 76 chars per line with line ending \r\n", I followed it.

How it works

s3z0Zµḅ⁹b64‘ịØb)FṖ³LN%3¤¡s4z”=Z;€“ƽ‘Ọ    Monadic main link. Input: list of bytes

s3z0Z    Slice into 3-item chunks, transpose with 0 padding, transpose back
         Equivalent to "pad to length 3n, then slice into chunks"

µḅ⁹b64‘ịØb)    Convert each chunk to base64
 ḅ⁹b64         Convert base 256 to integer, then to base 64
      ‘ịØb     Increment (Jelly is 1-based) and index into base64 digits

FṖ³LN%3¤¡s4z”=Z    Add correct "=" padding
F                  Flatten the list of strings to single string
 Ṗ      ¡          Repeat "remove last" n times, where
  ³LN%3¤             n = (- input length) % 3
         s4z”=Z    Pad "=" to length 4n, then slice into 4-item chunks

;€“ƽ‘Ọ    Add "\r\n" line separator
;€         Append to each line:
  “ƽ‘       Codepage-encoded list [13,10]
      Ọ    Apply `chr` to numbers; effectively add "\r\n"
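
A Python 3 sketch of the same chunk-wise plan: pad to a multiple of three bytes, read each chunk as a base-256 number, rewrite it in base 64, then swap the trailing filler characters for "=". The sketch always emits exactly four base-64 digits per chunk. A plain base conversion drops leading zero digits, which is one plausible reason (not confirmed in this thread) for the failure on 0,0,0 reported in the comments. The names to_base and b64_chunks are invented for this illustration, and line separators are omitted.

```python
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)

def to_base(n, base, width):
    """Digits of n in the given base, zero-padded on the left to width."""
    digits = []
    for _ in range(width):
        digits.append(n % base)
        n //= base
    return digits[::-1]

def b64_chunks(data):
    pad = -len(data) % 3                   # filler bytes needed: 0, 1 or 2
    padded = list(data) + [0] * pad
    text = ""
    for i in range(0, len(padded), 3):
        n = 0
        for byte in padded[i:i + 3]:
            n = n * 256 + byte             # base 256 to integer
        text += "".join(ALPHABET[d] for d in to_base(n, 64, 4))
    if pad:
        text = text[:-pad]                 # drop characters produced by filler
    return text + "=" * pad

print(b64_chunks([0, 0, 0]))               # AAAA
print(b64_chunks([99, 97, 102, 195, 169])) # Y2Fmw6k=
```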

Bubbler

Posted 2014-05-04T21:53:12.447

Reputation: 16 616

Base decompression can be used here, but ṃØbṙ1¤ is a bit too long for a simple operation. – user202729 – 2018-11-11T04:14:40.477

It may be worth asking Dennis to make rotated-base-decompression atom. – user202729 – 2018-11-11T04:18:43.363

Fails for 0,0,0. – user202729 – 2018-11-11T12:41:16.650

0

Python - 310, 333

def e(b):
  l=len;c="ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";r=p="";d=l(b)%3
  if d>0:d=abs(d-3);p+="="*d;b+=[0]*d
  for i in range(0,l(b)-1,3):
    if l(r)%76==0:r+="\r\n"
    n=(b[i]<<16)+(b[i+1]<<8)+b[i+2];x=(n>>18)&63,(n>>12)&63,(n>>6)&63,n&63;r+=c[x[0]]+c[x[1]]+c[x[2]]+c[x[3]]
  return r[:l(r)-l(p)]+p

Somewhat ungolfed:

def e( b ):
    c = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
    r = p = ""
    d = len( b ) % 3

    if d > 0:
        d = abs( d - 3 )
        p = "=" * d
        b += [0] * d

    for i in range( 0, len( b ) - 1, 3 ):
        if len( r ) % 76 == 0:
            r += "\r\n"

        n = ( b[i] << 16 ) + ( b[i + 1] << 8 ) + b[i + 2]
        x = ( n >> 18 ) & 63, ( n >> 12 ) & 63, ( n >> 6) & 63, n & 63
        r += c[x[0]] + c[x[1]] + c[x[2]] + c[x[3]]

    return r[:len( r ) - len( p )] + p

Example:

Python's built-in base64 module is used in this example only to check that the e function produces the correct output; the e function itself does not use it.

from base64 import encodestring as enc

test = [ 99, 97, 102, 195, 169 ]
str  = "".join( chr( x ) for x in test )

control = enc( str ).strip()
output = e( test )

print output            # => Y2Fmw6k=
print control == output # => True

Tony Ellis

Posted 2014-05-04T21:53:12.447

Reputation: 1 706

The question has been clarified to require RFC 2045, so you need to add a bit of code to split the output into 76-char chunks and join with \r\n. – Peter Taylor – 2014-05-05T09:38:18.177

@PeterTaylor fixed. – Tony Ellis – 2014-05-05T18:21:43.987

0

JavaScript (ES6), 220B

f=a=>{for(s=a.map(e=>('0000000'+e.toString(2)).slice(-8)).join(p='');s.length%6;p+='=')s+='00';return s.match(/.{6}/g).map(e=>'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'[parseInt(e,2)]).join('')+p}

If your browser does not support ES6, you can try with this version (262B) :

function f(a){for(s=a.map(function(e){return ('0000000'+e.toString(2)).slice(-8)}).join(p='');s.length%6;p+='=')s+='00';return s.match(/.{6}/g).map(function(e){return 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'[parseInt(e,2)]}).join('')+p}

f([99, 97, 102, 195, 169]) returns "Y2Fmw6k=".

Michael M.

Posted 2014-05-04T21:53:12.447

Reputation: 12 173

Where's the code to split it into 76-char chunks joined with \r\n? – Peter Taylor – 2014-05-05T15:33:28.463