Base Conversion With Strings

16

5

Introduction

We've have a few base conversion challenges here in the past, but not many designed to tackle arbitrary length numbers (that is to say, numbers that are long enough that they overflow the integer datatype), and of those, most felt a little complicated. I'm curious how golfed down a change of base code like this can get.

Challenge

Write a program or function in the language of your choice that can convert a string of one base to a string of another base. Input should be the number to be converted (string), from-base (base-10 number), to-base (base-10 number), and the character set (string). Output should be the converted number (string).

Some further details and rules are as follows:

  • The number to be converted will be a non-negative integer (since - and . may be in the character set). So too will be the output.
  • Leading zeroes (the first character in the character set) should be trimmed. If the result is zero, a single zero digit should remain.
  • The minimum supported base range is from 2 to 95, consisting of the printable ascii characters.
  • The input for the number to be converted, the character set, and the output must all be of the string datatype. The bases must be of the base-10 integer datatype (or integer floats).
  • The length of the input number string can be very large. It's hard to quantify a sensible minimum, but expect it to be able to handle at least 1000 characters, and complete 100 characters input in less than 10 seconds on a decent machine (very generous for this sort of problem, but I don't want speed to be the focus).
  • You cannot use built in change-of-base functions.
  • The character set input can be in any arrangement, not just the typical 0-9a-z...etc.
  • Assume that only valid input will be used. Don't worry about error handling.

The winner will be determined by the shortest code that accomplishes the criteria. They will be selected in at least 7 base-10 days, or if/when there have been enough submissions. In the event of a tie, the code that runs faster will be the winner. If close enough in speed/performance, the answer that came earlier wins.

Examples

Here's a few examples of input and output that your code should be able to handle:

F("1010101", 2, 10, "0123456789")
> 85

F("0001010101", 2, 10, "0123456789")
> 85

F("85", 10, 2, "0123456789")
> 1010101

F("1010101", 10, 2, "0123456789")
> 11110110100110110101

F("bababab", 2, 10, "abcdefghij")
> if

F("10", 3, 2, "0123456789")
> 11

F("<('.'<)(v'.'v)(>'.'>)(^'.'^)", 31, 2, "~!@#$%^v&*()_+-=`[]{}|';:,./<>? ")
> !!~~~~~~~!!!~!~~!!!!!!!!!~~!!~!!!!!!~~!~!~!!!~!~!~!!~~!!!~!~~!!~!!~~!~!!~~!!~!~!!!~~~~!!!!!!!!!!!!~!!~!~!~~~~!~~~~!~~~~~!~~!!~~~!~!~!!!~!~~

F("~~~~~~~~~~", 31, 2, "~!@#$%^v&*()_+-=`[]{}|';:,./<>? ")
> ~

F("9876543210123456789", 10, 36, "0123456789abcdefghijklmnopqrstuvwxyz")
> 231ceddo6msr9

F("ALLYOURBASEAREBELONGTOUS", 62, 10, "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")
> 6173180047113843154028210391227718305282902

F("howmuchwoodcouldawoodchuckchuckifawoodchuckcouldchuckwood", 36, 95, "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ~`!@#$%^&*()_-+=[{]}\\|;:'\",<.>/? ")
> o3K9e(r_lgal0$;?w0[`<$n~</SUk(r#9W@."0&}_2?[n

F("1100111100011010101010101011001111011010101101001111101000000001010010100101111110000010001001111100000001011000000001001101110101", 2, 95, "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ~`!@#$%^&*()_-+=[{]}\\|;:'\",<.>/? ")
> this is much shorter

Mwr247

Posted 2016-01-11T17:46:03.923

Reputation: 3 494

We have had one designed to tackle arbitrary length numbers. – Peter Taylor – 2016-01-11T18:50:57.103

@PeterTaylor Well dang, somehow missed that one in my search. Still, I would argue they are different enough. The other one involves a default character set, multi-byte sequences, error handling, and sequence-to-sequence conversion. All these add to much larger bloat in the answers, and focus on different optimizations. This challenge is much more trimmed down, and will result in completely different code from the other challenge (short of the core algorithm). – Mwr247 – 2016-01-11T19:04:42.873

@PeterTaylor Plus, the other question was asked 4 years ago and received only two valid answers (and with one already accepted, little reason to bump). I'm willing to bet the community would enjoy this challenge, with little impact from the previous one, or feelings of "repetitiveness". – Mwr247 – 2016-01-11T19:10:02.380

7While this challenge is very similar to the previous one, I'd actually be in favor of closing the previous one as a dupe of this one. This challenge is much clearer and higher quality than the old one. – Mego – 2016-01-11T21:04:21.190

Could you elaborate a bit on You cannot use built in change-of-base functions to convert the entire input string/number at once? Specifically, could I use a built-in to convert the input to a intermediate base? Can I then use a built-in to convert to the target base? Would something like convert input with canonical form for given base; convert to base 10; convert to target base; convert back to specified character set with string replacement? – Mego – 2016-01-11T23:38:33.077

Now that this question has an answer, I have voted to close the older one as a dupe of this one. – Mego – 2016-01-12T03:21:49.327

"7 base-10 days" I guess since this thing uses arbitrary characters as digits, "7" could stand for any number. For instance, if the character set is the ASCII characters from code 46 to code 55, then this challenge will end in 10 days. On the other hand, if using the standard digits, why specify a base? The symbol 7 represents the number 7 in any base that uses it. – quintopia – 2016-01-12T15:21:21.623

@Mego Good point on the base conversion loophole. It'd probably be simpler to just say no built-in base change at all, especially since no answers use it yet. – Mwr247 – 2016-01-12T15:59:56.710

@quintopia It was really more a self reference to a previous challenge I made, from which I copied the post template for this one, as well as a reference to the challenge itself and bases. For that matter, isn't "base-10" indeterminate itself, since every base has a 10? ;)

– Mwr247 – 2016-01-12T16:04:32.477

@Mwr247 clearly you meant "standard-digit 7 days" or "standard-digit 7 base-standard-digit-A days" which is redundant with the former. – quintopia – 2016-01-12T17:48:22.873

There's no such thing as a base-10 integer datatype. Just say input will be in base ten. – lirtosiast – 2016-01-12T23:31:17.747

@ThomasKwa But then someone might try a base-10 string ;) – Mwr247 – 2016-01-12T23:41:14.663

Answers

5

CJam, 34 bytes

0ll:Af#lif{@*+}~li:X;{XmdA=\}h;]W%

Input format is input_N alphabet input_B output_B each on a separate line.

Run all test cases.

Explanation

0     e# Push a zero which we'll use as a running total to build up the input number.
l     e# Read the input number.
l:A   e# Read the alphabet and store it in A.
f#    e# For each character in the input number turn it into its position in the alphabet,
      e# replacing characters with the corresponding numerical digit value.
li    e# Read input and convert to integer.
f{    e# For each digit (leaving the base on the stack)...
  @*  e#   Pull up the running total and multiply it by the base.
  +   e#   Add the current digit.
}
~     e# The result will be wrapped in an array. Unwrap it.
li:X; e# Read the output base, store it in X and discard it.
{     e# While the running total is not zero yet...
  Xmd e#   Take the running total divmod X. The modulo gives the next digit, and
      e#   the division result represents the remaining digits.
  A=  e#   Pick the corresponding character from the alphabet.
  \   e#   Swap the digit with the remaining value.
}h
;     e# We'll end up with a final zero on the stack which we don't want. Discard it.
]W%   e# Wrap everything in an array and reverse it, because we've generated the 
      e# digits from least to most significant.

This works for the same byte count:

L0ll:Af#lif{@*+}~li:X;{XmdA=@+\}h;

The only difference is that we're building up a string instead of collecting everything on the stack and reversing it.

Martin Ender

Posted 2016-01-11T17:46:03.923

Reputation: 184 808

7

Python 2, 115 114 106 105 94 bytes

Golfing suggestions welcome. Try it online!

Edit: -9 bytes thanks to mbomb007. -2 bytes thanks to FlipTack.

def a(n,f,t,d,z=0,s=''):
 for i in n:z=z*f+d.find(i)
 while z:s=d[z%t]+s;z/=t
 print s or d[0]

Ungolfed:

def arbitrary_base_conversion(num, b_from, b_to, digs, z=0, s=''):
    for i in num:
        z = z * b_from + digs.index(i)
    while z:
        s = digs[z % b_to] + s
        z = z / t
    if s:
        return s
    else:
        return d[0]

Sherlock9

Posted 2016-01-11T17:46:03.923

Reputation: 11 664

1while z:s=d[z%t]+s;z/=t saves 9 bytes. – mbomb007 – 2016-09-23T16:04:06.083

You could put z=0 and s='' in the function declaration to save bytes. – FlipTack – 2017-01-30T21:13:42.230

using print instead of return is allowed by default.

– FlipTack – 2017-01-30T21:14:54.397

6

Seriously, 50 bytes

0╗,╝,2┐,3┐,4┐╛`4└í╜2└*+╗`MX╜ε╗W;3└@%4└E╜@+╗3└@\WX╜

Hex Dump:

30bb2cbc2c32bf2c33bf2c34bfbe6034c0a1bd32c02a2bbb60
4d58bdeebb573b33c0402534c045bd402bbb33c0405c5758bd

I'm proud of this one despite its length. Why? Because it worked perfectly on the second try. I wrote it and debugged it in literally 10 minutes. Usually debugging a Seriously program is an hour's labor.

Explanation:

0╗                                                  Put a zero in reg0 (build number here)
  ,╝,2┐,3┐,4┐                                       Put evaluated inputs in next four regs
             ╛                                      Load string from reg1
              `         `M                          Map over its chars
               4└                                   Load string of digits
                 í                                  Get index of char in it.
                  ╜                                 Load number-so-far from reg0
                   2└*                              Multiply by from-base
                      +                             Add current digit.
                       ╗                            Save back in reg0
                          X                         Discard emptied string/list.
                           ╜                        Load completed num from reg0
                            ε╗                      Put empty string in reg0
                              W                W    While number is positive
                               ;                    Duplicate
                                3└@%                Mod by to-base.
                                    4└E             Look up corresponding char in digits
                                       ╜@+          Prepend to string-so-far.
                                                      (Forgetting this @ was my one bug.)
                                          ╗         Put it back in reg0
                                           3└@\     integer divide by to-base.
                                                X   Discard leftover 0
                                                 ╜  Load completed string from reg0
                                                    Implicit output.

quintopia

Posted 2016-01-11T17:46:03.923

Reputation: 3 899

3

Ruby, 113 112 105 98 97 95 87 bytes

I sort of double-posted my Python answer (somehow), so here's a Ruby answer. Seven more bytes thanks to manatwork, another byte thanks to Martin Büttner, and 8 more bytes thanks to cia_rana.

->n,f,t,d{z=0;s='';n.chars{|i|z=z*f+d.index(i)};(s=d[z%t]+s;z/=t)while z>0;s[0]?s:d[0]}

Ungolfed:

def a(n,f,t,d)
  z=0
  s=''
  n.chars do |i|
    z = z*f + d.index(i)
  end
  while z>0 
    s = d[z%t] + s
    z /= t
  end
  if s[0]   # if n not zero
    return s
  else
    return d[0]
  end
end

Sherlock9

Posted 2016-01-11T17:46:03.923

Reputation: 11 664

How about use s=d[z%t]+s;z/=t instead of z,m=z.divmod t;s=d[m]+s? – cia_rana – 2016-09-23T18:51:59.757

3

C (function) with GMP library, 260

This turned out longer than I'd hoped, but here it is anyway. The mpz_* stuff really eats up a lot of bytes. I tried #define M(x) mpz_##x, but that gave a net gain of 10 bytes.

#include <gmp.h>
O(mpz_t N,int t,char*d){mpz_t Q,R;mpz_inits(Q,R,0);mpz_tdiv_qr_ui(Q,R,N,t);mpz_sgn(Q)&&O(Q,t,d);putchar(d[mpz_get_ui(R)]);}F(char*n,int f,int t,char*d){mpz_t N;mpz_init(N);while(*n)mpz_mul_ui(N,N,f),mpz_add_ui(N,N,strchr(d,*n++)-d);O(N,t,d);}

The function F() is the entry-point. It converts the input string to an mpz_t by successive multiplications by the from-base and addition of the index of the given digit in the digit list.

The function O() is a recursive output function. Each recursion divmods the mpz_t by the to-base. Because this yields the output digits in reverse order, the recursion effectively allows the digits to be stored on the stack and output in the correct order.

Test driver:

Newlines and indenting added for readability.

#include <stdio.h>
#include <string.h>

#include <gmp.h>
O(mpz_t N,int t,char*d){
  mpz_t Q,R;
  mpz_inits(Q,R,0);
  mpz_tdiv_qr_ui(Q,R,N,t);
  mpz_sgn(Q)&&O(Q,t,d);
  putchar(d[mpz_get_ui(R)]);
}
F(char*n,int f,int t,char*d){
  mpz_t N;
  mpz_init(N);
  while(*n)
    mpz_mul_ui(N,N,f),mpz_add_ui(N,N,strchr(d,*n++)-d);
  O(N,t,d);
}

int main (int argc, char **argv) {
  int i;

  struct test_t {
    char *n;
    int from_base;
    int to_base;
    char *digit_list;
  } test[] = {
    {"1010101", 2, 10, "0123456789"},
    {"0001010101", 2, 10, "0123456789"},
    {"85", 10, 2, "0123456789"},
    {"1010101", 10, 2, "0123456789"},
    {"bababab", 2, 10, "abcdefghij"},
    {"10", 3, 2, "0123456789"},
    {"<('.'<)(v'.'v)(>'.'>)(^'.'^)", 31, 2, "~!@#$%^v&*()_+-=`[]{}|';:,./<>? "},
    {"~~~~~~~~~~", 31, 2, "~!@#$%^v&*()_+-=`[]{}|';:,./<>? "},
    {"9876543210123456789", 10, 36, "0123456789abcdefghijklmnopqrstuvwxyz"},
    {"ALLYOURBASEAREBELONGTOUS", 62, 10, "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"},
    {"howmuchwoodcouldawoodchuckchuckifawoodchuckcouldchuckwood", 36, 95, "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ~`!@#$%^&*()_-+=[{]}\\|;:'\",<.>/? "},
    {"1100111100011010101010101011001111011010101101001111101000000001010010100101111110000010001001111100000001011000000001001101110101", 2, 95, "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ~`!@#$%^&*()_-+=[{]}\\|;:'\",<.>/? "},
    {0}
  };

  for (i = 0; test[i].n; i++) {
    F(test[i].n, test[i].from_base, test[i].to_base, test[i].digit_list);
    puts("");
  }

  return 0;
}

Digital Trauma

Posted 2016-01-11T17:46:03.923

Reputation: 64 644

3

JavaScript (ES6), 140 bytes

(s,f,t,m)=>[...s].map(c=>{c=m.indexOf(c);for(i=0;c||i<r.length;i++)r[i]=(n=(r[i]|0)*f+c)%t,c=n/t|0},r=[0])&&r.reverse().map(c=>m[c]).join``

Unlike @Mwr247's code (which uses base-f arithmetic to divide s by t each time, collecting each remainder as he goes) I use base-t arithmetic to multiply the answer by f each time, adding each digit of s as I go.

Ungolfed:

function base(source, from, to, mapping) {
    result = [0];
    for (j = 0; j < s.length; s++) {
        carry = mapping.indexOf(s[j]);
        for (i = 0; carry || i < result.length; i++) {
            next = (result[i] || 0) * from + carry;
            result[i] = next % to;
            carry = Math.floor(next / to);
         }
    }
    string = "";
    for (j = result.length; j --> 0; )
        string += mapping[result[j]];
    return string;
}

Neil

Posted 2016-01-11T17:46:03.923

Reputation: 95 035

3

APL, 10 bytes

{⍺⍺[⍵⍵⍳⍵]}

This is an APL operator. In APL, and are used to pass values, while ⍵⍵ and ⍺⍺ are usually used to pass functions. I'm abusing this here to have 3 arguments. ⍺⍺ is the left argument, ⍵⍵ is the "inner" right argument, and is the "outer" right argument.

Basically: ⍺(⍺⍺{...}⍵⍵)⍵

Then all that's needed is to find the positions of the input string in the "from" table, and then use [] to index into the "to" table with these positions.

Example:

    ('012345'{⍺⍺[⍵⍵⍳⍵]}'abcdef')'abcabc'
012012

Ven

Posted 2016-01-11T17:46:03.923

Reputation: 3 382

2

JavaScript (ES6), 175 bytes

(s,f,t,h)=>eval('s=[...s].map(a=>h.indexOf(a));n=[];while(s.length){d=m=[],s.map(v=>((e=(c=v+m*f)/t|0,m=c%t),e||d.length?d.push(e):0)),s=d,n.unshift(m)}n.map(a=>h[a]).join``')

Figured it's been long enough now that I can submit the one I made to create the examples. I may try and golf it down a bit better later.

Mwr247

Posted 2016-01-11T17:46:03.923

Reputation: 3 494

1

Japt, 9 bytes

nVîX
sWîX

Try it

Shaggy

Posted 2016-01-11T17:46:03.923

Reputation: 24 623