All your base 97 are belong to us

18

1

Many programming languages are written using solely printable ASCII, tabs and newlines. These 97 characters are then stored in 8-bit bytes (which are actually capable of holding 256 different characters!), which is just terribly inefficient - especially in code-golfing, where every byte counts! In this challenge, you will be able to reduce your score by using base conversion.

Challenge

Your program/function takes a string or character array as input, which it then interprets as a base-97 number. It then converts this to a base-256 number, and counts the number of symbols (i.e., bytes) necessary to represent this number. This count will be the output/return value of your program/function.

A simple example using base-2 and base-10 (binary and decimal): if the input is 10110, the output would be 2, since 10110₂ = 22₁₀ (two digits necessary to represent output). Similarly, 1101₂ becomes 13₁₀, giving an output of 2 as well, and 110₂ becomes 6₁₀, so then the output would be 1.
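The binary/decimal example can be checked mechanically. A minimal Python 3 sketch; the helper name `digits_needed` is made up for illustration and is not part of the challenge:

```python
def digits_needed(text, src_base, dst_base):
    """Count the digits needed to write `text` (a numeral in src_base) in dst_base."""
    value = int(text, src_base)   # int() accepts bases 2..36
    count = 1                     # zero still needs one digit
    while value >= dst_base:
        value //= dst_base
        count += 1
    return count

print(digits_needed("10110", 2, 10))  # 22 in decimal
print(digits_needed("1101", 2, 10))   # 13 in decimal
print(digits_needed("110", 2, 10))    # 6 in decimal
```

The actual challenge does the same thing with a source base of 97 and a target base of 256.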

The input string can contain all 95 printable ASCII characters, as well as newline \n and literal tab \t, which creates a source alphabet of 97 symbols for your base conversion. The exact alphabet will thus be (substituting the \t and \n with actual literal tab and newline; note the literal space following the newline):

\t\n !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

Note that the order of this alphabet is important: for example, base-97 \t corresponds to decimal 0, and ! corresponds to decimal 3.
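As a sketch of how the ordering translates into digit values, the alphabet can be built in Python 3 and indexed directly. The names `ALPHABET` and `digit` are illustrative only:

```python
# Tab, newline, then the 95 printable ASCII characters from space to '~'.
ALPHABET = "\t\n" + "".join(chr(c) for c in range(32, 127))

def digit(ch):
    """Base-97 digit value of a single character."""
    return ALPHABET.index(ch)

print(len(ALPHABET))
print(digit("\t"), digit("\n"), digit(" "), digit("!"), digit("~"))
```

Tab and newline take the values 0 and 1, so every printable character ends up with its ASCII code minus 30.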

Some test cases: (you do not need to handle an empty string)

Input                             Output
'example@domain.com'                  15
'All your base are belong to us!'     26
'       abcd'                          9
'~      abcd'                         10
'ABCDEFGHIJK'                          9
'zyxwvutsrpq'                         10
'{".~"}.~'                             7
'\t\t\t\t\t\t\t\t'                     1 (with \t a literal tab; the result is 0, which can be represented with 1 byte)
'!\t\t\t\t\t\t\t\t'                    7 (with \t a literal tab)
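For checking answers against the table, here is an ungolfed reference sketch in Python 3. It is one possible implementation, not the challenge author's; the function name `base97_byte_count` is an assumption made for this example:

```python
ALPHABET = "\t\n" + "".join(chr(c) for c in range(32, 127))

def base97_byte_count(s):
    """Bytes needed to store s, read as a base-97 number, in base 256."""
    value = 0
    for ch in s:
        value = value * 97 + ALPHABET.index(ch)
    # bit_length() is 0 for the value 0, which still occupies one byte.
    return max(1, (value.bit_length() + 7) // 8)

for case in ["example@domain.com", "ABCDEFGHIJK", "zyxwvutsrpq",
             '{".~"}.~', "\t" * 8, "!" + "\t" * 8]:
    print(repr(case), base97_byte_count(case))
```

Python integers are arbitrary precision, so long inputs do not overflow here.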

Scoring

  1. If your entry uses only printable ASCII, newline and/or tab: The score of your program will be the output of your program, when given its own source code as input.

  2. If your entry uses any characters that are not printable ASCII, newline or tab: The score of your program is simply the number of bytes, like in code-golf.
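The two scoring rules can be sketched as a single function. This is a hedged illustration in Python 3: `score` and its `encoding` parameter are hypothetical, and counting rule-2 bytes as UTF-8 is an assumption that will not match languages scored in their own code page (Jelly, APL, and so on).

```python
ALPHABET = "\t\n" + "".join(chr(c) for c in range(32, 127))

def base97_byte_count(s):
    value = 0
    for ch in s:
        value = value * 97 + ALPHABET.index(ch)
    return max(1, (value.bit_length() + 7) // 8)

def score(source, encoding="utf-8"):
    """Score a submission from its source text.

    Assumption: rule 2 counts bytes in `encoding`; a language with its own
    code page would need that code page instead.
    """
    if source and all(ch in ALPHABET for ch in source):
        return base97_byte_count(source)   # rule 1: base-97 length in bytes
    return len(source.encode(encoding))    # rule 2: plain byte count

lynn = "lambda s:len(bin(reduce(lambda a,c:a*97+ord(c)-[30,9][c<' '],s,0)))+5>>3"
print(len(lynn), score(lynn))
```

Applied to the 72-character Python 2 lambda quoted in one of the answers, this gives 60, which agrees with the score that answer reports.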

Sanchises

Posted 2017-08-15T08:50:03.243

Reputation: 8 530

3

If you have a better title suggestion than this outdated meme, feel free to post it in the comments!

– Sanchises – 2017-08-15T08:51:27.410

Did you realize that this challenge could be won with a Lenguage answer consisting of only tabs? – pppery – 2017-08-15T13:48:35.853

@ppperry To be honest, I have very little patience for such answers. Yes, I did realize this, but until someone can actually store the program on their system, it's not going to get my upvote. – Sanchises – 2017-08-15T14:03:13.097

Answers

7

Python 2, score 73 72 71

Edit: -1 thanks to @Jonathan Allan

def f(l,z=0):
	for i in map(ord,l):z+=i-[30,9][i<32];z*=97
	print(len(bin(z))-2)/8or 1

Try it online!

Halvard Hummel

Posted 2017-08-15T08:50:03.243

Reputation: 3 131

just one / should be OK I think – Jonathan Allan – 2017-08-15T10:07:43.613

or 1 may be replaced with |1 in this instance. – Jonathan Allan – 2017-08-15T10:19:44.927

@JonathanAllan That yields different (wrong) results. – Sanchises – 2017-08-15T10:21:39.450

Oh, yeah it will >.< - was thinking only gonna get a zero there but it'll bitwise or with the other numbers too. – Jonathan Allan – 2017-08-15T10:23:38.087

@JonathanAllan Exactly. It'll work for odd results, but it will add one to even results. – Sanchises – 2017-08-15T10:24:17.420
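The difference discussed in these comments is easy to see on a few small numbers. A quick Python sketch (the list names are just for illustration):

```python
values = list(range(5))
with_or = [n or 1 for n in values]      # logical or: only 0 is replaced
with_bitor = [n | 1 for n in values]    # bitwise or: sets the lowest bit

print(values)
print(with_or)
print(with_bitor)
```

`or 1` changes only a result of 0, while `|1` also bumps every even result up by one.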

5

Jelly,  18  17 bytes - score  18  17

-1 byte thanks to Erik the Outgolfer (no need for a list of lists for the translation)

O“µœ½þ‘y_30ḅ97b⁹L

Try it online!

How?

O“µœ½þ‘y_30ḅ97b⁹L - Link: list of characters
O                 - convert from characters to ordinals
 “µœ½þ‘           - code-page indices = [9,30,10,31]
       y          - translate (9->30 and 10->31)
        _30       - subtract 30
           ḅ97    - convert from base 97
               ⁹  - literal 256
              b   - convert to base
                L - length of the result
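For readers who do not know Jelly, the steps above roughly correspond to the following Python 3 sketch. It is a step-by-step paraphrase, not a translation produced by any tool; the name `jelly_like` is invented, and it assumes base conversion of 0 yields the single digit 0.

```python
def jelly_like(s):
    ords = [ord(c) for c in s]                         # O: characters to ordinals
    ords = [{9: 30, 10: 31}.get(o, o) for o in ords]   # y: translate 9->30, 10->31
    digits = [o - 30 for o in ords]                    # _30: subtract 30
    value = 0
    for d in digits:                                   # convert from base 97
        value = value * 97 + d
    out = []                                           # convert to base 256
    while True:
        out.append(value % 256)
        value //= 256
        if value == 0:
            break
    out.reverse()                                      # most significant digit first
    return len(out)                                    # L: length of the result

print(jelly_like('{".~"}.~'))
```

Moving tab and newline to 30 and 31 before subtracting 30 is what lets one subtraction handle the whole alphabet.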

--The best I've got with ASCII only is a score of 29:

O10,31,9,30y_30Ux"J_1 97*$$$SSb256L

- this is extremely inefficient too. It translates the ordinals like above, but the conversion from base 97 is achieved by repeating the values and summing, rather than using direct multiplication - that is to convert {".~"}.~ it gets the adjusted indexes [93,4,16,96,4,95,16,96] then reverses (U) and repeats them to make [[96,96,..., 97⁷ times ...,96],[16,16,... 97⁶ times ...16],[95,95,... 97⁵ times ...95],[4,4,... 97⁴ times ...4],[96,96,... 97³ times ...96],[16,16,... 97² times ...,16],[4,4,... 97 times ...4],[93]] and then sums, converts to base 256 and gets the length (if it has not run out of memory :p).

Jonathan Allan

Posted 2017-08-15T08:50:03.243

Reputation: 67 804

5

Japt, score 19 (23 bytes)

nHo127 uA9 md)sG l /2 c

Test it online!

By coincidence, I don't think this can be golfed much even with non-ASCII chars...

Explanation

UnHo127 uA9 md)sG l /2 c   Implicit: U = input string, A = 10, G = 16, H = 32
  Ho127                    Create the range [32, 33, ..., 126].
        uA9                Insert 9 and 10 at the beginning of this range.
            md             Map each to a character, yielding ["\t", "\n", " ", "!", ... "~"].
Un            )            Convert U to a number via this alphabet ("\t" -> 0, "~" -> 96, etc.)
               sG          Convert this number to a base-16 (hexadecimal) string.
                  l        Take the length of this string.
                    /2 c   Divide by two and round up to get the length in base-256.
                           Implicit: output result of last expression
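The last three steps rely on two hex digits making up one byte. A small Python 3 sketch of just that part, assuming the number has already been decoded (the name `bytes_via_hex` is hypothetical):

```python
import math

def bytes_via_hex(value):
    hex_digits = format(value, "x")        # base-16 string; "0" for zero
    return math.ceil(len(hex_digits) / 2)  # two hex digits per byte, rounded up

for n in (0, 255, 256, 65535, 65536):
    print(n, format(n, "x"), bytes_via_hex(n))
```

Zero needs no special case here, because its hex string "0" already has length 1.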

ETHproductions

Posted 2017-08-15T08:50:03.243

Reputation: 47 880

3

J, 36 bytes, score = 30

256#@(#.inv)97x#.(u:9,10,32+i.95)&i.

Try it online!

J uses only the 7-bit ASCII characters for its primitives.

Explanation

256#@(#.inv)97x#.(u:9,10,32+i.95)&i.  Input: string S
                 (              )     Form 7-bit ASCII alphabet
                            i.95        Range [0, 95)
                         32+            Add 32
                    9,10,               Prepend 9 and 10
                  u:                    Convert to characters
                                 &i.  Index of each char in S in that alphabet
            97x#.                     Convert from base 97 to decimal
256   #.inv                           Convert to base 256
   #@                                 Length

miles

Posted 2017-08-15T08:50:03.243

Reputation: 15 654

3

Gaia, 14 bytes, score 14

9c₸c₵R]$;B₵rBl

Try it online!

Explanation

9c              Push a tab character. (done like this since tab isn't in the codepage)
  ₸c            Push a linefeed character.
    ₵R          Push all printable ASCII characters.
      ]$        Concatenate everything together.
        ;       Copy second-from-top, implicitly push input. Stack is now [ASCII input ASCII]
         B      Convert input from the base where the ASCII string is the digits.
          ₵rB   Convert that to the base where the code page is the digits (base 256).
             l  Get the length of the result.
                Implicitly output top of stack.

ASCII only

This is the best I could come up with using only ASCII, giving a score of 19:

9c10c8373c'R+e]$;B256Bl

The difficulty is in the conversion of input. The only reasonable way to convert from the base-97 system is to use B, since mapping requires the non-ASCII ¦. Additionally, there isn't currently a way to make a character range without mapping c over a number range, which suffers the same problem. The best solution I could see was constructing the string ₵R and evaling it.

Business Cat

Posted 2017-08-15T08:50:03.243

Reputation: 8 927

Did you try and make an ASCII only version of this? It may not improve your score (I suppose ₵R and ₵r are not easy to replace, although obviously ₸ is), but it may be interesting to see how it compares. – Sanchises – 2017-08-15T13:40:38.863

@Sanchises I did, but the shortest I came up with ended up being 19, since ₵ is code point 8373 and I can't do character ranges in only ASCII either, which is a little frustrating since most of this program is ASCII. – Business Cat – 2017-08-15T13:42:26.027

Yes, it's really close to being ASCII only. Quick question: I don't know Gaia but played around with it a bit just now, but is there a way to convert a list of numbers? (like c but applied to each character, $ just shows all the numbers) – Sanchises – 2017-08-15T13:46:39.200

@Sanchises You'd have to map c over the list, which would be – Business Cat – 2017-08-15T13:48:12.243

Actually ₵r is easy to replace since I could just use 256 instead, I only used that because it's 1 byte shorter and the program wasn't ASCII only anyway. – Business Cat – 2017-08-15T13:49:13.170

3

Python 2, score 60

lambda s:len(bin(reduce(lambda a,c:a*97+ord(c)-[30,9][c<' '],s,0)))+5>>3

Try it online!

Mapping to base-97

The value of a character is obtained by ord(c)-[30,9][c<' ']: its ASCII code, minus 9 for tabs and newlines (which precede ' ' lexicographically), or minus 30 for everything else.

Converting to a number

We use reduce to convert the string to a number. This is equivalent to computing

def value(s):
    a = 0
    for c in s: a = a*97+ord(c)-[30,9][c<' ']
    return a

Computing base-256 length

The return value of bin is a string, which looks somewhat like this:

"0b10101100111100001101"

Call its length L. A value with an n-bit binary representation has a ceil(n/8)-byte base-256 representation. We can compute n as L-2; also, ceil(n/8) can be written as floor((n+7)/8) = n+7>>3, so our answer is L-2+7>>3 = L+5>>3.

The case where the input string has value 0 is handled correctly, as bin returns "0b0", so we return 3+5>>3 = 1.
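The L+5>>3 shortcut can be checked against a direct byte count. Below is a Python 3 port as a sketch: it is not the scored submission, since `reduce` has to be imported from `functools` in Python 3, and the helper `exact` is added here only for comparison.

```python
from functools import reduce

# Same expression as the answer, spaced out for readability.
f = lambda s: len(bin(reduce(lambda a, c: a * 97 + ord(c) - [30, 9][c < ' '], s, 0))) + 5 >> 3

def exact(n):
    """Direct byte count, with zero counted as one byte."""
    return max(1, (n.bit_length() + 7) // 8)

# + binds tighter than >>, so the expression parses as (L + 5) >> 3.
mismatches = [n for n in range(70000) if (len(bin(n)) + 5 >> 3) != exact(n)]
print(mismatches)
print(f("example@domain.com"))
```

The empty mismatch list covers every value up to 70000, including 0 and the boundaries at 256 and 65536.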

Lynn

Posted 2017-08-15T08:50:03.243

Reputation: 55 648

64 – Halvard Hummel – 2017-08-15T14:07:02.357

@HalvardHummel fairly sure that should be c>=' ' or else you map space to 23 instead of 2. In ordinary code golf c>'\x1f' (a raw byte) would have helped me, but that isn't printable ASCII… – Lynn – 2017-08-15T14:10:09.053

You are right, my bad – Halvard Hummel – 2017-08-15T14:13:18.450

2

Perl 5, 76 + 1 (-F) = 77 bytes

}{$d+=97**(@F+--$i)*((ord)-(/	|
/?9:30))for@F;say!$d||1+int((log$d)/log 256)

Try it online!

How?

Implicitly, separate the characters of the input (-F), storing all of that in @F. Close the implicit while loop and start a new block (}{) (Thanks, @Dom Hastings!). For each character, multiply its value by 97 to the appropriate power. Calculate the number of characters by finding the size of the sum in base 256 using logarithms.
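The logarithm step can be sketched in Python 3 as follows. The function names are made up, and this mirrors only the final counting expression, not the whole Perl program:

```python
import math

def bytes_via_log(d):
    # Mirrors  !$d || 1+int(log($d)/log 256): zero is special-cased as 1.
    if d == 0:
        return 1
    return 1 + int(math.log(d) / math.log(256))

def bytes_exact(d):
    # Integer-only count, used here as a cross-check.
    return max(1, (d.bit_length() + 7) // 8)

for d in (0, 1, 255, 256, 1000, 97 ** 5):
    print(d, bytes_via_log(d), bytes_exact(d))
```

A floating-point logarithm can round the wrong way when the value sits very close to a power of 256, so an integer-only count is the safer cross-check for large inputs.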

Xcali

Posted 2017-08-15T08:50:03.243

Reputation: 7 671

2

APL, score 24 (bytes*)

⌈256⍟97⊥97|118|¯31+⎕AV⍳⍞

Assumes the default ⎕IO←1, otherwise just change ¯31 to ¯30.

Explanation:

                   ⎕AV⍳⍞  Read a string and convert it to ASCII codepoints + 1
               ¯31+       Subtract 31, so that space = 2, bang = 3, etc.
           118|           Modulo 118, so that tab = 97, newline = 98
        97|               Modulo 97, so that tab = 0, newline = 1
     97⊥                  Decode number from base 97
⌈256⍟                     Ceiling of log base 256, to count number of digits
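The double-modulo trick for the digit values can be verified outside APL. A Python 3 sketch, assuming (as the explanation states) that the ⎕AV index of these characters is their ASCII code point plus 1; `apl_digit` is an illustrative name:

```python
ALPHABET = "\t\n" + "".join(chr(c) for c in range(32, 127))

def apl_digit(ch):
    index = ord(ch) + 1                   # assumed position in ⎕AV with ⎕IO←1
    return ((index - 31) % 118) % 97      # Python's % is non-negative here, like APL's |

print([apl_digit(c) for c in "\t\n !~"])
print(all(apl_digit(c) == i for i, c in enumerate(ALPHABET)))
```

Tab and newline become -21 and -20 after the subtraction, the modulo 118 wraps them to 97 and 98, and the modulo 97 then brings them to 0 and 1 without touching the values 2 to 96.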

Examples:

      ⌈256⍟97⊥97|118|¯31+⎕AV⍳⍞
example@domain.com
15
      ⌈256⍟97⊥97|118|¯31+⎕AV⍳⍞
All your base are belong to us!
26
      ⌈256⍟97⊥97|118|¯31+⎕AV⍳⍞
       abcd
9
      ⌈256⍟97⊥97|118|¯31+⎕AV⍳⍞
~      abcd
10

________________
*: APL can be written in its own legacy charset (defined by ⎕AV) instead of Unicode; therefore an APL program that only uses ASCII characters and APL symbols can be scored as 1 char = 1 byte.

Tobia

Posted 2017-08-15T08:50:03.243

Reputation: 5 455

Not all APL symbols are in ⎕AV (at least for Dyalog), such as . All of your symbols do count as one byte each, though. So not every APL symbol = 1 byte like you state in the footnote. (Just thought I'd let you know that.) Also, which APL dialect are you using? – Zacharý – 2017-08-18T18:09:05.540

1

Jelly, score: 18 (bytes)

9,⁷Ọ;ØṖiЀ⁸’ḅ97b⁹L

Try it online!

Erik the Outgolfer

Posted 2017-08-15T08:50:03.243

Reputation: 38 134

Let us continue this discussion in chat.

– Erik the Outgolfer – 2017-08-15T11:39:57.233

1

MATL (19 bytes), score 16

9=?1}G9tQ6Y2hh8WZan

Non-printable characters (tab, newline) in the input string are entered by concatenating their ASCII codes (9, 10) with the rest of the string.

The initial part 9=?1}G is only necessary because of a bug in the Za (base conversion) function, which causes it to fail when the input only consists of "zeros" (tabs here). It will be fixed in the next release of the language.

Explanation

9=      % Implicitly input a string. Compare each entry with 9 (tab)
?       % If all entries were 9
  1     %   Push 1. This will be the output
}       % Else
  G     %   Push input string again
  9     %   Push 9 (tab)
  tQ    %   Duplicate, add 1: pushes 10 (newline)
  6Y2   %   Push string of all printable ASCII chars
  hh    %   Concatenate twice. This gives the input alphabet of 97 chars
  8W    %   Push 2 raised to 8, that is, 256. This represents the output
        %   alphabet, interpreted as a range, for base conversion
  Za    %   Base conversion. Gives a vector of byte numbers
  n     %   Length of that vector
        % End (implicit). Display (implicit)

Luis Mendo

Posted 2017-08-15T08:50:03.243

Reputation: 87 464

1

Ruby, 70 bytes, score 58

->n{x=0;n.bytes{|i|x+=i-(i<32?9:30);x*=97};a=x.to_s(2).size/8;a<1?1:a}

Try it online!

Value Ink

Posted 2017-08-15T08:50:03.243

Reputation: 10 608

1

Befunge-93, 83 79 bytes, score 74 65

<v_v#-*52:_v#-9:_v#`0:~
 5v$
^6>>1>\"a"* +
 >*- ^   0$<
0_v#:/*4*88\+1\ $<
.@>$

Try it here!

The program first converts the input to a base-97 number, and then counts how many digits are required for a base-256 number. Because of this, the base-97 number gets huge, so big that on TIO the output is capped at 8 for large inputs; however, the JS interpreter doesn't care and will output the correct value.
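The counting step, and one possible reason for a cap of 8, can be sketched in Python 3. The 64-bit truncation below is an assumption about fixed-width integer cells and is not taken from the answer or from any interpreter's documentation; both function names are hypothetical.

```python
def count_bytes(value):
    """Count base-256 digits by repeated integer division."""
    count = 1
    while value >= 256:
        value //= 256
        count += 1
    return count

def count_bytes_64bit(value):
    # Assumption: only the low 64 bits of the number survive (unsigned wraparound).
    return count_bytes(value & (2 ** 64 - 1))

big = 97 ** 20
print(count_bytes(big), count_bytes_64bit(big))
```

With only 64 bits available, the count can never exceed 8, whatever the true size of the number.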

user55852

Posted 2017-08-15T08:50:03.243

Reputation: