Balanced Zero-One Encoding

12

1

Task

Encode a string that entirely consists of uppercase alphabets (A-Z) using only zeros and ones, using your own favorite scheme. But the rule isn't that simple!

Rules

  1. Your program/function must correctly handle any valid input string of length 8.
  2. The results must have the same length for all inputs.
  3. The results must be distinct for distinct inputs.
  4. The results must be as short as possible.
  5. The results must be zero-one balanced (have number of ones similar to that of zeros). They don't have to be equal (i.e. perfectly balanced), but your score will be penalized for that.

You don't have to provide a program/function that decodes your encoding.

Input and Output

  • You can decide to accept any set of 26 distinct printable ASCII characters instead of A-Z.
  • You can decide to output any pair of distinct printable ASCII characters instead of 0 and 1.
  • You are not allowed to output an integer instead of a bit string, since it may have leading zeros and it's unclear if you actually met the rule 2.
  • If you decide to deviate from the default (A-Z input and 01 output), you must specify the input/output character sets in your submission.

Scoring

  • Base score: Code size, or 1 if your program happens to be empty.
  • Penalties
    • Penalty for length: multiply 1.5 ** (encoded length - 42)
    • There is no bonus for being shorter; 42 is the minimal length for a perfectly balanced encoding of 8-length strings with alphabet size 26.
    • Penalty for being unbalanced: multiply 2 ** max(abs(ones - zeros) for every valid input of length 8), where ones and zeros are the counts of 1 and 0 in each output, respectively.
    • Your submission must either show a worst-case example (input/output) or theoretical explanation on the penalty value.
  • The lowest score wins.

Example Submission

Hypothetical esolang, 0 Bytes, Score 74733.8906

Here is a hypothetical esolang, where an empty program prints out all the ASCII codes of input's characters in binary.

 

For example, if you give AAAAAAAA as input, the program will print 1000001 8 times in a row, i.e. 10000011000001100000110000011000001100000110000011000001.

The input alphabet is chosen to be CEFGIJKLMNQRSTUVXYZabcdefh. This way, all of the chars are converted to seven digits in binary, and the zero-one counts differ only by one per char (they all have three 1's and four 0's or vice versa when converted to binary).

The output length is always 56, and the worst-case unbalance occurs on the inputs like CCCCCCCC, where zeros appear 8 more times than ones.

Therefore, the score of this submission is 1.5 ** (56 - 42) * 2 ** 8 == 74733.8906.

Bubbler

Posted 2018-03-05T07:58:46.963

Reputation: 16 616

Sandbox: https://codegolf.meta.stackexchange.com/a/14908/78410

– Bubbler – 2018-03-05T08:02:42.900

can I use my hypothetical esolang in which the empty program accepts a number N in letter-encoded 26-ary and outputs the N-th possible 42-bit sequence of sum 21? – ngn – 2018-03-05T08:32:34.993

@ngn - does your hypothetical language meet our accepted criteria? - EDIT ah input is always [A-Z] - I guess that's easy enough... :)

– Jonathan Allan – 2018-03-05T08:42:24.580

@JonathanAllan yeah, Balboolenc is transpiled to Python - the empty program does as described above, any other program remains intact :) – ngn – 2018-03-05T08:47:02.460

Can the code take really long to run? For example, I might iterate over all length-42 binary strings to find balanced ones. – xnor – 2018-03-05T09:02:16.623

@xnor It's fine as long as you can prove the score of your submission. – Bubbler – 2018-03-05T09:14:03.860

1Can we output a list of ones and zeroes or does it have to be a string? – Dennis – 2018-03-05T12:30:11.677

@Dennis List output is OK, as in any other challenges. – Bubbler – 2018-03-05T22:21:52.960

Penalty for length: multiply 1.5 ** (encoded length - 42) do you mean 1.5 ** max(encoded length - 42, 0)? – l4m2 – 2018-03-06T16:08:43.667

1The whole question is lead into "mustn't have unbalance, must be in 42 digits, who care real running time" – l4m2 – 2018-03-06T16:27:42.293

Answers

4

Stax, 11 bytes, 0 penalty, Score 11

This program uses [0-9A-P] for input and [01] for output.

ö■▄←·ï↨≡⌐╠H

Run and debug it online - click the run button to start. The first four test cases run in milliseconds. The fifth in seconds. The sixth in millenia.

The corresponding ascii representation of this program is this.

A$21*,26|bD|N

It leans heavily on the |N instruction, which gets the subsequent permutation of an array.

A$21*           "10" repeated 21 times
     ,26|b      get input and decode it as a base 26 number
          D|N    ... that many times get the next lexicographic permutation

All outputs are permutations of the initial string. It has 21 zeroes and 21 ones. Therefore, all outputs are 42 characters, and perfectly balanced.

recursive

Posted 2018-03-05T07:58:46.963

Reputation: 8 616

3

Jelly, 19 bytes

O_65ḅ26ị2Ḷ¤x21¤Œ!Q¤

Try it online!

Explanation

O_65ḅ26ị2Ḷ¤x21¤Œ!Q¤  Main Link
O                    Take the character code of each character
 _65                 Subtract 65 (the code of "A")
    ḅ26              Convert to base 26
       ị             Get the <left-arg>th element of:
        2Ḷ¤x21¤Œ!Q¤  All balanced strings of length 42:
        2Ḷ           range(2) == [0, 1]
           x21       stretch 21 ([0, 0, ..., 0, 1, 1, ..., 1])
               Œ!    all permutations
                 Q   deduplicate

HyperNeutrino

Posted 2018-03-05T07:58:46.963

Reputation: 26 575

E x p l a n a t i o n? – Esolanging Fruit – 2018-03-06T04:03:11.517

@EsolangingFruit added – HyperNeutrino – 2018-03-06T04:09:51.690

3

Pyth, 20 19 14 bytes, Max Diff: 0, Length: 64, Score: 149636.5528 142154.7251 104745.5869

sm@S.{.p*`T4xG

Try it online!

Uses the lower case alphabet ([a-z]) instead of uppercase. Can use uppercase by replacing G with rG1, at the cost of 2 bytes.

I could have translated HyperNeutrino's Python 3 answer for a better score, but frankly, I want an answer that actually works.

hakr14

Posted 2018-03-05T07:58:46.963

Reputation: 1 295

2

Python 2, 779 645 bytes, Max(Diff) = 0, Length = 48, Score = 7346.95

def f(s):
 a,b=0,""
 for i in s:a=a*26+ord(i)-65
 a+=56*252**4
 for i in range(5):b=bin((int("4lnk28t9vtqgfrpfda9uyfrjhcjwjvno6aec2nwegi0g4mnublc05dher8fjm4s5gh55lu87a4itmc74t6tozcsfdbxkg82frwljy0wam1jht98g2j0bma021v5d48pwq0fklv0n1ltrxft1fpk5gt5mx5fj4p2mjqqpvcylt1xayxf1iwdmyoxgfvl7oui1oo6147bm9rqpqut9ns8hhjc77t3pqy48otovrsm1t4mmleumspkuef66ma1vi0l4mtkwaeeizuvvds9fro3vhc0mrn6ox17rdpk7xw747qf28934u5jci5q1qj81i7dyf7rf0x7hb19xm93xhxsgh4w8ifs6fhynsddbo9j938ewfvhjlbpiz50n5hanmno6c89blyx50e89z7vjq2ho2r2u2wwyu4q18kv4fi1nhmfbgjbnkdayr5kblaped4fo5u97bi9a67d89irxa0r9cinmnohfgjmh5fhkcr33",36)>>a%252*10)&1023)[2:].rjust(10,"0")+b;a/=252
 return b[2:]

Try it online!

The magic number 4lnk28t9vtqgfrpfda9uyfrjhcjwjvno6aec2nwegi0g4mnublc05dher8fjm4s5gh55lu87a4itmc74t6tozcsfdbxkg82frwljy0wam1jht98g2j0bma021v5d48pwq0fklv0n1ltrxft1fpk5gt5mx5fj4p2mjqqpvcylt1xayxf1iwdmyoxgfvl7oui1oo6147bm9rqpqut9ns8hhjc77t3pqy48otovrsm1t4mmleumspkuef66ma1vi0l4mtkwaeeizuvvds9fro3vhc0mrn6ox17rdpk7xw747qf28934u5jci5q1qj81i7dyf7rf0x7hb19xm93xhxsgh4w8ifs6fhynsddbo9j938ewfvhjlbpiz50n5hanmno6c89blyx50e89z7vjq2ho2r2u2wwyu4q18kv4fi1nhmfbgjbnkdayr5kblaped4fo5u97bi9a67d89irxa0r9cinmnohfgjmh5fhkcr33 (in base 36), or its decimal equivalent 382136276621246556626597379364678993894472503063952720559883124988542417847157286833446006767955087631166943136913765901237281892296575754126024183763829277879554548743231384272055945084065681774297483130020386641869860456147616177702938121538230311395513497506285733567467605871232294046704309941152721616618474501854355102646152223338484615876165254236449912858255665248186687952137487016925761633237335983620006273901509768720506129789443353730706676483647298576692613113269388239830925662977837917272690235355742330377154505179476767457756888107428475384947712227312747517748632498691058764154580934614231152483398774630508576533263098942260213967880819240793990219283490212843120923539516962682466148372296338428497778127570401190309339992457562121354271, encodes all 252 permutations of 5 0s and 5 1s.

The algorithm first converts A-Z into 0-25 and treat it as a base-26 number, then add 56*252**4.

Then, the number is converted to a 5-digit base-252 number, and substitute with the corresponding permutation of 5 0s and 5 1s.

After that, delete the first 2 bits, which is guaranteed to be 01. Then we have encoded the string to a 48-bit string, which consists of exactly 24 0s and 24 1s.

Shieru Asakoto

Posted 2018-03-05T07:58:46.963

Reputation: 4 445

Pretty sure the penalties have to be multiplied (that is, your score is 7346.953125). – HyperNeutrino – 2018-03-05T09:29:42.757

@HyperNeutrino Oh I though it is addition ;P Edited – Shieru Asakoto – 2018-03-05T09:33:13.413

2

Perl 5, 55 bytes, max diff 0, length 42, score 56 55

This works but will take a long but doable time (ZZZZZZZZ took 2.5 days on my computer). Memory is no problem.

Uses A-Z as input and 1 and A as encoding characters. They are always perfectly balanced. Skips the first 26^7 = 8031810176 balanced combinations representing string shorter than 8 characters, but that's OK since there are 538257874440 available and I use 208827064575 and 208827064575 + 8031810176 < 538257874440.

However it actually "counts" up to to the target combination which will take very long. That's why in the TIO link I only used a too short input string (which are also supported) to demonstrate that the output is correct. Will work up to a bit more than AAAAAA before TIO times out.ZZZZZZZZ should be about 26^3 = 17576 times slower.

#!/usr/bin/perl -ap
$_=1x21 .($i=A)x21;s/(A*)(1*)1A/$2$1A1/ until"@F"eq$i++

Try it online!

The decoder is almost the same:

#!/usr/bin/perl -ap
$_=1x21 .($\=A)x21;s/(A*)(1*)1A/$2$1A1/,$\++until"@F"eq$_}{

Try it online!

Ton Hospel

Posted 2018-03-05T07:58:46.963

Reputation: 14 114

2

JavaScript (ES8), score 22186.623779296875

f=
s=>s.replace(/./g,(c,i)=>(i%2*127^c.charCodeAt()).toString(2).padStart(7,0))
<input oninput=o.textContent=f(this.value)><pre id=o>

For even length input, always outputs 3.5* of zeros and ones, so it only pays the 1.5 ** 14 penalty. Supported characters: '+-.3569:<GKMNSUVYZ\cefijlqrtx.

Neil

Posted 2018-03-05T07:58:46.963

Reputation: 95 035

2

Jelly, 16 bytes

42ɠO%ḅ26ịœcH$ạ‘Ṭ

Uses +,-./0123456789:;<=>?@ABCD for input and returns a list of ones and zeroes.

This attempts to build a list of 538,257,874,440 combinations in memory, so you'll need a large amount of RAM to run it as is...

Try it online! (testable; input length 3, output length 18)

How it works

42ɠO%ḅ26ịœcH$ạ‘Ṭ  Main link. No arguments.

42                Set the argument and the return value to 42.
  ɠ               Read one line from STDIN.
   O              Ordinal; map ['+', ..., 'D'] to [43, ..., 69].
    %             Take the code points modulo 42, mapping [43, ..., 69] to
                  [1, ..., 26].
     ḅ26          Convert the result from base 26 to integer.
            $     Combine the two links to the left into a monadic chain.
           H          Halve; yield 21.
         œc           Generate all 21-combinations of [1, ..., 42].
                  There are 538,257,874,440 of these combinations. The first
                  269,128,937,220 begin with a 1.
        ị         Retrieve the combination at the index to the left.
                  [26, 26, 26, 26, 26, 26, 26, 26] in bijective base 26 equals
                  217,180,147,158 in decimal, so the retrieved combination will
                  begin with a 1.
              ‘   Increment; yield 43.
             ạ    Absolute difference; map [1, ..., 42] to [42, ..., 1].
                  The combination now begins with a 42.
               Ṭ  Untruth; turn the combination into a Boolean list, with 1's
                  at the specified indices and 0's elsewhere.
                  Since the maximum of the combination is 42, this list will have
                  exactly 42 items, 21 of which will be 1's.

Dennis

Posted 2018-03-05T07:58:46.963

Reputation: 196 637

2

Python 3, 985 135 bytes, Max Diff 0, Length 42, score 135

lambda s:C(int(s,26),21,20)
B=lambda x,y:y<1or-~x*B(x+1,y-1)//y
def C(n,o,z):p=B(o,z);x=n>=p;return z+1and[x]+C(n-p*x,o-x,z-1+x)or[1]*o

Try it online!

Courtesy of Bubbler

Ungolfed code:

import math

def binomial(x, y):
    return math.factorial(x) // math.factorial(y) // math.factorial(x - y)

def string_to_int(input_str):
    result = 0
    for i in range(0,8):
        result += (ord(input_str[i])-65)%26 * pow(26,i)
    return result

def counting_function(target, ones, zeros):
    if zeros > 0:
        position = binomial(ones+zeros-1,zeros-1)
    else:
        position = 1
    if target > position:
        if ones > 0:
            print("1", end='')
            ones -= 1
            counting_function(target-position,ones,zeros)
    else:
        if zeros > 0:
            print("0", end='')
            zeros -= 1
            counting_function(target,ones,zeros)
        elif ones > 0:
            print("1", end='')
            ones -= 1
            counting_function(target,ones,zeros)

input_str = input("Input string (A-Z): ")
input_int = string_to_int(input_str)+1
target = input_int
ones = 21
zeros = 21
counting_function(target, ones, zeros)
print("")

Since other approaches seem quite inefficient, I've tried to make a time-optimal one. It is clealy O(N) in N bits of encoding, which is big-O optimal.

Hint: try to think of Pascal's triangle for this one (this diagram reveals it)

Sample outputs:

Input:  AAAAAAAA
Output: 000000000000000000000111111111111111111111

 

Input:  ZZZZZZZZ
Output: 011001000000011010011000111010110110111110

Execution time: < 0.013 s (roughly constant for all inputs)

Real

Posted 2018-03-05T07:58:46.963

Reputation: 121

1Here is a quick golfed version(135 bytes, score 135). – Bubbler – 2018-03-06T06:53:23.927

@Bubbler Incredible, I did not posses the skills to achieve this – Real – 2018-03-06T12:51:53.300

But you should make some effort to minimize your score. All submissions should make a serious effort to optimize the score, otherwise it's invalid. – user202729 – 2018-03-07T06:35:13.543

@user202729 I've modified to Bubbler's version then, which is minimized. I did make an effort to minimize my score though, just not through code size. – Real – 2018-03-07T12:45:53.490

About the latter point... correct. – user202729 – 2018-03-07T13:44:26.287

1

Python 3, 75 bytes, Max Diff 0, Length 42, Score 112

lambda s:sorted({*permutations("01"*21)})[int(s,26)]
from itertools import*

Try it online!

This only works in theory because of memory constraints. There are 538257874440 distinct balanced zero-one strings of length 42 and 208827064575 possible inputs, so some of the possible outputs will not be used.

-37 bytes thanks to @recursive

HyperNeutrino

Posted 2018-03-05T07:58:46.963

Reputation: 26 575

You could use int(s,26) for your index value rather than sum(...) if you change your input character set. – recursive – 2018-03-05T16:37:13.927

@recursive that would require unprintables. tried that already – HyperNeutrino – 2018-03-05T16:39:05.767

Unprintables? It uses [0-9A-P], doesn't it? On my machine, int("123ABC",26) == 12855114 – recursive – 2018-03-05T16:40:32.143

@recursive oh yeah you're right idk what I was thinking lol. thanks! – HyperNeutrino – 2018-03-05T16:57:00.247

1

><>, 75 bytes, Max Diff 0, Length 42, score 75

0i:0(?v'A'-$dd+*+!
.")1+.\1+:0$:2%:}:@-2,@+$bl"
[ab+-?\$:?vv~3
~~]>n<v$-1<>

Try it online!

Fair warning, this will take a very very very long time to complete even for the trivial AAAAAAAA case. Runs through each binary representations of a counter until the (base 26 representation of the input)'th binary number with 21 1s is reached. If you want to test the program somewhat you can replace the ab+ on the third line with 1 which will return the nth binary number with just a single 1, Try it online!

Jo King

Posted 2018-03-05T07:58:46.963

Reputation: 38 234

1

C++, 146 Bytes, 42 maxlength, 0 unbalance, score 146

#include<algorithm>
long long i,s;int f(char*I,char*O){for(O[i=42]=s=0;i--;i<8?s=s*26+I[i]:0)O[i]=i%2|48;for(;s--;)std::next_permutation(O,O+42);}

Works for any continious 26 char, but warning it runs an unacceptable time

l4m2

Posted 2018-03-05T07:58:46.963

Reputation: 5 985

It looks like that you're requiring an empty array being passed in addition. I don't think that's valid. / If you are using GCC you can replace #include<algorithm> with #import<regex>. – user202729 – 2018-03-07T06:33:52.967

I'll change it when GCC decide to stop using an given pointer as output – l4m2 – 2018-03-07T09:04:50.677

... so that's pointer to output? Looks valid then. But you should mention the input/output format explicitly. – user202729 – 2018-03-07T09:11:03.060