Write two programs A and B.

Program A (the encoder) takes 1024 pairs of integers \$(k,v)\$, where \$0≤k<2^{32}\$ and \$0≤v<1024\$, and all \$k\$ will be different. It will output a single positive integer (likely a few thousand digits long).
Program B (the decoder) takes two inputs: the output-integer of program A, and one \$k\$ from the input pairs we've used as input for program A. It will output the \$v\$ corresponding with the given \$k\$.

In other words, Program A compresses 1024 key-value pairs to the smallest size possible, and Program B must use the data provided by A to recover each value given its corresponding key.

Your score is the average value of the output of program A for the test data. Hardcoding for a given test case is forbidden, so this should be essentially the same as the score on the pretest data. Pretest data and test data are drawn from the same (uniform) distribution.

Random pretest data is available here.

Random.txt
MD5: 6FA62A283282F5CFBC6E089B2369C9B9
SHA1: 0BF7FB62A8566575D4A948DD34A559BAAAC422FC
CRC32: 2675DAB3

Testdata.txt
MD5: 6C8F90F126C8F3C55A9FC9F22733CE55
SHA1: F7D11AAB54393DE8CC4E75565950D8C6D9CFE42A
CRC32: B3DB1479

Actual test cases:

https://www.mediafire.com/folder/i6hmn9zzk73ig/174035

A stupid program A that work but totally didn't optimize is here. Its program B is easily written.

l4m2

Posted 2018-10-15T08:33:45.280

Reputation: 5 985

Comments are not for extended discussion; this conversation has been moved to chat.

– Mego – 2018-10-25T13:49:57.093

Answers

Python 3, pretest ≈ 3.198 · 10³⁰⁸⁵

from hashlib import sha512
import sys

def encode(pairs):  # program A
    elim = [[] for i in range(10)]
    done = set()
    code = 0
    i = 0
    while len(done) < 10 * len(pairs):
        e = 1 << i
        for a, e1 in elim[i % 10]:
            if decode(e, a) & 1 << i % 10:
                e ^= e1
        for a, b in pairs:
            if (a, i % 10) not in done and decode(e, a) & 1 << i % 10:
                elim[i % 10].append((a, e))
                done.add((a, i % 10))
                if (decode(code, a) ^ b) & 1 << i % 10:
                    code ^= e
                print("progress:", len(done), "of", 10 * len(pairs))
                break
        i += 1
    return code

def decode(code, a):  # program B
    k = -(-code.bit_length() // 8)
    a_bytes = a.to_bytes(4, "little")
    s = b"".join(
        sha512(a_bytes + i.to_bytes(4, "little")).digest() for i in range(-(-k // 64))
    )
    b = 0
    for i, (x, y) in enumerate(zip(code.to_bytes(k, "little"), s)):
        b ^= (x & y) << i * 8 % 10
    return b & 1023 ^ b >> 10

def read_pairs():
    return [tuple(map(int, line.split())) for line in sys.stdin]

def test(pairs):
    code = encode(pairs)
    for a, b in pairs:
        assert decode(code, a) == b
    print(code)

test(read_pairs())

Pretest output

The encoder (program A) takes about 6 minutes to run in PyPy. Its output for the pretest is:

31983609135966228809777741024866634396367955850954563312910243722966419655757435319862170247982609333805811315728884450895580799383047152743378820273077053884539894357668608677003723034054150072472679319590110643492810118920515801395958239291576592748237372467748241886558720871264342908155191777359585845696600370935487361341785211913110481040589139281898484188938846817560295768498881287225642023856190028544427590648046456036574330957328066402924125404073006144179946215625958643494081212962075340324684130280414625483652805036566339457267551141894682726365505972133784971174734160698976154775811488656096461968463830423074194023690503590590775033695214423998615444187716064686144001451215863398975347419967640878071667106313050954800893007781291647431510389526738768573353015452189679155051541561231146940706328809607485776138460159591319080343284358448724370020894125934538601889550086934233091980254236012141359542061770911092807766526451019824031548581065916355093797436687555279312288875998494044065641530299531661256941700326425163573657004343895103496724901715610405315696830871086538183435694121760474073023458471744226820280782007502797691083190318324487197753443963571848733746071847847503077912802000091614327968964141450181973360651625471354587026290724846906554164924720066533282988840115094797514460217879380690461261835733295861803705394151420199059533051945944829244868440580371672125590557347324783817602474773801825029895884776404639198926793779867410515553784121435319134307253638568331529787781235084628432871984360992875472633958491882569077293145250495277979952525973740358428525766027507530317692602517027217679810063987302625422496799748842451355136627998296037177558122844929449377522984595368107136303866235523533278732324181522202203636626264487517560120680389621562028772569157498411869721766672498939022664474767985892820170334179252554511620260319259569593256276255895000364202791049913505972232782566530566341223654980901733905002516092163056581626593518077383131036496001950185439305868415290006860241666136700655660075381115903744293277973930796274779653647714016073922883203769217980038066409200998466193202329090278491648479911508276027286548829602558975386937067679077276971208214625876007703346632539055994199462748282900998214486645162277623073498410220953186861172886747394992230973314265902007187446129816157393697305312624793190210450979525381906198852578619071224265649145441460469662250448218764499047348362548750003370944772956154662554023857705149861774126960045122299555516709305181620013892453158471100545618056605123335183465355868672237737792628988379889459029444873409805140995088383991356246685699313569301017907549835057806089472106637383136584875471496123974387658318450782138871582131161643221907721116143710024435066884630344739171885220542678102635637192667329306874983507141585026570891055701250722003923329858567937864129882737742073320064153568198296932909448928951903564049064190959836027425436414766746048788435990514173396582362591284614310722316033029018983309186148295917238363165751109139219360789557442877663768774219

which you can use to verify the much faster decoder (program B) if you don’t want to wait for the encoder.

How it works

The decoder expands \$a\$ into a deterministic pseudorandom stream using SHA-512, bitwise ANDs the stream with the code (output of program A), and bitwise XORs all the base 1024 digits of the result.

It has the important property that the decoder is a linear function of the code: decode(code1 ^ code2, a) == decode(code1, a) ^ decode(code2, a). Therefore, the encoder can compute the needed code using Gaussian elimination.

Scoring note

For official scoring, please make the thoroughly negligible “output formatting” adjustment of adding 1 to the code value, since otherwise a theoretically possible output is 0, which is not a positive integer. As of this writing, the other answer has the same property.

Anders Kaseorg

Posted 2018-10-15T08:33:45.280

Reputation: 29 242

Wolfram Language (Mathematica), \$n \approx 2.088\times 10^{5033}\$ for pretest

a=Block[{aValues,bValues,p,poly,i},
  {aValues,bValues}=Thread@#;
  p=NextPrime[nPair-1];i=0;
  While[!DuplicateFreeQ[aValues~Mod~p],
    p=NextPrime@p;++i;
  ];
  coef=CoefficientList[PolynomialMod[
    InterpolatingPolynomial[Thread@{aValues,bValues},x,
      Modulus->p],p],x];
  coefval=FromDigits[Reverse@coef,p];
  FromDigits[
    Join[{coefval},8191~Table~Floor[i/8191],{i~Mod~8191}]
    ,8192]+1
]&;
b=Block[{i,p,coef,n,aValue},
  n=#-1;i=0;
  While[
    {n,p}=QuotientRemainder[n,8192];
    i+=p;
    p==8191];
  p=NextPrime[nPair-1,i+1];
  coefReversed=IntegerDigits[n,p];
  aValueQuery=#2;
  Fold[
    Mod[#*aValueQuery+#2,p]&,
    Table[0,Length@aValueQuery],
    coefReversed
  ]
]&;

Try it online! (generate random test case based on random seed)

Try it online! (pretest)

Use InterpolatingPolynomial.

Due to the Birthday's paradox, this is not as efficient as I expected.

user202729

Posted 2018-10-15T08:33:45.280

Reputation: 14 620

As Anders says, this code doesn't work for {{0,0},{1,0},{2,0},{3,0},{4,0},{5,0},{6,0},...,{1022,0},{1023,0}}. Either add 1 to the result or map zero to an impossibly reached value – l4m2 – 2018-10-17T08:36:57.070

I guess You should give your pretest score, not the code size: Smallest average value of the output of program A over a fixed collection of random test cases wins – Titus – 2018-10-17T11:01:59.660

@titus fixed. '' – user202729 – 2018-10-18T05:41:30.150

@l4m2 fixed. ''' – user202729 – 2018-10-18T05:41:42.030

Python 3, 1.873x10⁴⁴⁰⁵

import sys
inputs = sys.stdin.read()

numbers = [list(map(int, l.split(' '))) for l in inputs.strip().split('\n')]
def encode(numbers):
  output = ''
  def allNumbers(numbers, a, b):
    nonlocal output
    if len(numbers) == 0: output += '00'
    elif len(numbers) == 1:
      output += '0'
      output += bin(1024 + numbers[0][1])[2:]
    else:
      output += '1'
      l, r = [], []
      m = (a + b) // 2
      for item in numbers:
        if item[0] < m: l.append(item)
        else: r.append(item)
      allNumbers(l, a, m)
      allNumbers(r, m, b)
  allNumbers(numbers, 0, 2 ** 32)
  return int(output, 2)

def decode(output, key):
  output = bin(output)[2:]
  index = 0
  def peak(n):
    return output[index:index + n]
  def skip(n):
    nonlocal index
    index += n
  result = 0
  def findRange(a, b):
    nonlocal result
    if peak(1) == '0':
      if peak(2) == '00':
        skip(2)
        return
      else:
        skip(2)
        n = int(peak(10), 2)
        if a < key < b: result = n
        skip(10)
        return
    skip(1)
    m = (a + b) // 2
    findRange(a, m)
    findRange(m, b)
  findRange(0, 2 ** 32)
  return result

output = encode(numbers)
print("%s.%sE+%d" % (str(output)[0], str(output)[1:10], len(str(output)) - 1))
for (key, value) in numbers:
  assert(decode(output, key) == value)

Try it online!

tsh

Posted 2018-10-15T08:33:45.280

Reputation: 13 072

Sample answer: Python 3, pretest ≈ 3.612 · 10¹²⁹⁴⁶

This is just the sample program A given in the problem, along with a corresponding program B, to help those having trouble understanding the problem.

Program A simply concatenates the (32 + 10)-bit binary representations of all the input pairs. Program B iterates through these (32 + 10)-bit chunks until it finds the chunk whose first 32 bits match the given \$a\$, and outputs the corresponding \$b\$ from its last 10 bits.

Program A

x = 0
for i in range(1024):
  [a,b] = input().split()
  x = x * 4398046511104 + int(a) * 1024 + int(b)
print (x)

Try it online!

Program B

x, a = map(int, input().split())
while (x % 4398046511104) // 1024 != a:
  x //= 4398046511104
print(x % 1024)

Try it online!

Anders Kaseorg

Posted 2018-10-15T08:33:45.280

Reputation: 29 242

Didn't OP already provide one like that? – user202729 – 2018-10-18T05:41:12.653

@user202729 OP provided exactly this program A and said “its program B is easily written”. I don’t know why people seem to be having so much trouble understanding this challenge, but in case an example with an explicit program B would help them, I’ve provided one in this community wiki. – Anders Kaseorg – 2018-10-18T06:07:19.713

@user202729 Also, the score on this sample answer is useful context for understanding the scores on the real answers—it might seem surprising at first glance that this answer is significantly far from optimal. – Anders Kaseorg – 2018-10-18T06:15:33.913

(this is secondary discussion, but) Personally, I think that (assume that the English don't have problems) people who cannot figure out what to do with the precise mathematical defintion (still, it looks pretty intuitive to me already), are unlikely to post competitive answers even if they do understand the task; on the other hand. Adding a "# Why this challenge is interesting" section takes effort, and the answers are probably not better too. The only problem is the downvotes, but I hope the OP don't mind that. – user202729 – 2018-10-18T12:11:02.870

Send the pairs in smallest output

Answers

Python 3, pretest ≈ 3.198 · 103085

Pretest output

How it works

Scoring note

Wolfram Language (Mathematica), \$n \approx 2.088\times 10^{5033}\$ for pretest

Python 3, 1.873x104405

Sample answer: Python 3, pretest ≈ 3.612 · 1012946

Program A

Program B

Python 3, pretest ≈ 3.198 · 10³⁰⁸⁵

Python 3, 1.873x10⁴⁴⁰⁵

Sample answer: Python 3, pretest ≈ 3.612 · 10¹²⁹⁴⁶