Latin square compression

32

7

A Latin square is a square that has no repeated symbols in the rows or columns:.

13420
21304
32041
04213
40132

And as many Sudoku players know, you don't need all of the numbers to deduce the remaining numbers.

Your challenge is to compress a Latin square into as few bytes as possible. You need to provide one or two program(s) that compresses/decompresses.

Various info:

  • The numbers used will always be 0..N-1, where N is the length of the square's edge, and N<=25
  • On decompression, the latin square must be identical to the input.
  • Your program(s) should be able to (de)compress any latin square (within the maximum square size), not just the ones I've provided. Compression ratios should be similar as well.
  • You must actually run the compression and decompressor to obtain your score (No end-of-universe runtimes)

The test cases can be found on github. Your score is total size of compressed test cases.

EDIT: As of 20:07 on July 7th, I updated the test cases (in order to fix a generation issue). Please rerun your program on the new test cases. Thanks Anders Kaseorg.

Nathan Merrill

Posted 2016-07-12T12:43:26.247

Reputation: 13 591

"A Latin square is a square that has no repeated symbols in the rows or columns." I suppose there's also the requirement that only symbols 0 to n-1 are used? – Martin Ender – 2016-07-12T12:48:50.330

1Well, by definition, any symbol could be used, but my test cases just happen to use 0 though n-1 :) – Nathan Merrill – 2016-07-12T12:51:26.563

1

Similar: http://codegolf.stackexchange.com/questions/41523/sudoku-compression

– feersum – 2016-07-12T12:54:34.093

3@NathanMerrill well, the point was only being allowed to use n different symbols. :P – Martin Ender – 2016-07-12T12:57:01.893

Can multibyte characters be used? – DavidC – 2016-07-12T13:37:27.560

1@DavidC It shouldn't matter, as the size is measured in bytes. – flawr – 2016-07-12T13:39:38.353

Does the output have to be the same every time the program is ran on the same input, or can we use randomness? – Loovjo – 2016-07-12T14:21:15.077

@Loovjo randomness is fine. – Nathan Merrill – 2016-07-12T15:33:51.207

What's to stop you hard-coding the test cases and compressing them to 1 byte, total score 25 bytes, and fitting the "compressing/decompressing any square" by leaving others unchanged? – TessellatingHeckler – 2016-07-12T17:43:03.410

@TessellatingHeckler fixed. – Nathan Merrill – 2016-07-12T17:53:43.137

One of the answers encodes the output in bits. Is that allowed? – Dennis – 2016-07-12T19:48:41.750

@Dennis AFAIK, we allow for any encoding. It sounds like a meta question to me. – Nathan Merrill – 2016-07-12T19:50:03.210

The default for code golf is bytes, but since this isn't code golf, I'd say its up to you. – Dennis – 2016-07-12T19:51:40.003

I'm ok with bit encoding. If somebody manages to encode their answer in 1 less bit, more power to them. – Nathan Merrill – 2016-07-12T19:54:17.987

219 of your 25 test cases (all those except 4, 6, 8, 10, 12, 14) were generated by permuting the rows and columns of the trivial Latin square whose (i, j) entry is i + j mod n. This makes them very easy to compress much more than a random Latin square. Although your rules say we should have similar compression ratios for all Latin squares, this might be easy to break by accident. The test cases should be more representative. – Anders Kaseorg – 2016-07-12T19:55:10.233

It turns out that SageMath has a built in Latin square generator (documentation), because of course it does. Run it on SageMathCell.

– Anders Kaseorg – 2016-07-12T19:56:01.560

@AndersKaseorg nice investigation. You are correct in your assumption of permutation (the site I used only did it via permutation at sizes higher than 15). I'll check out your link – Nathan Merrill – 2016-07-12T19:57:54.380

What does "within the maximum square size" mean? Up to size 25, the maximum of any of the test cases? (I assume so.) – David Conrad – 2016-07-13T04:58:47.463

@DavidConrad correct. – Nathan Merrill – 2016-07-13T05:41:17.053

FWIW, at the end of this XKCD forum post is some Python 2 code I wrote a few years ago that generates random Latin Squares. It's reasonably fast (for Python): it can do an order 60 square in under 30 seconds on my old 2GHz machine, which is considerably faster than an Exact Cover algorithm. It can display the grid in numeric form (with 1-based indices) or in alphabetic string form; it can also output the grid as a PPM image.

– PM 2Ring – 2016-07-13T11:40:20.050

@PM2Ring An approach based on completing rectangles to squares almost certainly cannot yield a uniform distribution, because different rectangles can be completed to squares in different numbers of ways. See Generate Random Latin Squares. SageMath uses the Jacobson and Matthews (1996) algorithm.

– Anders Kaseorg – 2016-07-13T19:15:54.907

@AndersKaseorg: Ah, ok. FWIW, I wasn't implying that my code gives a uniform distribution, just that the squares it makes are more random than those obtained by randomly permuting rows & columns (& symbols) of a trivial Latin Square. But thanks for that info, and thanks for looking at my code. – PM 2Ring – 2016-07-13T19:23:00.173

Answers

10

Python, 1281.375 1268.625 bytes

We encode the Latin square one “decision” at a time, where each decision is of one of these three forms:

  • which number goes in row i, column j;
  • in row i, which column the number k goes in;
  • in column j, which row the number k goes in.

At each step, we make all the logical inferences we can based on previous decisions, then pick the decision with the smallest number of possible choices, which therefore take the smallest number of bits to represent.

The choices are provided by a simple arithmetic decoder (div/mod by the number of choices). But that leaves some redundancy in the encoding: if k decodes to a square where the product of all the numbers of choices was m, then k + m, k + 2⋅m, k + 3⋅m, … decode to the same square with some leftover state at the end.

We take advantage of this redundancy to avoid explicitly encoding the size of the square. The decompressor starts by trying to decode a square of size 1. Whenever the decoder finishes with leftover state, it throws out that result, subtracts m from the original number, increases the size by 1, and tries again.

import numpy as np

class Latin(object):
    def __init__(self, size):
        self.size = size
        self.possible = np.full((size, size, size), True, dtype=bool)
        self.count = np.full((3, size, size), size, dtype=int)
        self.chosen = np.full((3, size, size), -1, dtype=int)

    def decision(self):
        axis, u, v = np.unravel_index(np.where(self.chosen == -1, self.count, self.size).argmin(), self.count.shape)
        if self.chosen[axis, u, v] == -1:
            ws, = np.rollaxis(self.possible, axis)[:, u, v].nonzero()
            return axis, u, v, list(ws)
        else:
            return None, None, None, None

    def choose(self, axis, u, v, w):
        t = [u, v]
        t[axis:axis] = [w]
        i, j, k = t
        assert self.possible[i, j, k]
        assert self.chosen[0, j, k] == self.chosen[1, i, k] == self.chosen[2, i, j] == -1

        self.count[1, :, k] -= self.possible[:, j, k]
        self.count[2, :, j] -= self.possible[:, j, k]
        self.count[0, :, k] -= self.possible[i, :, k]
        self.count[2, i, :] -= self.possible[i, :, k]
        self.count[0, j, :] -= self.possible[i, j, :]
        self.count[1, i, :] -= self.possible[i, j, :]
        self.count[0, j, k] = self.count[1, i, k] = self.count[2, i, j] = 1
        self.possible[i, j, :] = self.possible[i, :, k] = self.possible[:, j, k] = False
        self.possible[i, j, k] = True
        self.chosen[0, j, k] = i
        self.chosen[1, i, k] = j
        self.chosen[2, i, j] = k

def encode_sized(size, square):
    square = np.array(square, dtype=int)
    latin = Latin(size)
    chosen = np.array([np.argmax(square[:, :, np.newaxis] == np.arange(size)[np.newaxis, np.newaxis, :], axis=axis) for axis in range(3)])
    num, denom = 0, 1
    while True:
        axis, u, v, ws = latin.decision()
        if axis is None:
            break
        w = chosen[axis, u, v]
        num += ws.index(w)*denom
        denom *= len(ws)
        latin.choose(axis, u, v, w)
    return num

def decode_sized(size, num):
    latin = Latin(size)
    denom = 1
    while True:
        axis, u, v, ws = latin.decision()
        if axis is None:
            break
        if not ws:
            return None, 0
        latin.choose(axis, u, v, ws[num % len(ws)])
        num //= len(ws)
        denom *= len(ws)
    return latin.chosen[2].tolist(), denom

def compress(square):
    size = len(square)
    assert size > 0
    num = encode_sized(size, square)
    while size > 1:
        size -= 1
        square, denom = decode_sized(size, num)
        num += denom
    return '{:b}'.format(num + 1)[1:]

def decompress(bits):
    num = int('1' + bits, 2) - 1
    size = 1
    while True:
        square, denom = decode_sized(size, num)
        num -= denom
        if num < 0:
            return square
        size += 1

total = 0
with open('latin_squares.txt') as f:
    while True:
        square = [list(map(int, l.split(','))) for l in iter(lambda: next(f), '\n')]
        if not square:
            break

        bits = compress(square)
        assert set(bits) <= {'0', '1'}
        assert square == decompress(bits)
        print('Square {}: {} bits'.format(len(square), len(bits)))
        total += len(bits)

print('Total: {} bits = {} bytes'.format(total, total/8.0))

Output:

Square 1: 0 bits
Square 2: 1 bits
Square 3: 3 bits
Square 4: 8 bits
Square 5: 12 bits
Square 6: 29 bits
Square 7: 43 bits
Square 8: 66 bits
Square 9: 94 bits
Square 10: 122 bits
Square 11: 153 bits
Square 12: 198 bits
Square 13: 250 bits
Square 14: 305 bits
Square 15: 363 bits
Square 16: 436 bits
Square 17: 506 bits
Square 18: 584 bits
Square 19: 674 bits
Square 20: 763 bits
Square 21: 877 bits
Square 22: 978 bits
Square 23: 1097 bits
Square 24: 1230 bits
Square 25: 1357 bits
Total: 10149 bits = 1268.625 bytes

Anders Kaseorg

Posted 2016-07-12T12:43:26.247

Reputation: 29 242

I'm trying this code at ideone, but it just gives runtime errors. I modified it using stdin instead of file f. http://ideone.com/fKGSQd

– edc65 – 2016-07-13T17:10:15.780

@edc65 It doesn't work because Ideone's NumPy is outdated. – Dennis – 2016-07-13T19:01:52.880

@edc65 Ideone has NumPy 1.8.2 which is too old for np.stack(). In this case it can be replaced with np.array([…]), and I have done so in the current version. – Anders Kaseorg – 2016-07-13T22:05:07.107

hmmm. are all the squares stored in one byte stream? is the information about their size also stored, or the decoder assumes that they are of size 1,2,3,… etc? – Display Name – 2016-08-27T07:11:13.550

@SargeBorsch Each square is compressed to a separate bitstream. The decompressor recovers the square size unambiguously from the bit stream, using the algorithm I described. No assumption is used. – Anders Kaseorg – 2016-08-27T19:35:31.657

7

MATLAB, 3'062.5 2'888.125 bytes

This approach just ditches the last row and the last column of the square, and converts each entry into words of a certain bit depth. The bit depth is chosen minimal for the given size square. (Suggestion by @KarlNapf) These words are just appended to each other. Decompression is just the reverse.

The sum for all the test cases is 23'105 bits or 2'888.125 bytes. (Still holds for the updated test cases, as the size of my outputs is only dependent of the size of the input.)

function bin=compress(a)
%get rid of last row and column:
s=a(1:end-1,1:end-1);
s = s(:)';
bin = [];
%choose bit depth:
bitDepth = ceil(log2(numel(a(:,1))));
for x=s;
    bin = [bin, dec2bin(x,bitDepth)];
end
end

function a=decompress(bin)
%determine bit depth
N=0;
f=@(n)ceil(log2(n)).*(n-1).^2;
while f(N)~= numel(bin)
    N=N+1; 
end
bitDepth = ceil(log2(N));
%binary to decimal:
assert(mod(numel(bin),bitDepth)==0,'invalid input length')
a=[];
for k=1:numel(bin)/bitDepth;
    number = bin2dec([bin(bitDepth*(k-1) + (1:bitDepth)),' ']);
    a = [a,number];    
end
n = sqrt(numel(a));
a = reshape(a,n,n);
disp(a)
%reconstruct last row/column:
n=size(a,1)+1;
a(n,n)=0;%resize
%complete rows:
v = 0:n-1;
for k=1:n
    a(k,n) = setdiff(v,a(k,1:n-1));
    a(n,k) = setdiff(v,a(1:n-1,k));
end
end

flawr

Posted 2016-07-12T12:43:26.247

Reputation: 40 560

You could compress a little more by using a variable bitrate, like forn=9..16 4 bits are enough. – Karl Napf – 2016-07-12T15:06:38.727

@KarlNapf How do you discriminate the different length words then? As far as I know you then need additional prefixes, don't you? – flawr – 2016-07-12T15:13:04.890

Not variable inside one compression, more like depending on the size of the square. If n>16 then use 5 bits fixed, if 8<n<=16 use 4 bits fixed and so on. – Karl Napf – 2016-07-12T15:44:09.037

Oh right this makes sense, thank you! – flawr – 2016-07-12T15:55:43.597

Why are you using ' instead of , for a thousands separator? – mbomb007 – 2016-07-12T16:51:08.660

3For the same reason you're doing it the other way around, it's probobably the way you're used to. =) – flawr – 2016-07-12T17:32:19.810

7

Python 3, 10772 bits (1346.5 bytes)

def compress(rows):
    columns = list(zip(*rows))
    size = len(rows)
    symbols = range(size)
    output = size - 1
    weight = 25
    for i in symbols:
        for j in symbols:
            choices = set(rows[i][j:]) & set(columns[j][i:])
            output += weight * sorted(choices).index(rows[i][j])
            weight *= len(choices)
    return bin(output + 1)[3:]

def decompress(bitstring):
    number = int('1' + bitstring, 2) - 1
    number, size = divmod(number, 25)
    size += 1
    symbols = range(size)
    rows = [[None] * size for _ in symbols]
    columns = [list(column) for column in zip(*rows)]
    for i in symbols:
        for j in symbols:
            choices = set(symbols) - set(rows[i]) - set(columns[j])
            number, index = divmod(number, len(choices))
            rows[i][j] = columns[j][i] = sorted(choices)[index]
    return rows

Takes 0.1 seconds to compress and decompress the combined test cases.

Verify the score on Ideone.

Dennis

Posted 2016-07-12T12:43:26.247

Reputation: 196 637

Woah, care to explain? – Nathan Merrill – 2016-07-12T17:54:18.263

1In a nutshell, the compressor travels through the square in reading order, keeping track of the symbols that already appeared in that row and column, and arithmetically encoding the index of the symbol in the ascending list of possible symbols. I'll add a detailed explanation after cleaning my code up a bit and testing if bijective base 256 saves any bytes. – Dennis – 2016-07-12T17:57:41.473

I'm not completety sure what your code is doing, but isn't it possible to leave the last line out and solve it while decompressing? – Yytsi – 2016-07-12T18:12:09.260

@TuukkaX When there's only one possible symbol len(possible) is 1 and possible.index(rows[i][j]) is 0, so that symbol is encoded at no cost. – Dennis – 2016-07-12T18:36:15.353

Yay, the new test cases saved 6 bits. :) – Dennis – 2016-07-12T20:43:28.910

@Dennis I understand how you are coming up with the index in the possible symbols. However, I'm not sure what your "arithmetic encoder" is doing. My naive way would simply to store the numbers into prime powers (2^A3^B5^C...), but I'm guessing yours is smaller. – Nathan Merrill – 2016-07-13T13:20:50.440

3

J, 2444 bytes

Relies on the builtin A. to convert to and from a permutation of integers [0, n) and a permutation index.

Compress, 36 bytes

({:a.)joinstring<@(a.{~255&#.inv)@A.

The input is a 2d array representing the Latin square. Each row is converted to a permutation index, and that index is converted to a list of base 255 digits and replaced with an ASCII value. Each string is then joined using the ASCII character at 255.

Decompress, 45 bytes

[:(A.i.@#)[:(_&,(255&#.&x:);._1~1,255&=)u:inv

Splits the input string at each ASCII value of 255, and parse each group as base 255 digits. Then using the number of groups, create a list of integers [0, length) and permute it according to each index and return it as a 2d array.

miles

Posted 2016-07-12T12:43:26.247

Reputation: 15 654

2

Python, 6052 4521 3556 bytes

compress takes the square as a multiline string, just like the examples and returns a binary string, whereas decompress does the opposite.

import bz2
import math

def compress(L):
 if L=="0": 
  C = []
 else:
  #split elements
  elems=[l.split(',') for l in L.split('\n')]
  n=len(elems)
  #remove last row and col
  cropd=[e[:-1] for e in elems][:-1]
  C = [int(c) for d in cropd for c in d]

 #turn to string
 B=map(chr,C)
 B=''.join(B)

 #compress if needed
 if len(B) > 36:
  BZ=bz2.BZ2Compressor(9)
  BZ.compress(B)
  B=BZ.flush()

 return B

def decompress(C):

 #decompress if needed
 if len(C) > 40:
  BZ=bz2.BZ2Decompressor()
  C=BZ.decompress(C)

 #return to int and determine length
 C = map(ord,C)
 n = int(math.sqrt(len(C)))
 if n==0: return "0"

 #reshape to list of lists
 elems = [C[i:i+n] for i in xrange(0, len(C), n)]

 #determine target length
 n = len(elems[0])+1
 L = []
 #restore last column
 for i in xrange(n-1):
  S = {j for j in range(n)}
  L.append([])
  for e in elems[i]:
   L[i].append(e)
   S.remove(e)
  L[i].append(S.pop())
 #restore last row
 L.append([])
 for col in xrange(n):
  S = {j for j in range(n)}
  for row in xrange(n-1):
   S.remove(L[row][col])
  L[-1].append(S.pop())
 #merge elements
 orig='\n'.join([','.join([str(e) for e in l]) for l in L])
 return orig

Remove the last row+column and zip the rest.

  • Edit1: well base64 does not seem necessary
  • Edit2: now converting the chopped table into a binary string and compress only if necessary

Karl Napf

Posted 2016-07-12T12:43:26.247

Reputation: 4 131

2

Python3 - 3,572 3,581 bytes

from itertools import *
from math import *

def int2base(x,b,alphabet='0123456789abcdefghijklmnopqrstuvwxyz'):
    if isinstance(x,complex):
        return (int2base(x.real,b,alphabet) , int2base(x.imag,b,alphabet))
    if x<=0:
        if x==0:return alphabet[0]
        else:return  '-' + int2base(-x,b,alphabet)
    rets=''
    while x>0:
        x,idx = divmod(x,b)
        rets = alphabet[idx] + rets
    return rets

def lexicographic_index(p):
    result = 0
    for j in range(len(p)):
        k = sum(1 for i in p[j + 1:] if i < p[j])
        result += k * factorial(len(p) - j - 1)
    return result

def getPermutationByindex(sequence, index):
    S = list(sequence)
    permutation = []
    while S != []:
        f = factorial(len(S) - 1)
        i = int(floor(index / f))
        x = S[i]
        index %= f
        permutation.append(x)
        del S[i]
    return tuple(permutation)

alphabet = "abcdefghijklmnopqrstuvwxyz"

def dataCompress(lst):
    n = len(lst[0])

    output = alphabet[n-1]+"|"

    for line in lst:
        output += "%s|" % int2base(lexicographic_index(line), 36)

    return output[:len(output) - 1]

def dataDeCompress(data):
    indexes = data.split("|")
    n = alphabet.index(indexes[0]) + 1
    del indexes[0]

    lst = []

    for index in indexes:
        if index != '':
            lst.append(getPermutationByindex(range(n), int(index, 36)))

    return lst

dataCompress takes a list of integer tuples and returns a string.

dateDeCompress takes a string and returns a list of integer tuples.

In short, for each line, this program takes that lines permutation index and saves it in base 36. Decompressing takes a long time with big inputs but compression is really fast even on big inputs.

Usage:

dataCompress([(2,0,1),(1,2,0),(0,1,2)])

result: c|4|3|0

dataDeCompress("c|4|3|0")

result: [(2, 0, 1), (1, 2, 0), (0, 1, 2)]

Yytsi

Posted 2016-07-12T12:43:26.247

Reputation: 3 582

2You'd probably get a lot better runtime if you didn't wrap your permutations calls in list calls - permutations returns a generator, which lazily generates all the permutations, but if you try to make it into a list, it eagerly generates all the permutations, which takes a very long time. – Mego – 2016-07-12T15:27:26.173

Could you explain a bit better how to use your code? – Mego – 2016-07-12T15:32:12.010

@Mego Sure, maybe I'll implement the lazy evaluation also, although it's still fairly uncomputable. – Yytsi – 2016-07-12T15:33:25.840

@Mego Wow! I can finally calculate my score with the method you provided, thanks! :D – Yytsi – 2016-07-12T15:48:06.137

@TuukkaX have you successfully decompressed the latin square of size 25? – Nathan Merrill – 2016-07-12T17:36:12.437

@NathanMerrill I haven't, no :/ But I figured that this still competes, since I have ran my program to obtain my score: "You must actually run your program to obtain your score...". – Yytsi – 2016-07-12T18:14:02.423

You have to run both parts, sorry. I'll make it clearer. – Nathan Merrill – 2016-07-12T18:19:44.637

Why is this non-competing? – Dennis – 2016-07-12T20:48:33.620

@Dennis OP edited the question, so that I must also run the decompressor. Sadly, my decompressor is so slow, that I think I'm not able to run it in a reasonable time. I might just start the decompression algorithm, and if it finishes, this becomes competing again. – Yytsi – 2016-07-12T21:24:52.437

@Dennis Nevermind, memory runs out. Only way I'd get this solution to compete again would to make a workaround for the decompression, or if someone with a good computer would run the last testcase :D – Yytsi – 2016-07-12T21:55:14.453

I tried running your code on my machine, but something seems to have messed up the indentation. At least rets='' and the following loop are out of place. – Dennis – 2016-07-12T23:02:06.273

@Dennis Fixed. I have to manually cut & paste tabulations to each line, since stack exchange formats my code wrong :/ – Yytsi – 2016-07-12T23:14:44.373

Thought that I would save bytes by taking the permutation index of the whole board, appears that the last test case (whole board) permutation index is...1478 bytes with base 10 representation. The last test case x4 takes 7411 bytes. – Yytsi – 2016-07-12T23:25:54.507

@Mego Thanks again! Back in the competition :D – Yytsi – 2016-07-13T11:41:10.800

2

Python 3, 1955 bytes

Yet another one that uses permutation indices...

from math import factorial

test_data_name = 'latin_squares.txt'

def grid_reader(fname):
    ''' Read CSV number grids; grids are separated by empty lines '''
    grid = []
    with open(fname) as f:
        for line in f:
            line = line.strip()
            if line:
                grid.append([int(u) for u in line.split(',') if u])
            elif grid:
                yield grid
                grid = []
    if grid:
        yield grid

def show(grid):
    a = [','.join([str(u) for u in row]) for row in grid]
    print('\n'.join(a), end='\n\n')

def perm(seq, base, k):
    ''' Build kth ordered permutation of seq '''
    seq = seq[:]
    p = []
    for j in range(len(seq) - 1, 0, -1):
        q, k = divmod(k, base)
        p.append(seq.pop(q))
        base //= j
    p.append(seq[0])
    return p

def index(p):
    ''' Calculate index number of sequence p,
        which is a permutation of range(len(p))
    '''
    #Generate factorial base code
    fcode = [sum(u < v for u in p[i+1:]) for i, v in enumerate(p[:-1])]

    #Convert factorial base code to integer
    k, base = 0, 1
    for j, v in enumerate(reversed(fcode), 2):
        k += v * base
        base *= j
    return k

def encode_latin(grid):
    num = len(grid)
    fbase = factorial(num)

    #Encode grid rows by their permutation index,
    #in reverse order, starting from the 2nd-last row
    codenum = 0
    for row in grid[-2::-1]:
        codenum = codenum * fbase + index(row)
    return codenum

def decode_latin(num, codenum):
    seq = list(range(num))
    sbase = factorial(num - 1)
    fbase = sbase * num

    #Extract rows
    grid = []
    for i in range(num - 1):
        codenum, k = divmod(codenum, fbase)
        grid.append(perm(seq, sbase, k))

    #Build the last row from the missing element of each column
    allnums = set(seq)
    grid.append([allnums.difference(t).pop() for t in zip(*grid)])
    return grid

byteorder = 'little'

def compress(grid):
    num = len(grid)
    codenum = encode_latin(grid)
    length = -(-codenum.bit_length() // 8)
    numbytes = num.to_bytes(1, byteorder)
    codebytes = codenum.to_bytes(length, byteorder)
    return numbytes + codebytes

def decompress(codebytes):
    numbytes, codebytes= codebytes[:1], codebytes[1:]
    num = int.from_bytes(numbytes, byteorder)
    if num == 1:
        return [[0]]
    else:
        codenum = int.from_bytes(codebytes, byteorder)
        return decode_latin(num, codenum)

total = 0
for i, grid in enumerate(grid_reader(test_data_name), 1):
    #show(grid)
    codebytes = compress(grid)
    length = len(codebytes)
    total += length
    newgrid = decompress(codebytes)
    ok = newgrid == grid
    print('{:>2}: Length = {:>3}, {}'.format(i, length, ok))
    #print('Code:', codebytes)
    #show(newgrid)

print('Total bytes: {}'.format(total))

output

 1: Length =   1, True
 2: Length =   1, True
 3: Length =   2, True
 4: Length =   3, True
 5: Length =   5, True
 6: Length =   7, True
 7: Length =  11, True
 8: Length =  14, True
 9: Length =  20, True
10: Length =  26, True
11: Length =  33, True
12: Length =  41, True
13: Length =  50, True
14: Length =  61, True
15: Length =  72, True
16: Length =  84, True
17: Length =  98, True
18: Length = 113, True
19: Length = 129, True
20: Length = 147, True
21: Length = 165, True
22: Length = 185, True
23: Length = 206, True
24: Length = 229, True
25: Length = 252, True
Total bytes: 1955

PM 2Ring

Posted 2016-07-12T12:43:26.247

Reputation: 469

1

Java, 2310 bytes

We convert each row of the square to a number representing which lexicographic permutation it is using factoradic numbers, also known as a factorial number system, which is useful for numbering permutations.

We write the square to a binary file where the first byte is the size of the square, and then each row has one byte for the number of bytes in the binary representation of a Java BigInteger, followed by the bytes of that BigInteger.

To reverse the process and decompress the square we read the size back and then each BigInteger, and use that number to generate each row of the square.

import java.io.*;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.math.BigInteger;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class Latin {
    public static void main(String[] args) {
        if (args.length != 3) {
            System.out.println("java Latin {-c | -d} infile outfile");
        } else if (args[0].equals("-c")) {
            compress(args[1], args[2]);
        } else if (args[0].equals("-d")) {
            decompress(args[1], args[2]);
        } else {
            throw new IllegalArgumentException(
                "Invalid mode: " + args[0] + ", not -c or -d");
        }
    }

    public static void compress(String filename, String outname) {
        try (BufferedReader br = Files.newBufferedReader(Paths.get(filename))) {
            try (OutputStream os =
                    new BufferedOutputStream(new FileOutputStream(outname))) {
                String line = br.readLine();
                if (line == null) return;
                int size = line.split(",").length;
                if (size > 127) throw new ArithmeticException(
                    "Overflow: square too large");
                Permutor perm = new Permutor(size);
                os.write((byte) size); // write size of square

                do {
                    List<Integer> nums = Arrays.stream(line.split(","))
                        .map(Integer::new)
                        .collect(Collectors.toList());
                    byte[] bits = perm.which(nums).toByteArray();
                    os.write((byte) bits.length); // write length of bigint
                    os.write(bits); // write bits of bigint
                } while ((line = br.readLine()) != null);
            }
        } catch (IOException e) {
            System.out.println("Error compressing " + filename);
            e.printStackTrace();
        }
    }

    public static void decompress(String filename, String outname) {
        try (BufferedInputStream is =
                new BufferedInputStream(new FileInputStream(filename))) {
            try (BufferedWriter bw =
                    Files.newBufferedWriter(Paths.get(outname))) {
                int size = is.read(); // size of latin square
                Permutor perm = new Permutor(size);
                for (int i = 0; i < size; ++i) {
                    int num = is.read(); // number of bytes in bigint
                    if (num == -1) {
                        throw new IOException(
                            "Unexpected end of file reading " + filename);
                    }
                    byte[] buf = new byte[num];
                    int read = is.read(buf); // read bits of bigint into buf
                    if (read != num) {
                        throw new IOException(
                            "Unexpected end of file reading " + filename);
                    }
                    String row = perm.nth(new BigInteger(buf)).stream()
                        .map(Object::toString)
                        .collect(Collectors.joining(","));
                    bw.write(row);
                    bw.newLine();
                }
            }
        } catch (IOException e) {
            System.out.println("Error reading " + filename);
            e.printStackTrace();
        }
    }
}

Permutor is adapted from a class I wrote a few years ago to work with permutations:

import java.util.List;
import java.util.Arrays;
import java.util.ArrayList;
import java.math.BigInteger;
import static java.math.BigInteger.ZERO;
import static java.math.BigInteger.ONE;

public class Permutor {
    private final List<Integer> items;

    public Permutor(int n) {
        items = new ArrayList<>();
        for (int i = 0; i < n; ++i) items.add(i);
    }

    public BigInteger size() {
        return factorial(items.size());
    }

    private BigInteger factorial(int x) {
        BigInteger f = ONE;
        for (int i = 2; i <= x; ++i) {
            f = f.multiply(BigInteger.valueOf(i));
        }
        return f;
    }

    public List<Integer> nth(long n) {
        return nth(BigInteger.valueOf(n));
    }

    public List<Integer> nth(BigInteger n) {
        if (n.compareTo(size()) > 0) {
            throw new IllegalArgumentException("too high");
        }
        n = n.subtract(ONE);
        List<Integer> perm = new ArrayList<>(items);
        int offset = 0, size = perm.size() - 1;
        while (n.compareTo(ZERO) > 0) {
            BigInteger fact = factorial(size);
            BigInteger mult = n.divide(fact);
            n = n.subtract(mult.multiply(fact));
            int pos = mult.intValue();
            Integer t = perm.get(offset + pos);
            perm.remove((int) (offset + pos));
            perm.add(offset, t);
            --size;
            ++offset;
        }
        return perm;
    }

    public BigInteger which(List<Integer> perm) {
        BigInteger n = ONE;
        List<Integer> copy = new ArrayList<>(items);
        int size = copy.size() - 1;
        for (Integer t : perm) {
            int pos = copy.indexOf(t);
            if (pos < 0) throw new IllegalArgumentException("invalid");
            n = n.add(factorial(size).multiply(BigInteger.valueOf(pos)));
            copy.remove((int) pos);
            --size;
        }
        return n;
    }
}

Usage:

With a Latin square in latin.txt, compress it:

java Latin -c latin.txt latin.compressed

And decompress it:

java Latin -d latin.compressed latin.decompressed

David Conrad

Posted 2016-07-12T12:43:26.247

Reputation: 1 037