Python, 1281.375 1268.625 bytes
We encode the Latin square one “decision” at a time, where each decision is of one of these three forms:
- which number goes in row i, column j;
- in row i, which column the number k goes in;
- in column j, which row the number k goes in.
At each step, we make all the logical inferences we can based on previous decisions, then pick the decision with the smallest number of possible choices, which therefore take the smallest number of bits to represent.
The choices are provided by a simple arithmetic decoder (div/mod by the number of choices). But that leaves some redundancy in the encoding: if k decodes to a square where the product of all the numbers of choices was m, then k + m, k + 2⋅m, k + 3⋅m, … decode to the same square with some leftover state at the end.
We take advantage of this redundancy to avoid explicitly encoding the size of the square. The decompressor starts by trying to decode a square of size 1. Whenever the decoder finishes with leftover state, it throws out that result, subtracts m from the original number, increases the size by 1, and tries again.
import numpy as np
class Latin(object):
def __init__(self, size):
self.size = size
self.possible = np.full((size, size, size), True, dtype=bool)
self.count = np.full((3, size, size), size, dtype=int)
self.chosen = np.full((3, size, size), -1, dtype=int)
def decision(self):
axis, u, v = np.unravel_index(np.where(self.chosen == -1, self.count, self.size).argmin(), self.count.shape)
if self.chosen[axis, u, v] == -1:
ws, = np.rollaxis(self.possible, axis)[:, u, v].nonzero()
return axis, u, v, list(ws)
else:
return None, None, None, None
def choose(self, axis, u, v, w):
t = [u, v]
t[axis:axis] = [w]
i, j, k = t
assert self.possible[i, j, k]
assert self.chosen[0, j, k] == self.chosen[1, i, k] == self.chosen[2, i, j] == -1
self.count[1, :, k] -= self.possible[:, j, k]
self.count[2, :, j] -= self.possible[:, j, k]
self.count[0, :, k] -= self.possible[i, :, k]
self.count[2, i, :] -= self.possible[i, :, k]
self.count[0, j, :] -= self.possible[i, j, :]
self.count[1, i, :] -= self.possible[i, j, :]
self.count[0, j, k] = self.count[1, i, k] = self.count[2, i, j] = 1
self.possible[i, j, :] = self.possible[i, :, k] = self.possible[:, j, k] = False
self.possible[i, j, k] = True
self.chosen[0, j, k] = i
self.chosen[1, i, k] = j
self.chosen[2, i, j] = k
def encode_sized(size, square):
square = np.array(square, dtype=int)
latin = Latin(size)
chosen = np.array([np.argmax(square[:, :, np.newaxis] == np.arange(size)[np.newaxis, np.newaxis, :], axis=axis) for axis in range(3)])
num, denom = 0, 1
while True:
axis, u, v, ws = latin.decision()
if axis is None:
break
w = chosen[axis, u, v]
num += ws.index(w)*denom
denom *= len(ws)
latin.choose(axis, u, v, w)
return num
def decode_sized(size, num):
latin = Latin(size)
denom = 1
while True:
axis, u, v, ws = latin.decision()
if axis is None:
break
if not ws:
return None, 0
latin.choose(axis, u, v, ws[num % len(ws)])
num //= len(ws)
denom *= len(ws)
return latin.chosen[2].tolist(), denom
def compress(square):
size = len(square)
assert size > 0
num = encode_sized(size, square)
while size > 1:
size -= 1
square, denom = decode_sized(size, num)
num += denom
return '{:b}'.format(num + 1)[1:]
def decompress(bits):
num = int('1' + bits, 2) - 1
size = 1
while True:
square, denom = decode_sized(size, num)
num -= denom
if num < 0:
return square
size += 1
total = 0
with open('latin_squares.txt') as f:
while True:
square = [list(map(int, l.split(','))) for l in iter(lambda: next(f), '\n')]
if not square:
break
bits = compress(square)
assert set(bits) <= {'0', '1'}
assert square == decompress(bits)
print('Square {}: {} bits'.format(len(square), len(bits)))
total += len(bits)
print('Total: {} bits = {} bytes'.format(total, total/8.0))
Output:
Square 1: 0 bits
Square 2: 1 bits
Square 3: 3 bits
Square 4: 8 bits
Square 5: 12 bits
Square 6: 29 bits
Square 7: 43 bits
Square 8: 66 bits
Square 9: 94 bits
Square 10: 122 bits
Square 11: 153 bits
Square 12: 198 bits
Square 13: 250 bits
Square 14: 305 bits
Square 15: 363 bits
Square 16: 436 bits
Square 17: 506 bits
Square 18: 584 bits
Square 19: 674 bits
Square 20: 763 bits
Square 21: 877 bits
Square 22: 978 bits
Square 23: 1097 bits
Square 24: 1230 bits
Square 25: 1357 bits
Total: 10149 bits = 1268.625 bytes
"A Latin square is a square that has no repeated symbols in the rows or columns." I suppose there's also the requirement that only symbols
0
ton-1
are used? – Martin Ender – 2016-07-12T12:48:50.3301Well, by definition, any symbol could be used, but my test cases just happen to use
0
thoughn-1
:) – Nathan Merrill – 2016-07-12T12:51:26.5631
Similar: http://codegolf.stackexchange.com/questions/41523/sudoku-compression
– feersum – 2016-07-12T12:54:34.0933@NathanMerrill well, the point was only being allowed to use
n
different symbols. :P – Martin Ender – 2016-07-12T12:57:01.893Can multibyte characters be used? – DavidC – 2016-07-12T13:37:27.560
1@DavidC It shouldn't matter, as the size is measured in bytes. – flawr – 2016-07-12T13:39:38.353
Does the output have to be the same every time the program is ran on the same input, or can we use randomness? – Loovjo – 2016-07-12T14:21:15.077
@Loovjo randomness is fine. – Nathan Merrill – 2016-07-12T15:33:51.207
What's to stop you hard-coding the test cases and compressing them to 1 byte, total score 25 bytes, and fitting the "compressing/decompressing any square" by leaving others unchanged? – TessellatingHeckler – 2016-07-12T17:43:03.410
@TessellatingHeckler fixed. – Nathan Merrill – 2016-07-12T17:53:43.137
One of the answers encodes the output in bits. Is that allowed? – Dennis – 2016-07-12T19:48:41.750
@Dennis AFAIK, we allow for any encoding. It sounds like a meta question to me. – Nathan Merrill – 2016-07-12T19:50:03.210
The default for code golf is bytes, but since this isn't code golf, I'd say its up to you. – Dennis – 2016-07-12T19:51:40.003
I'm ok with bit encoding. If somebody manages to encode their answer in 1 less bit, more power to them. – Nathan Merrill – 2016-07-12T19:54:17.987
219 of your 25 test cases (all those except 4, 6, 8, 10, 12, 14) were generated by permuting the rows and columns of the trivial Latin square whose (i, j) entry is i + j mod n. This makes them very easy to compress much more than a random Latin square. Although your rules say we should have similar compression ratios for all Latin squares, this might be easy to break by accident. The test cases should be more representative. – Anders Kaseorg – 2016-07-12T19:55:10.233
It turns out that SageMath has a built in Latin square generator (documentation), because of course it does. Run it on SageMathCell.
– Anders Kaseorg – 2016-07-12T19:56:01.560@AndersKaseorg nice investigation. You are correct in your assumption of permutation (the site I used only did it via permutation at sizes higher than 15). I'll check out your link – Nathan Merrill – 2016-07-12T19:57:54.380
What does "within the maximum square size" mean? Up to size 25, the maximum of any of the test cases? (I assume so.) – David Conrad – 2016-07-13T04:58:47.463
@DavidConrad correct. – Nathan Merrill – 2016-07-13T05:41:17.053
FWIW, at the end of this XKCD forum post is some Python 2 code I wrote a few years ago that generates random Latin Squares. It's reasonably fast (for Python): it can do an order 60 square in under 30 seconds on my old 2GHz machine, which is considerably faster than an Exact Cover algorithm. It can display the grid in numeric form (with 1-based indices) or in alphabetic string form; it can also output the grid as a PPM image.
– PM 2Ring – 2016-07-13T11:40:20.050@PM2Ring An approach based on completing rectangles to squares almost certainly cannot yield a uniform distribution, because different rectangles can be completed to squares in different numbers of ways. See Generate Random Latin Squares. SageMath uses the Jacobson and Matthews (1996) algorithm.
– Anders Kaseorg – 2016-07-13T19:15:54.907@AndersKaseorg: Ah, ok. FWIW, I wasn't implying that my code gives a uniform distribution, just that the squares it makes are more random than those obtained by randomly permuting rows & columns (& symbols) of a trivial Latin Square. But thanks for that info, and thanks for looking at my code. – PM 2Ring – 2016-07-13T19:23:00.173