Challenge

Design a compression algorithm specialized for compressing ASCII mazes. You will need to create both a compression algorithm and a decompression algorithm. Your score will be based on the size of your compressed mazes.

Mazes

These mazes are made primarily of the characters (floors), +, -, |, and # (walls), and exactly one each of ^ (start) and $ (end). They may also contain ASCII letters, which count as floor tiles. For the purposes of this challenge, mazes do not have to be solvable and the actual meaning of maze contents is irrelevant.

+ will be used for wall cells where there is at least one horizontally adjacent wall cell and at least one vertically adjacent wall cell.
| will be used for wall cells where there is at least one vertically adjacent wall cell, but no horizontally adjacent wall cells.
- will be used for wall cells where there is at least one horizontally adjacent wall cell, but no vertically adjacent wall cells
# will only be used for wall cells that are not orthogonally adjacent to other wall cells.

All mazes are rectangular, but do not necessarily have a regular grid/wall alignment.

Mazes To Compress

Maze 1

+----+----
|  o |    |
| -- | o--+
|    | |  $
 --^-+-+---

Maze 2

+-----+---+
|  a  |   |
^ +-+-+ # |
| | |  B  |
| | | --+ |
|   c   | $
+-------+--

Maze 3

----------+-+-+-----+-+
^         | | |     | |
+-- --+R #  | |p| | | |
|     | |       | |   |
+---+ +-+-+-- +-+ | | |
|  m| | | |   |   | | |
| +-+ | | | | | --+ | |
| | |    h  | |   | | |
| | | | | |  #  --+-+ |
|     | | | | |  S|   $
+-----+-+-+-+-+---+----

Maze 4

+-----+---+-+---+-------^-----+
|     |x  | |   |     tsrq    |
+-+-- +-- | +--  #  --+---- --+
| |   |           |   |       |
| | | | | +-+-+---+ | +-- | +-+
| | | u | | | |     | |   | | |
| +-+ | | | | +---- +-+---+ | |
| |   | |   |    y  |       w |
| | --+ | --+ +-- | +---- | | |
|     | |   | |   | |     | | |
+-- --+ +-+ | | | | +-- | +-+-+
|     | | |   | | | |   |     |
$ | --+-+ | --+-+ | +-+-+-- --+
| |   |      z|   |   |    v  |
+-+---+-------+---+---+-------+

Maze 5

++ -----------+
++-       Beep|
$  ----+---+--+
+-+boop|   |  |
| +--- | | | ++
|      | |  +++
+------+-+--+ ^

Maze 6

+-$---------------+-+--
|                 | |j 
| |l ---- # ---+ |  |  
| | |       m  | +--+ |
| | | +-+---- #       |
| | | | |      +----+ |
|o| | | | +----+    | |
|       | |    | -- | |
| | | | | | -+ |    | |
| | | | |  | | +--- | |
| | | | +- | | |   | ++
+-+ |n| |  | ++ +--+ | 
    | |   -+- | |  | +-
+---+ +---    |  | |  ^
|    |     --+ --+ | | 
| -- | |  k  |     | ++
|    | |      +--- | ++
|    |      | |    |  |
+-- -+----  | +----+--+

Maze 7

+---+-+-------------+-+^+-----+-------+---+-+---+-+---+-+---+
|   |c|             | | |  c  |       |   | |   | |   |c|   |
+-- | | +-- +-- # | | | +-- --+ +---- +-- | +-+ | | +-+ | --+
|       |   |     | |           |         |   | |c| |       |
| | +-- | +-+-- +-+ +-- # +- # -+-- +-- | | --+ | | | | --+C|
|c| |   | | c   |         |         |c  |             |   | |
+-+-+---+-+-----+---------+---------+---+-------------+---+$|

Maze 8

------+-+-+---+-+---+-----------+---+-----+---------------+-+
^     | | |   | |   |           |   |     |      r        | |
+-- | | | t | | +-- +----- # ---+-- +-- --+-- ----+-+ --+ | |
|   |   | | |   |   |         r |   |             | |   |   |
| | | | | +-+ --+ --+-- --------+-- | ----+ --+ | | | --+ | |
| |r| |            rotation               |   | |   |   | | $
+-+-+-+-----------------------------------+---+-+---+---+-+--

Maze 9

|$|^--+-+---+-----+-+---+-+-+---+---+-+---+-----+
| |   | |   |     | |   | | | f |   | |   |     |
| +-+ | | # +-+ --+ +-+ | | | # | +-+ +-- | ----+
|   |       | |    f| |           | | |   |   f |
| |F+-+ | | | | +---+ | | | ----+-+ | | --+ --+-+
| |   | | |     |     | | |   f |   |         | |
| | | | +-+-+---+-- | | | +-+-+-+ +-+ +--- # -+ |
| | | |     |   |   |   | | | |   | | |         |
+-+-+ | +---+ --+ | +---+-+ | | --+ f | | | | --+
|     | |         |                 | | | | |   |
| --+f| | | +-- --+--f--+ --+ | ----+ | +-+ +---+
|   |     | |     |     |   | |           |     |
+---+-----+-+-----+-----+---+-+-----------+-----+

Maze 10

+-----+-+-----------+
|  q  | |         q |
|Q+-+ | +-+-+-+---- |
$ | |     | | |  q  |
+-+ | | | | | +-- +-+
| |   | |     |   | |
| +-- +-+ |q| +-+ | |
|    q|   | |   |   |
| | | +-- | +-+ | --+
| | | |   | | |     |
+-+-+-+ +-+-+ +-- | |
|       |         | |
+--- # -+ | | +-- | |
|  q      | | |   | ^
+-+ +-- | | +-+ | +-+
| | |   | |q|   |   |
| +-+-+ | +-+-- | | |
|     | | |     | | |
| | | +-+-+-- +-+ +-+
| | |         | q   |
+-+-+---------+-----+

Rules, Assumptions, Scoring

Standard loopholes are banned
- Write a general program, not one that only works for the ten test cases. It must be able to handle any arbitrary maze.
You may assume there will be exactly one entrance and one exit. Entrances and exits will always be on the border of the maze.
You may assume that all inputs use walls that follow the rules enumerated above. Your compression algorithm does not have to work for mazes containing walls that violate those rules.
Input mazes may or may not be solvable.
You may assume the maze will be no larger than 100 characters in either direction.
You may assume that letters will not appear on the edge of the maze. (since this is the case for the examples provided)
Your score is the total size, in bytes (octets), of all of the compressed mazes.
- You may use hex, base64, binary strings, or any similar format as a representation for your compressed maze if you find that more convenient. You should still count the result in whole octets, rounded up for each maze (e.g. 4 base64 digits is 3 bytes, 2 hex digits is 1 byte, 8 binary digits is 1 byte, etc...)
- Lowest score wins!

Beefster

Posted 2019-03-15T19:19:52.880

Reputation: 6 651

Is there a size limit to a maze? – Embodiment of Ignorance – 2019-03-15T20:33:40.643

@EmbodimentofIgnorance 100x100 – Beefster – 2019-03-15T20:47:51.290

@Arnauld actually that was a copy-pasting issue, but I think SE formatting strips spaces at the end of the line anyway. Yes, it's supposed to be space-padded. – Beefster – 2019-03-16T20:56:50.467

@ChasBrown, that counts as a standard loophole, which means it's banned by default. – Beefster – 2019-03-16T20:57:38.233

I think it's rather sad that mazes aren't guaranteed to have exactly one solution. I'd love to see someone find a way to take advantage of such a restriction. – dfeuer – 2019-03-16T21:07:01.123

@Arnauld good catch. corrected. – Beefster – 2019-03-17T19:55:33.970

Is it valid to assume that there are no ASCII letters on the border of the maze? – schnaader – 2019-03-26T10:42:12.473

1@schnaader, that seems reasonable given the example test cases. – Beefster – 2019-03-26T17:37:23.243

Answers

JavaScript (Node.js), score = 586 541 503 492 479 bytes

The walls are stored as a Huffman-encoded stream of bits describing whether a prediction function is returning the correct guess or not.

The special characters are stored as \$(d, c)\$, where \$d\$ is the distance from the previous special character and \$c\$ is the ASCII code.

Try it online!

Common

const HUFFMAN = [
  '00',       // 0000
  '010',      // 0001
  '1001',     // 0010
  '11100',    // 0011
  '011',      // 0100
  '101',      // 0101
  '11110',    // 0110
  '100010',   // 0111
  '110',      // 1000
  '11101',    // 1001
  '1111100',  // 1010
  '1111101',  // 1011
  '10000',    // 1100
  '1111110',  // 1101
  '100011',   // 1110
  '1111111'   // 1111
];

let bin = (n, w) => n.toString(2).padStart(w, '0');

let wallShape = (row, x, y) => {
  let vWall = (row[y - 1] || [])[x] | (row[y + 1] || [])[x],
      hWall = row[y][x - 1] | row[y][x + 1];

  return ' -|+'[row[y][x] ? vWall * 2 | hWall : 0];
}

let predictWall = (row, x, y, w, h) => {
  let prvRow = row[y - 1] || [];
  return !x | !y | x == w - 1 | y == h - 1 | (prvRow[x] | row[y][x - 1]) & !prvRow[x - 1];
}

Compression

let pack = str => {
  let row = str.split('\n').map(r => [...r]),
      w = row[0].length,
      h = row.length;

  let wall = row.map((r, y) => r.map((c, x) => +/[-+|]/.test(c)));

  if(row.some((r, y) => r.some((c, x) => wall[y][x] && wallShape(wall, x, y) != c))) {
    throw "invalid maze";
  }

  row = wall.map((r, y) => r.map((v, x) => predictWall(wall, x, y, w, h) ^ v));
  row = row.map(r => r.join('')).join('');
  row = row.replace(/.{1,4}/g, s => HUFFMAN[parseInt(s.padEnd(4, '0'), 2)]);

  str =
    str.replace(/[\n|+-]/g, '').replace(/ *(\S)/g, (s, c) => {
      let n = c.charCodeAt(),
          i = '^$#'.indexOf(c);

      return (
        bin(s.length > 63 ? 0xFC000 | s.length - 1 : s.length - 1, 6) +
        bin(~i ? i : n < 91 ? (n > 80 ? 0x1F0 : 0x1E0) | ~-n & 15 : n - 94, 5)
      );
    }).trim();

  return (
    Buffer.from(
      (bin(w, 7) + bin(h, 7) + row + str)
      .match(/.{1,8}/g).map(s => parseInt(s.padEnd(8, '0'), 2))
    ).toString('binary')
  );
}

Decompression

let unpack = str => {
  str = [...str].map(c => bin(c.charCodeAt(), 8)).join('');

  let x, y, n, i, s,
      ptr = 0,
      read = n => parseInt(str.slice(ptr, ptr += n), 2),
      w = read(7),
      h = read(7),
      row = [];

  for(x = s = ''; s.length < w * h;) {
    ~(i = HUFFMAN.indexOf(x += read(1))) && (s += bin(i, 4), x = '');
  }
  for(i = y = 0; y < h; y++) {
    for(row[y] = [], x = 0; x < w; x++) {
      row[y][x] = predictWall(row, x, y, w, h) ^ s[i++];
    }
  }

  row = row.map((r, y) => r.map((c, x) => wallShape(row, x, y)));

  for(i = 0; str[ptr + 10];) {
    for(
      n = (n = read(6)) == 0x3F ? read(14) + 1 : n + 1;
      n -= row[i / w | 0][i % w] == ' ';
      i++
    ) {}

    row[i / w | 0][i % w] = String.fromCharCode(
      (n = read(5)) >= 0x1E ? read(4) + (n == 0x1F ? 81 : 65) : [94, 36, 35][n] || n + 94
    );
  }
  return row.map(r => r.join('')).join('\n');
}

How?

A maze is encoded as a bit stream which is eventually converted to a string.

Header

The header consists of:

the width \$w\$ on 7 bits
the height \$h\$ on 7 bits

Wall data

We walk through the entire maze and attempt to predict if the next cell is a wall or not, based on the previously encountered cells. We emit a \$0\$ if we are correct, or a \$1\$ if we are wrong.

This results in a sequence of correction bits with (hopefully) significantly more \$0\$'s than \$1\$'s. This sequence is split into nibbles and stored using hard-coded Huffman codes:

00 → 0000
010 → 0001
1001 → 0010
11100 → 0011
011 → 0100
etc.

To decode the wall \$W_n\$, the decompression routine computes the same prediction \$P_n\$ and toggles the result if needed, using the correction bit \$C_n\$:

$$W_n=P_n\oplus C_n$$

The final wall shapes are deduced in a way that is similar to Nick Kennedy's answer.

Special characters

Each special characters is encoded as:

The distance minus \$1\$ from the last special character (ignoring the walls):
- on 6 bits if it's less than \$63\$
- or as \$111111\$ + 14 bits otherwise (never used in the test cases, but required in theory)
The code of the character:
- on 5 bits if it's ^, $, # or [a-z]
- or \$11110\$ + 4 bits for [A-O]
- or \$11111\$ + 4 bits for [P-Z]

Arnauld

Posted 2019-03-15T19:19:52.880

Reputation: 111 334

Have you tried compression algorithms other than deflate? There are an awful lot on the shelf! – dfeuer – 2019-03-16T21:08:38.580

There's no rule that says it has to work in TIO! – dfeuer – 2019-03-16T21:24:44.380

O_o nice, wonder if decimal compression would help at all (basically the opposite of huffman, the space is 0 to 1, divided up into sections with arbitrary size (<1 of course), and the encoding is the shortest binary number that falls within the correct slice of the space – ASCII-only – 2019-03-18T10:20:16.133

@ASCII-only Decimal coding (aka arithmetic coding) definitely should improve the compression ratio, but probably by a small margin on such a short data stream. I'm sure it's possible to improve the Huffman coding and/or the prediction function before switching to arithmetic coding, though (both of them being really basic right now). – Arnauld – 2019-03-18T11:13:32.930

@ASCII-only For instance, I should probably try longer codes (using nibbles is arbitrary). I could also add a 1-bit flag in the header telling if the data should be unpacked with the default static Huffman codes or with dynamic codes (if it turns out to improve the compression of some mazes). One thing that I did try was to rotate the maze by 90° and see if was compressing better. But that was just saving 1 byte overall. – Arnauld – 2019-03-18T11:21:45.547

R, score 668 bytes

This takes advantage of the fact that the wall character is determined by its surroundings. As such, wall characters can be encoded as bits. The remaining info that needs to be stored are the dimensions of the maze, the positions of the start and finish and the positions of any other non-wall characters. Since the non-wall characters are ASCII, I've used the most significant bit of each byte to indicate whether there is another character that follows so that some of the words in the mazes don't have to have the location of each character stored separately. Note also that for mazes less than or equal to 256 characters (e.g. up to 16x16 or equivalent rectangular mazes), the positions can be stored in one byte, whereas for larger mazes the positions need two bytes.

Utility functions

r <- as.raw

int_as_raw <- function(int, bytes = 2) {
  if (bytes == 1) {
    r(int)
  } else {
    do.call(c, lapply(int, function(.x) r(c(.x %/% 256, .x %% 256))))
  }
}

raw_as_int <- function(raw, bytes = 2) {
  if (bytes == 1) {
    as.integer(raw)
  } else {
    sapply(
      seq(1, length(raw) - 1, 2),
      function(.x) as.integer(as.integer(raw[.x + 0:1]) %*% c(256, 1))
    )
  }
}

Compression algorithm

compress_maze <- function(maze) {
  maze_array <- do.call(rbind, strsplit(maze, ""))
  simple_maze <- r(maze_array %in% c("+", "#", "-", "|"))
  simple_maze <- packBits(c(simple_maze, rep(r(0), (8 - length(simple_maze)) %% 8)))
  maze_dim <- int_as_raw(dim(maze_array), 1)
  bytes_needed <- 1 + (length(maze_array) > 256)
  start_finish <- int_as_raw(sapply(c("^", "$"), function(.x) which(maze_array == .x)) - 1, bytes = bytes_needed)
  other_ascii_locs_rle <- rle(!(maze_array %in% c(" ", "+", "#", "-", "|", "$", "^")))
  other_ascii_locs <- cumsum(
    c(1, other_ascii_locs_rle$lengths[-length(other_ascii_locs_rle$lengths)])
  )[other_ascii_locs_rle$values]
  other_ascii_locs_length <- other_ascii_locs_rle$lengths[other_ascii_locs_rle$values]

  encode_ascii <- function(loc, len) {
    text <- charToRaw(paste(maze_array[loc:(loc + len - 1)], collapse = ""))
    if (len > 1) {
      text[1:(len - 1)] <- text[1:(len - 1)] | r(128)
    }
    c(int_as_raw(loc - 1, bytes = bytes_needed), text)
  }

  other_ascii_encoded <- Map(encode_ascii,
    other_ascii_locs,
    other_ascii_locs_length
    )
  other_ascii_encoded <- do.call(c, other_ascii_encoded)
  c(maze_dim, simple_maze, start_finish, other_ascii_encoded)
}

Decompression algorithm

decompress_maze <- function(c_maze) {
  dim_maze <- as.integer(c_maze[1:2])
  len_maze <- prod(dim_maze)
  len_maze_b <- ceiling(len_maze / 8)
  bit_maze <- rawToBits(c_maze[-(1:2)])[1:len_maze]
  dim(bit_maze) <- dim_maze
  bit_maze[-1, ] <- bit_maze[-1, ] | rawShift(bit_maze[-nrow(bit_maze), ] & r(1), 1)
  bit_maze[-nrow(bit_maze), ] <- bit_maze[-nrow(bit_maze), ] | rawShift(bit_maze[-1, ] & r(1), 1)
  bit_maze[, -1] <- bit_maze[, -1] | rawShift(bit_maze[, -ncol(bit_maze)] & r(1), 2)
  bit_maze[, -ncol(bit_maze)] <- bit_maze[, -ncol(bit_maze)] | rawShift(bit_maze[, -1] & r(1), 2)
  bit_maze[(bit_maze & r(1)) == r(0)] <- r(0)
  array_maze <- c(" ", "#", "|", "-", "+")[(as.integer(bit_maze) + 1) %/% 2 + 1]
  dim(array_maze) <- dim_maze
  bytes_needed <- 1 + (len_maze > 256)
  start_finish <- raw_as_int(c_maze[2 + len_maze_b + 1:(bytes_needed * 2)], bytes_needed) + 1
  array_maze[start_finish] <- c("^", "$")
  i <- 3 + len_maze_b + 2 * bytes_needed
  while (i < length(c_maze)) {
    loc <- raw_as_int(c_maze[i + 1:bytes_needed - 1], bytes_needed) + 1
    i <- i + bytes_needed
    text <- character(0)
    while (c_maze[i] & r(128)) {
      text <- c(text, rawToChar(c_maze[i] & r(127)))
      i <- i + 1
    }
    text <- c(text, rawToChar(c_maze[i]))
    array_maze[loc:(loc + length(text) - 1)] <- text
    i <- i + 1
  }
  apply(array_maze, 1, paste, collapse = "")
}

Try it online!

Nick Kennedy

Posted 2019-03-15T19:19:52.880

Reputation: 11 829

I knew you'd be able to store the walls as bits but I like your approach for compressing the non-wall character position data. +1 – Neil – 2019-03-17T12:07:04.013

ASCII Maze Compression

Challenge

Mazes

Mazes To Compress

Rules, Assumptions, Scoring

Answers

JavaScript (Node.js), score = 586 541 503 492 479 bytes

Common

Compression

Decompression

How?

Header

Wall data

Special characters

R, score 668 bytes

Utility functions

Compression algorithm

Decompression algorithm