Smallest chess board compression

39

11

Write an algorithm or program that can encode and decode a chess board. The goal is to make the smallest representation of a chessboard that could be used (once decoded) to determine all movement possibilities for a player on that turn.

The encoding must be able to show:

  • Whose turn it is.
  • Whether the player can castle on each side.
  • Whether the player can perform an en-passant, and if so, which of their pawns?
  • Positions of all pieces.

Important note on castling: If white moves their king one turn, and then moves it back the next, it must be clear that they cannot castle on either side after that. Same would go if they moved their left or right rook. Although the board is visually in the same state as it was two turns ago, the game state has changed. More info here: http://en.wikipedia.org/wiki/Chess#Castling

Important note on en-passant: This is also a turn-sensitive move. Read the rules for more info. http://en.wikipedia.org/wiki/Chess#En_passant

Determine input and output as needed. Major props to whoever can compress it the most!

Your score is determined worst-case scenario - maximum possible size in bits. Make sure you show how you calculated it that number and what you accounted for. Shoot for the smallest worst-case!

Seltzer

Posted 2014-01-25T23:06:56.263

Reputation: 491

What do you mean by "bitwise"? – Peter Taylor – 2014-01-25T23:16:45.990

Is this smallest code or most-compressed? Most-compressed is more interesting. – Justin – 2014-01-26T00:15:29.007

Sorry for not clarifying. Bitwise in the sense that you can only compress it if you start representing it as bits, which will require some bitwise operation. Poor use on my part. Also most-compressed, not smallest code. – Seltzer – 2014-01-26T01:59:40.913

How are you planning to handle submissions with variable length? Rate by average size? Maximum size? – Kendall Frey – 2014-01-26T02:05:42.800

@KendallFrey My bad. I'd measure it by maximum size (in bits). Most good compressions will end up getting smaller as fewer pieces are on the board, and there can't be more than 32 total pieces (but if you end up doing a quasi-Huffman thing for pieces where pawn promotion could change things, you'll need to figure out the maximum number of pawns that could be promoted at any given time in a game. I don't have the wherewithal to say what that is, but I think it's possible (under ridiculous circumstances) to have all of them promoted by moving them into 4 single-file lines on each site. – Seltzer – 2014-01-26T02:21:58.880

2

@GeekWithALife Yes, it's possible to have 18 queens on the board at the same time. Follow this link and click the playback button for an example.

– r3mainer – 2014-01-26T09:19:23.283

You probably want to impose a sensible time limit to exclude solutions which work by enumerating all possible board positions via move sequences and then indexing them. The combinatorics to justify the claimed score could be quite interesting, particularly if it's less than 170, but start to get a bit off-topic for this site. – Peter Taylor – 2014-01-26T15:21:25.607

@PeterTaylor What would be a good time limit for something like this? I'm new around here. – Seltzer – 2014-01-26T18:27:54.607

A minute or two should be ample for most reasonable answers, I would say. – Peter Taylor – 2014-01-26T22:26:36.733

If an identical position is reached three times with it being the same player's turn, then the game is a draw. Thus, you need to encode how many times the position has been reached. – Carl Love – 2014-01-27T02:16:46.267

An idea which I might not investigate fully, so I throw it out for other people to consider: if all 32 men are on the board, there are 15^8 possible configurations of the pawns, and the remaining 16 men fit into the remaining 48 squares. An encoding for which all 32 men on the board is the worst case can probably be improved by having one bit to switch between 32 men vs fewer. – Peter Taylor – 2014-01-27T09:33:42.957

@PeterTaylor Keep in mind that each bishop can only ever be on one color, so you can easily save a few bits there. – AJMansfield – 2014-01-28T01:11:38.567

1@AJMansfield, that might be useful for boards with about 28 men, but I calculate that 117 bits is plenty for boards with all 32 men, which is about 50 bits less than the target. The complication is that once you go below 32 men, promotion can give a player more bishops. – Peter Taylor – 2014-01-28T07:39:58.580

@PeterTaylor why would anyone promote to anything other than a queen or knight? – AJMansfield – 2014-01-28T13:30:17.360

An interesting variant of this problem, inspired by some of the wording earlier in this one, is to write a program that does compression like this, but only has to compress it for use by some type of AI program, allowing parts that the AI ignores anyway to be left out. – AJMansfield – 2014-01-28T13:32:23.573

@AJMansfield, usually for reasons to do with stalemate, but the motivation is irrelevant. The question is whether a position is reachable by legal moves, not whether it is reachable by competent players. – Peter Taylor – 2014-01-28T14:05:14.233

Speaking of legal moves: Is it a legal move to "promote" a pawn into a pawn (i.e. not to promote it at all)? That is, does there exist a legal position where a white pawn is on row 8, or a black pawn is on row 0? (Of course no one in his right mind would do that, but it certainly affects the achievable compression). – celtschk – 2014-03-24T22:41:35.647

OK, I've now found out myself: It must be promoted to another piece. – celtschk – 2014-03-24T22:59:34.497

FYI, this is the http://wismuth.com/chess/statistics-positions.html definition of a "position"

– William Entriken – 2014-04-19T15:36:19.927

See also: minimum encodings of chess games: http://phor.net/apps/ign/

– William Entriken – 2014-04-19T15:36:50.483

@GeekWithALife Will you review answers here? – William Entriken – 2014-05-14T19:48:48.227

Answers

27

Min: 12 bits
Max:
Avg:

Had and thought last night that I could possibly make it even smaller.

x   Colour to play next (0 -> Black, 1-> White)
1   Only King left?

00000 Position of White King (0 -> A1 ... 63 -> H8)
00000 Position of Black King

01 00000 11111  WK:A1, BK:H2 (Black to play)
11 00000 11111  WK:A1, BK:H2 (White to play)

The result is an impressive size of 12 bits!

So what about K +1 other type of piece?

x
 0
   0
     000  +Pawn
     001  +Rook   
     010  +Knight
     011  +Bishop
     100  +Queen

There 2 possible arrangement of the sub tree.

   /\      /\
  +  K    K  +

Both result in the same bit sizes for all pieces. So it make no difference to which we use, I'll choose the first one.

x
 0
  0
   000
      1011001110000000000000000000000000000000000000000000000000000000000000
(+ 000) En-Passant (if >= 2 pawn & pawn in en-passant positions)
(+ 00 ) Castlings  (if >= 1 rook)
Min: 75 bit
Max: 109 bits

So on King +2 other piece types

x
 0
  1
   PRBNQ
   00011  +N +Q
   00101  +B +Q
   00110  +B +N
   01001  +R +Q
   01010  +R +N
   01100  +R +B
   10001  +P +Q
   10010  +P +N
   10100  +P +B
   11000  +P +R

There are 5 possible sub trees (I'll use 1 and 2 to indicate which of the pieces.)

   /\          /\       /\         /\          /\
  /  \        /  \     /  \       /  \        /  \
 K   /\      /\   2   /\   \     1   /\      /\   \
    1  2    K  1     K  2   1       K  2    1  2   K

So we'll require 3 bits to encode which sub tree to use.

x
 0
  1
   PRBNQ
         000  Sub Tree used

Min:= 11 = Header 
       6 = 2 * 3
       4 = 1 * 4
       4 = 1 * 4
      60 = 60    Empty
      --
      85 bits

Max:=  11 = Header
        4 =  2 * 4 Kings
       48 = 16 * 3 Pawns
       12 =  4 * 3 Rook
       42 = 42 * 1 Empty
        3 =  1 * 3 En-Passant
        2 =  1 * 2 Castlings
      ---
      122 bits

Still doing the analysis for more pieces

+3 Other

x
 0
  1
   PRBNQ
         0000  Sub Tree used (of 14 possible)

+4 Other

x
 0
  1
   PRBNQ
         000000  Sub Tree used (of 42 possible)

+5 Other

x
 0
  1
   PRBNQ
         0000000  Sub Tree used (of 132 possible)
 (+000)
 (+00)

Max: 208?


Is it possible to encode all these sub-trees into 9bits?

If we total up all of the possible sub-trees we get 392 possible sub-trees.

 1  0
 2  2
 3  5
 4  14
 5  42
 6  132
    ---
    392  <= 2^9

Using Freq ID

Since there 164603 unique piece frequencies.

Log2( 164603) = 17.3286110452
             ~ 18 bits

0
 0000 0000 0000 0000 00  Freq ID

(+000) (+00) Castling

Max:= 204 bits


rev 3

Min: 82 Max: 199 Avg: 160

Finally got around doing some analysis to find the maximum bit size. With optimal huffman encoding for each of the unique piece frequencies.

               0   Player
              00  Castling
               0  En-Passant Possible
            ?000  En-Passant column (include if En-Passant Possible = 1
  0000 0000 0000  Tree Encoding ID
[Board Encoding]  Between 66 .. 180 bits 

Note this is the worst possible size, which the En-Passant column bits if the number of pawns is greater than one. Irrespective of those pawns colours and position there is the potential for some boards to be 3 bits smaller.

Also there are only 144 different sizes(Worst case) for the size of the board.


75 - 216 bits (v2) v1 The minimum size is 98 bits (12.25 bytes), only the two kings on the board.

Maximum size is only 216 bits (27 bytes.) In the unlikly:-

  9 x Queens
  1 x King
  2 x Rooks
  2 x Knights
  2 x Bishops
on each side.

On average the size will be around 157 bits (19.625 bytes).

Pieces

To encode the board I'm using a binary tree encoding scheme. An empty square is the most frequent from any where between 32 and 62 appearances. Next are the pawns, then Rooks, Knights, Bishops and the least frequent are the Queen and King.

0 - left node
1 - righ node

     /\
    e  \    e:= Empty Square
      B/\W  B:= Black ; W:= White
      /  \
     /    \
    /      \
   /\      /\
  p  \    p  \  p:= Pawn
     /\      /\
    /  \    /  \
   /\  /\  /\  /\
  r  b n \ r b n \  r:= Rook; b:= Bishop; n:= Knight
         /\      /\ 
        q  k    q  k  q:= Queen ; k:= King

The starting board can be encode in just 166 bits (20.75 bytes)

  A     B     C      D      E     F     G     H
-----+-----+-----+------+------+-----+-----+------+
10100 10101 10110 101110 101111 10110 10101 10100 | 8 
  100   100   100    100    100   100   100   100 | 7
    0     0     0      0      0     0     0     0 | 6
    0     0     0      0      0     0     0     0 | 5
    0     0     0      0      0     0     0     0 | 4
    0     0     0      0      0     0     0     0 | 3
  110   110   110    110    110   110   110   110 | 2
11100 11101 11110 111110 111111 11110 11101 11100 | 1

To indicate whom's move it takes only single bit

0-> Black , 1-> White

Castling can be encoded in 4 bits.

 B  W
LR LR
00 00

So I've use 171 bits (21.375 bytes)

En-Passe can be encoded into just 16 bits (2 bytes)

So in total thats 187 bit (23.375 bytes).

Layout

  bits    Encodes
 0 -  15  En-Passe
16 -  19  Castling
      20  Move 
21 -  23  Unused
24 -> ..  Board

Not written any code yet.

Notice that 3 of the bits that are unused. So the max is 213bits.


Possible Improvements

1) Reduced the header block form 24 to 8 bits (with @Peter Taylor suggestion)

0 - 2 En-Passant
    3 Move
4 - 7 Castling
8 ... Board Pieces 

2) Variable length header

A small 4 bit fixed header

0 0 0 0
| | | |
| | | +-> En-Passant Block Present?
| | | 
| | +---> Pawns on board?
| |
| +-----> Castling still possible?
|                
+-------> Who's move? 0-Black 
                      1-White

Next block of additional bits (If castling is still possible)

00 00
|| ||
|| |+-> White Castle Right
|| +--> White Castle Left
||
|+----> Black Castle Right
+-----> Black Castle Left

Next block of additional bits (if pawns are present)

000--> En-Passant column Position

So now I have a variable length header 4 - 11 bits


3) Use a different encoding scheme depending on what pieces are left on the board.

By changing the tree encoding depending on what pieces are on the board and there frequency.

One possible encoding for an end game state (No Pawns)

        /\            
       e /\           
  Black /  \ White
       /    \
      /      \
     /        \       
    /\        /\
   /  \      /  \     
  /   /\    /   /\
 /\  / /\  /\  / /\   
r b n q k  r b n q k

Which is about ~4 bits per piece.

Which type of pieces are present on the board?

RBNQK Permutation
11111 (11111)

Permutation is Variable length 0-5 bits. If only one type of piece is left then don't include it.

Which permutation of those pieces to use for the tree? This is the factorial of the number of pieces in the above example it is 5 pieces so 120 possible permutations that can be encoded.

 #    !  bit 
 6  720  10  (If pawn included)
 5  120   6
 4   24   5
 3    6   3
 2    2   1  Don't include as of equal size.
 1    1   0  Don't include as its not needed.

Remember that there is addition bits for the empty squares and color.


Examples

Let's give an example of only QK left

RBNKQ
00011

  /\
 s  \
    /\
  B/  \W
  /\  /\
q  k q  k

101 100  0 x 60 110 111 ==> 60 + (2 x 6) = 60 + 12 = 72 bits for the board

0000 00011 Header ==> 9 bits

81 bits total


Let's give and example of only kings left

 RBNQK
 00001 

  /\
 s  k
   / \
  B   W

 10 0 0 0 0 0 0 0   K... ....
  0 0 0 0 0 0 0 0   .... ....
  0 0 0 0 0 0 0 0   .... ....
  0 0 0 0 0 0 0 0   .... ....
  0 0 0 0 0 0 0 0   .... ....
  0 0 0 0 0 0 0 0   .... ....
  0 0 0 0 0 0 0 0   .... ....
  0 0 0 0 0 0 0 11  .... ...k

Put all together

 header  4   0 0 0 0
 pieces  5   0 0 0 0 1
 perm    0   - - - - - -
  board 66   10 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 0
              0 0 0 0 0 0 0 11

So I calculate the smallest encoding for board at 75bits (9 bits 3 bit)

Still yet to calculate how this coding scheme affects the maximum size.


Improvement 4

Reduce the number of bits for castling to just 2 bits. Just castling for the player who's turn it is.

 0 Castling possible (from header block)
 LR 
 00

Thinking about it, it maybe better just to include the 2 bits inside the header block.

Adam Speight

Posted 2014-01-25T23:06:56.263

Reputation: 1 234

You don't need 16 bits for en passant. At most one pawn moved last turn, so four bits suffice (e.g. 1111 for "no en passant possible" or the column as a binary number otherwise). – Peter Taylor – 2014-01-26T15:24:41.777

Why do pawns get a better position in a tree? In the common case they're most common, but in extreme cases R/B/N can appear 10 times. – ugoren – 2014-01-26T21:16:50.963

@PeterTaylor, actually 3 bits are enough for en passant. If you count only columns with a black 5th rank pawn (assuming white move), 8 becomes invalid. – ugoren – 2014-01-26T21:26:59.813

@ugoren, I don't understand why. All 8 pawns are capable of being taken en passant. – Peter Taylor – 2014-01-26T22:25:56.777

@ugoren there are 16 pawns to begins with, by the time you've got 10 Rooks there be very few, and turn few peice in total. I think most plays there be more pawns than rooks. Until the endgame (no pawns at all). At which stage it may be beneficial to alter the coding scheme used, for example by list the order of pieces. Eg Empty, Queen, Rook, Knight Bishop, King. (A quick look at it is 5! possible variants or ~7 bits.) – Adam Speight – 2014-01-27T00:04:50.100

@ugoren, ok, I've got it. lg 5 bits needed for en passant. – Peter Taylor – 2014-01-27T00:40:32.910

@PeterTaylor En-Passant is only possible for one move per a pawn. If for example white pawn advances 2 squares, and black decides not to take it, they've lost that option (but could still take the pawn on the left if it do en-passant). So only 3 bits are needed. – Adam Speight – 2014-01-27T01:03:01.917

Can anyone work out the maximum bits for the new scheme? – Adam Speight – 2014-01-27T02:35:38.213

Adam, I know that: that's why I said "at most one pawn moved last turn". But the trivial representation of "Pawn X moved" also needs to encode "No pawn moved", for 9 possibilities, not 8. See my answer for how to reduce it to 5 possibilities. – Peter Taylor – 2014-01-27T07:59:37.653

1Note that your worst case is actually not possible. Promotion is impossible without captures (at least one capture per 2 promotions is needed). – ugoren – 2014-01-27T09:02:32.983

Correct: 6 possibilities. – Peter Taylor – 2014-01-27T09:19:43.400

@PeterTaylor En-Passant is taken care of in only 4bits in the improved header format. 1 bit to indicate if it possible. If it is an additional 3bit block is included. – Adam Speight – 2014-01-30T02:01:02.250

Hello @AdamSpeight I have also used a variable length Huffman encoding in my answer at https://codegolf.stackexchange.com/questions/19397/smallest-chess-board-compression/26225#26225 although please see the advanced worst case analysis that applies also to you

– William Entriken – 2014-04-23T17:20:07.407

@ugoren The game e4 f5 exf promotes three pieces with one capture. I have talked about this more at https://codegolf.stackexchange.com/questions/19397/smallest-chess-board-compression/26225#26225

– William Entriken – 2014-04-23T17:22:09.080

2Note to encode 64 positions (for white or black king) you need 6 bits (2**6 = 64). – lambruscoAcido – 2014-09-23T22:04:31.650

17

192 bits (worst case)

Here's a very simple storage scheme that should cope with arbitrary pawn promotions, and never requires more than 64 + 4 × 32 = 192 bits:

  • The first 64 bits store a bitboard that tells where the pieces are (but not what they are). That is, we store one bit for each square of the chessboard (starting at square a1, then b1, c1, etc. up to square h8) such that a vacant square is represented by 0 and an occupied square by 1.

  • Next, for each of the squares marked as occupied on the bitboard, we store a 4-bit nibble encoding the piece on that square. The first of the four bits encodes the color of the piece (0 = white, 1 = black), while the remaining three bits encode the type of the piece:

    +-----+-----+-----+-----+
    |Black|   Piece Type    |
    +-----+-----+-----+-----+
       4     3     2     1    Bits
    

    Piece Type

    0 = (normal) pawn
    1 = (normal) rook
    2 = knight
    3 = bishop
    4 = queen
    5 = king (of player to move next)
    6 = king (of the other player)

    Note the two encodings for the king, used to determine which player's turn it is to move. (In fact, since there are always two kings on the board, allowing for four combinations of the codes 5 and 6, we could rather easily encode a second bit of information here.)

    Black Type Description
    +----+----+--------------------------------+
    |  0 | 5  | White King; White to move next |
    +----+----+--------------------------------+
    |  0 | 6  | White King                     |
    +----+----+--------------------------------+
    |  1 | 5  | Black King; Black to move next |
    +----+----+--------------------------------+
    |  1 | 6  | Black King                     |
    +----+----+--------------------------------+
    

    To encode the extra information needed for the en passant and castling rules, we introduce one additional piece type, which denotes either a pawn or a rook depending on the row it appears on:

    7 (on rows 1 and 8) = a rook that has never moved, and whose king has also never moved, and which is therefore eligible for castling
    7 (on rows 4 and 5) = a pawn that has just advanced two squares, and may therefore be captured en passant

Putting it all together:

     Hex Description
    +---+---------------------------------------------+
    | 0 | White Pawn (normal)                         |
    | 1 | White Rook (has moved)                      |
    | 2 | White Knight                                |
    | 3 | White Bishop                                |
    | 4 | White Queen                                 |
    | 5 | White King; White to move next              |
    | 6 | White King                                  |
    | 7 | White Rook (pre castle) / Pawn (en Passant) |
    | 8 | Black Pawn (normal)                         |
    | 9 | Black Rook (has moved)                      |
    | A | Black Knight                                |
    | B | Black Bishop                                |
    | C | Black Queen                                 |
    | D | Black King; Black to move next              |
    | E | Black King                                  |
    | F | Black Rook (pre castle) / Pawn (en Passant) |
    +---+---------------------------------------------+

The total number of bits required to encode the state of the board is therefore 64 + 4 × # of pieces on board. Since there can never be more than 32 pieces on the board, the maximum length of this encoding is 192 bits.

For example, using the encoding described above, the initial state of the board would be encoded as (whitespace inserted for readability):

1111 1111 1111 1111 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111
0111 0010 0011 0100 0101 0011 0010 0111 0000 0000 0000 0000 0000 0000 0000 0000
1000 1000 1000 1000 1000 1000 1000 1000 1111 1010 1011 1100 1110 1011 1010 1111

or, in hexadecimal:

FFFF 0000 0000 FFFF 7234 5327 0000 0000 8888 8888 FABC EBAF

Ilmari Karonen

Posted 2014-01-25T23:06:56.263

Reputation: 19 513

Okay -- this is mildly complicated. But hear me out. If you just take the 1 bit hit on a turn indicator, you could replace king with turn with a piece I call pawn1. Change pawn to pawn0. Now, whenever there is a (not en passant vulnerable) pawn -- you also gain one bit of information on the next piece (either a 0 for pawn0 or a 1 for pawn1). Because there are 16 pawns, you gain 16 bits less 1 for the turn indicator. Whenever there is a pawn that is vulnerable to en passant you have to add one bit back in after it. But as en passant must happen immediately, your minimum gains are 14 bits. – user5957401 – 2016-08-08T18:52:39.613

You could also do a similar thing with something like bishops. Suppose that you have a bishop not in its a 'special zone' (the 10 spots in its corners and the center row on its side) is marked as special. Because you know its location -- you know its a bishop. Now you have two bishops and could give each of them a 0 or 1 on the next piece. This gives up to another 4 bits -- but the worst case has bishops all over the special zones, and no gain. – user5957401 – 2016-08-08T18:55:28.550

2"pawn that has just advanced two squares" and "rook that has never moved" could share the same state slot since they are mutually exclusive depending on the row they are on. The extra free state could be used to encode "king of colour whose turn it is"; you should be able to get rid of the dangling bit that way. – FireFly – 2014-01-26T23:40:09.710

You can also arguably save a bit by only storing 63 bits for the bitboard, and inferring the last bit from the number of men encoded. (It's not clear to me whether this is cheating or not, because it requires external encoding of the length of the bit sequence). And if you ditch the states 6 and 7 for the men you need to encode up to 6^32, which takes 82.7 bits; rounding up to 83 it's saving 13 bits over using states 6 and 7, and you can encode that same information in only 8 bits. – Peter Taylor – 2014-01-26T23:46:47.157

Thanks, @FireFly! I've implemented your suggestion. (Peter Taylor's suggestion is even more efficient, of course, but I haven't used it so far because I kind of like the simple binary encoding of the current scheme. You could always submit it as a separate entry...) – Ilmari Karonen – 2014-01-27T00:33:48.207

14

160 bits worst case

After posting my previous answer 22 bytes, I began to wonder if we could get down to 21 bytes. However when I saw Peter Taylor's amazing 166 bytes I thought "Hang on, it looks like five 32-bit words could be possible!"

So after quite a lot of thought, I came up with this: 159.91936391 bytes (a pretty tight fit!) This level of compression would need a fairly complicated program but I have thought about how to make it run in reasonable time.

This is going to be a long post, so please bear with me, I will post what I can today and add a few bits of code soon.

So, here's how to do it:

En Passant and castling encoded by illegal positions (0 bits)

En Passant

As mentioned in other answers, there are a maximum of 5 possible squares on which a pawn vulnerable to en passant can stand. These are the squares next to the pawns of the player whose turn it is.

To encode this, the pawn vulnerable to en passant is exchanged with whatever is on one of the squares in the first or last row. This may be either a man or an empty square. This produces an illegal position, as pawns cannot be on these rows. The decoder must return the pawn to its correct position, extracting the en passant information.

In order for this to not interfere with the castling encoding, it is important that the squares on which the kings stand at the start of the game are not disturbed, and that the en passant encoding procedure does not place the kings next to each other, which would be an illegal king position. To satisfy the second of these points, the encoder has two choices as to which square it exchanges the en passant pawn. First choice square for each of the up to 5 pawns are A8,B8,C8,G8,H8. Second choice: A1,B1,C1,G1,H1.

Castling

A king who is allowed to castle is by definition, still on his initial square. With the white king on his initial square, there are a total of 63 squares where the black king can stand, 58 of which are legal (he is not allowed to move right next to the white king because he would put himself in check.) If the white king is allowed to castle, he is either allowed to castle with his left rook, his right rook, or both. There are thus 3x58=174 possibilities where the white king can castle, another 174 where the black king can castle, and a further 3x3=9 where both can castle, a total of 357.

There are 420 illegal arrangements of the two kings where they are on adjacent squares: 3x4=12 when the white king is in the corner, 5x24=120 when he is on the edge, and 8x36=288 when he is on another square. Therefore there are easily enough illegal positions to encode all possible castling possibilities.

If at least one king is allowed to castle, encoder would look up the castling data and the positional data of those kings not allowed to castle in a table (or alternatively, use an algorithm which I won't specify here) and produce an illegal position of the two kings. It would then exchange the kings with whatever happened to be on these squares.

It is important that this is encoded after and decoded before the en passant, otherwise there are some potential interferences.

Comparison

So, so far I have used no bits! Looking at Peter's answer, I still have the following to encode:

Whose turn is it?                                   1.000 bits
Which squares are occupied by men of which colour? 91.552 bits 
Subtotal                                          *92.552 bits* 
For the two colours, which men and which order?   *68.613 bits* 
GRAND TOTAL                                       161.165 bits

This is for the worst case of 29 men (see Peter's answer.) Below I will show how I make a marginal improvement (at least for the case of 29 men) on both of the points marked in **.

Which squares are occupied / whose turn is it?

The easy way to encode which squares are occupied is with a 64 bit grid. This also tells us how many squares are occupied. However it is somewhat wasteful because it is not possible for there to be more than 32 squares occupied. My solution is to use 1's to encode for the occupied squares when it is White's turn and 0's to encode for occupied squares when it is Black's turn. Now all combinations are used and there is no waste.

Thus we save on a bit for storing the turn: less than 32 1's, it is white's turn, more than 32 1's, it is black's turn. The only ambiguous case is when all the men are on the board and there are 32 1's and 32 0's. Therefore an extra bit is needed for this case only. As there can be no promotions until a capture has occurred, this extra bit does not affect the worst case overall (which occurs with 3 men captured and 29 remaining.)

Colour of the men occupying the squares

We know from the above how many men there are. The following extract of Pascal's triangle tells how many possibilities there are for the different distributions of black and white. For example, for 3 men, the possibilities are: 3 black men (1 permutation) 2 black, 1 white, (3 permutations), 1 black, 2 white (3 permutations), 3 white (1 permutation.) The total is 23=8. In general, for the lower numbers of men there are 2n possibilities. However the all black and all white possibilities are illegal (at least the king of each side must be on the board) so the actual number of legal permutations is 2n-2 (ignore the 1's on the Pascals triangle.)

For more than 16 total men, there is an additional constraint in that there can be no more than 16 men of each colour on the board. Therefore when all 32 men are on the board there must be 16 of each and the total number possibilities is 601080390 which is quite a bit less than 232.

1   1    1    1      1     1      1       1       1        1        1         1         1         1          1          1          1 
1   2    3    4     5      6      7       8       9       10       11        12        13        14         15         16         17
1   3    6   10    15     21     28      36      45       55       66        78        91       105        120        136        153
1   4   10   20    35     56     84     120     165      220      286       364       455       560        680        816        969
1   5   15   35    70    126    210     330     495      715     1001      1365      1820      2380       3060       3876       4845
1   6   21   56   126    252    462     792    1287     2002     3003      4368      6188      8568      11628      15504      20349
1   7   28   84   210    462    924    1716    3003     5005     8008     12376     18564     27132      38760      54264      74613
1   8   36  120   330    792   1716    3432    6435    11440    19448     31824     50388     77520     116280     170544     245157
1   9   45  165   495   1287   3003    6435   12870    24310    43758     75582    125970    203490     319770     490314     735471
1  10   55  220   715   2002   5005   11440   24310    48620    92378    167960    293930    497420     817190    1307504    2042975
1  11   66  286  1001   3003   8008   19448   43758    92378   184756    352716    646646   1144066    1961256    3268760    5311735
1  12   78  364  1365   4368  12376   31824   75582   167960   352716    705432   1352078   2496144    4457400    7726160   13037895
1  13   91  455  1820   6188  18564   50388  125970   293930   646646   1352078   2704156   5200300    9657700   17383860   30421755
1  14  105  560  2380   8568  27132   77520  203490   497420  1144066   2496144   5200300  10400600   20058300   37442160   67863915
1  15  120  680  3060  11628  38760  116280  319770   817190  1961256   4457400   9657700  20058300   40116600   77558760  145422675
1  16  136  816  3876  15504  54264  170544  490314  1307504  3268760   7726160  17383860  37442160   77558760  155117520  300540195
1  17  153  969  4845  20349  74613  245157  735471  2042975  5311735  13037895  30421755  67863915  145422675  300540195  601080390

The number of possibilities can be found by summing the "rows" of this extract of pascals's triangle (by which I mean the NE-SW diagonals of the table , because I rotated the triangle 45 degrees anticlockwise for convenient presentation. The number of bits needed to encode the turn, occupied squares and colour of the men is therefore as follows:

Up to 25 men: slightly less than 64+(number of men)
For more than 25 men:

men permutations  bits required  occupied sq+turn   
    of colours                   (bits required)  total bits
26   55791790     25.7335495      64              89.7335495
27  100960110     26.58921015     64              90.58921015
28  175844430     27.3897244      64              91.3897244
29  290845350     28.115677       64              92.115677   
30  445962870     28.73234836     64              92.73234836
31  601080390     29.16298271     64              93.16298271
32  601080390     29.16298271     65              94.16298271

For the two colours, which men and in which order?

As per previous answers, no pawns can be promoted until a capture has occurred, because each pawn is blocked by a pawn of the opposite colour on the same column. Peter's answer considered (as an upper bound) that every capture could lead to one promotion for the side being captured and two for the side capturing. However we can split this into several cases:

  1. Black pawn captures white pawn: Now the capturing pawn is free to promote as he is now on a different column. His colleague on the same column can also promote. The black pawn on the white pawn's original column can also promote. This is the only case that permits 3 promotions.

  2. Black pawn goes past white pawn on adjacent column and then captures white piece (other than pawn) behind. This enables the capturing pawn and the white pawn who was on the original column to promote. One promotion for each side.

  3. White pawn is captured by piece (other than pawn.) This would normally allow one promotion for Black only. The only exception is when it liberates a blocked up pawn formation which was already caused by several captures moving pawns onto the same column.

So, as a base case, we can consider that each capture permits one promotion each for both sides. In the case that the man captured is a pawn, there may be an additional promotion for the capturing side.

I have written a program similar to Peter's. It is somewhat cruder, but it has an important addition: it can calculate the number of permutations possible when a player starts with less than the normal 8 pawns. Here is some data produced by the program:

Max promotions   0            1            2             3             4              5 
8 PAWNS 
13 men    18725850    146911050    567991710    1373480394    2297173164     2902775304
14 men    36756720    339459120   1555313760    4501448952    9021804792    13325103792
15 men    60810750    660810150   3555401850   12144582450   28834205400    50030580600
16 men    64864800    843242400   5383778400   21810428640   61514893440    1.26476E+11
7 PAWNS                         
13 men    17760600    141003720    546949260    1321302840    2200401060     2761730400
14 men    30270240    287567280   1331890560    3852728880    7641553920    11068817760
15 men    32432400    372972600   2075673600    7209001800   17135118000    29315286000
6PAWNS                          
13 men    14054040    114594480    447026580    1069488420    1739577840     2113185360
14 men    15135120    151351200    718918200    2087805720    4073028960     5697051360                         
5 PAWNS                         
13 men     6486480     55135080    217297080     510630120     794233440      910235040

We can see that for a case like 8 pawns, 15 men, 0 promotions, the number of permutations is only slightly less than for 8 pawns 16 men, 0 promotions. However if we consider a case like 7 pawns, 15 men, 0 promotions (which is the same as considering that the man captured was definitely a pawn) we get about half the number of permutations.

So, For the case when Black has 16 men and white has 15 men, we can consider an upper bound estimate of 2 promotions for Black and one promotion for White:

5383778400 x 660810150 = 3.55766E+18 possibilities

However we can do better if we proceed as follows.

A. Consider one promotion each for Black and White assuming that the man White has lost could be of any type:

843242400 x 660810150 = 5.57223E+17 possibilities

B. Consider the additional possibilities for Black if he has two promotions, multiplied by only those possibilities for White in which he has lost a pawn.

(5383778400-843242400) x 372972600 = 1.6935 E+18 possibilities.

Adding these two together we get 2.25072E+18, which is a smaller number than 3.55766E+18. All the possibilities for up to 3 captures (29 men remaining) are listed below.

(Promotions, Pawns lost) possibilities

BLACK 16 MEN, WHITE 15 MEN. ESTIMATE   3.55766E+18 = 2^61.62563249
(1,0)   843242400 x (1,0)  660810150 = 5.57223E+17
(2,0)  4540536000 x (1,1)  372972600 = 1.6935 E+18
                               TOTAL   2.25072E+18 = 2^60.96509144


BLACK 16 MEN, WHITE 14 MEN. ESTIMATE   9.5675 E+19 = 2^66.3747752
(2,0)  5383778400 x (2,0) 1555313760 = 8.37346E+18
(3,0) 16426650240 x (2,1) 1331890560 = 2.18785E+19
(4,0) 39704464800 x (2,2)  718918200 = 2.85443E+19
                               TOTAL   5.87962E+19 = 2^65.67235739


BLACK 16 MEN, WHITE 13 MEN. ESTIMATE   2.69447E+20 = 2^67.86856193
(3,0) 21810428640 x (3,0) 1373480394 = 2.99562E+19
(4,0) 39704464800 x (3,1) 1321302840 = 5.24616E+19
(5,0) 64960896000 x (3,2) 1069488420 = 6.94749E+19
(6,0) 69702272640 x (3,3)  510630120 = 3.55921E+19
                               TOTAL   1.87485E+20 = 2^67.34533572


BLACK 15 MEN, WHITE 15 MEN. ESTIMATE   1.47491E+20 = 2^66.99918768
(2,0)  3555401850 x (2,0) 3555401850 = 1.26409E+19
(2,1)  2075673600 x (3,0) 8589180600 = 1.78283E+19
(3,0)  8589180600 x (2,1) 2075673600 = 1.78283E+19
(3,1)  5133328200 x (3,1) 5133328200 = 2.63511E+19
                  TOTAL BOTH COLUMNS   7.46486E+19 = 2^66.01674923


BLACK 15 MEN, WHITE 14 MEN. ESTIMATE   4.51366E+20 = 2^68.61286007      
(3,0) 12144582450 x (3,0) 4501448952 = 5.46682E+19
(3,1)  7209001800 x (4,0) 4520355840 = 3.25873E+19
(4,0) 16689622950 x (3,1) 3852728880 = 6.43006E+19
(4,1)  9926116200 x (4,1) 3788825040 = 3.76083E+19
(5,0) 21196375200 x (3,2) 2087805720 = 4.42539E+19
(5,1) 12180168000 x (4,2) 1985223240 = 2.41804E+19
                  TOTAL BOTH COLUMNS   2.57599E+20 = 2^67.80368692

So for the worst case of one side with 15 men and the other with 14 men, we need 67.804 bits.

Adding this to the 92.116 bits required to specify which squares and which colour, we get a total of 67.804+92.116=159.92 bits.

Level River St

Posted 2014-01-25T23:06:56.263

Reputation: 22 049

1Many thanks to @Einacio for changing my decimal commas to decimal points. I did a lot of my tables on Excel on a Spanish computer, and posting this was a big job, so fixing this was something I left for later. As I say, I haven't finished this post yet, I will add my permutation counting program and some fragments of code about encoding / decoding when I have time. PS. I had no idea so many people were reading this :-) – Level River St – 2014-02-12T20:54:07.250

at the end you managed to tak of bytes instead of bits which is the thing you meant, that might cause some cinfusion to the readers – masterX244 – 2014-09-27T13:42:30.147

13

177 bits worst case

This algoritm, while hardly simple, gives a 177 bits worst case (184b=23B in practice), 13b (16b=2B) best case scenario when there's just 2 kings left.

Bit     Description
  1     Turn (0=white 1=black)
  2-  7 White king position (2-4=letter, 5-7=number)
  8- 13 Black king position (8-10=letter, 11-13=number)
 14- 75 Which squares contain pieces (skipping the 2 king squares, so only 62)
        Ordered a1-h1,a2-h2,(...)
 76-105 Which color owns the square with their piece (0=white, 1=black)
        If there's LESS than 30 pieces (apart from kings), this area is
        smaller
106-end Square data

Square data has the following system:
Every square gets assigned a number which determines piece. Number is:
0 Queen
1 Rook
2 Bishop
3 Knight
4 Pawn OR allowed-castle rook depending on square
5 Pawn subject to potential enpassant

The first bits (max 13) is the potential enpassant slots from A-H, determined
from data of 1 + 14-105 for which of the squares has a piece, and which color
owns the piece and whose turn it is. For example, if turn is White (bit 1 is
0), all pieces on row 5 which is Black owned (determined from 14-105 metadata)
and has at least 1 adjacant (on the same row) square owned by White, is
explained in A-H order. A base 6 number is used which is converted to binary
for the storage. On reading, it's converted and read A-H according to the
numbers above (4 is obviously pawn in this case).
The second amount of bits takes care of the 1st and 8th row (not corners!)
in b1-g1,b8-g8. These only take up 2 bits since 4 or 5 is never needed
(pawn on 1st or 8th is invalid).
The third amount of bits takes care of the rest of the board, in the following
order: a1,h1,a2-h2,a3-h3,a4-h4,a5-h5,a6-h6,a7-h7,a8,h8 (skipping the
"enpassant" slots), in base 5 (since piece ID 0-4 are the only used) converted
to binary.

Best case: 13 bits (bit 1 for turn, bit 2-12 for kings)
Worst case: 177 bits
* 32 pieces with kings
* 5 viable enpassant pawns
* No pieces at 1st or 8th row (except if kings+rooks are at initial posions
whether or not they can castle)
In this case, the space as following:
  1   bit   turn
+ 12  bits  king positions
+ 62  bits  which squares have pieces
+ 30  bits  color of pieces
+ 13  bits  enpassant area
+ 0   bits  initial rows area
+ 59  bits  the rest of the area
= 177 bits  total

Potential optimizations but not really worth it IMO:
* Decrease average by make corners 2 bits as well if kings aren't at e1/e8
* Alter reading order to read b1-g1,b8-g8 last - decreases worst case to
  176 bits if the "which squares have pieces" area is cut off if 30 existing
  pieces has been defined already. Would actually save 8 bits on file but meh

FIQ

Posted 2014-01-25T23:06:56.263

Reputation: 519

Very nice. You can make this even more efficient by replacing bits 14-105 (92 bits) with an encoding based on multinomial coefficients. sum_{i=0}^{15} sum_{j=0}^{15} 62! / (i! j! (62-i-j)!) < 2^87.45. – Peter Taylor – 2014-01-27T09:32:33.797

The only thing I would change is to create a rather more simplified version for the enpassant area. For example: if you encode the 30 pieces in base 5, and there are a maximum of 5 enpassant positions, then you can have 5^31 < 2^72. The same as if you split them in enpassant (13) and non-enpassant(59), but without the extra complexity. – Alin Stoian – 2014-01-28T08:46:54.363

Doing that would actually use 1 extra bit. The reason is that there can (worst case) be 5 enpassant possibilities, but I still need to declare the possibility for "no enpassant", i.e. a 6th state. The 1 extra bit would in this case go to declaring enpassant possible or not (and with this approach I could as well use an even simpler approach with encoding the 30 pieces skipping the enpassant block, and use 3 bits seperately for enpassant check, which would also lead to +1 bit use). The following 5th row would enable 5 potential enpassants (White's turn): BWBBWBBW – FIQ – 2014-01-28T09:29:36.663

yes, you're right. – Alin Stoian – 2014-01-28T10:42:19.440

7

166 bits

  • 1 bit: whose turn is it?
  • 2 bits: which castling options are open? (NB on close reading of the question, it's only necessary to record castling options for the player whose turn it is).
  • lg 6 ~= 2.585 bits: which en passant options are open? (See my other answer)
  • lg sum_{i=1}^{16} sum_{j=1}^{16} 64! / (i! j! (64-i-j)! = lg 3629590441720924477681996172 ~= 91.552 bits: which squares are occupied by men of which colour?
  • At worst lg 451366131803622235200 ~= 68.613 bits to indicate which men and in which order (see below)

Using arithmetic encoding (since at each step we're applying a uniform distribution) we can achieve ceil(3 + 2.585 + 91.552 + 68.613) = 166 bits.

The encoding for the men: given that we know how many men of a given colour there are, we can easily enumerate all possible distributions/multisets of men (e.g. with 5 men we might have one King, one Queen, two Rooks, and a Pawn) and then we can consider all possible permutations of each distribution.

However, we can do even better by taking into account interdependencies. I'm only doing this on a very basic level: how many possible promotions? A pawn can only become "passed" and able to promote in three ways: it captures and so moves into a different column; or its opposing pawn captures and so moves into a different column; or its opposing pawn is captured. Thus a capture for white potentially creates two passed pawns for white and one for black.

We can build up a partial table of the upper bounds on promotions:

(Max white promos, max black promos):

           White men
           16      15      14      13
Black men
       16  (0, 0)  (1, 2)  (2, 4)  (3, 6)
       15  (2, 1)  (3, 3)  (4, 5)  (5, 7)
       14  (4, 2)  (5, 4)  (6, 6)  (7, 8)
       13  (6, 3)  (7, 5)  (8, 7)  (8, 8)

We can also calculate the number of permutations given that a player has N men and no more than P promoted pawns:

Num of permutations (cumulative):
    max promotions: 0              1              2              3              4              5              6              7              8
 1 men              1              1              1              1              1              1              1              1              1
 2 men             10             10             10             10             10             10             10             10             10
 3 men             72             75             75             75             75             75             75             75             75
 4 men            436            496            500            500            500            500            500            500            500
 5 men           2305           3025           3120           3125           3125           3125           3125           3125           3125
 6 men          10746          17106          18606          18744          18750          18750          18750          18750          18750
 7 men          44170          88795         106260         109179         109368         109375         109375         109375         109375
 8 men         159832         415360         575240         619200         624744         624992         625000         625000         625000
 9 men         509841        1721961        2884815        3398769        3504735        3515301        3515616        3515625        3515625
10 men        1447200        6258240       13063080       17697780       19260180       19510320       19530840       19531230       19531240
11 men        3706065       20021265       52183395       85007571      102173181      106786581      107369592      107409918      107410281
12 men        8678340       57101220      183088620      364510476      509818716      570620556      584017632      585352152      585430164
13 men       18725850      146911050      567991710     1373480394     2297173164     2902775304     3107861328     3143928216     3146014014
14 men       36756720      339459120     1555313760     4501448952     9021804792    13325103792    15664512864    16283899632    16360920576
15 men       60810750      660810150     3555401850    12144582450    28834205400    50030580600    66655789200    73588394880    74576231730
16 men       64864800      843242400     5383778400    21810428640    61514893440   126475789440   196178062080   240747386880   253686232800

Combining the two, we can get the number of bits of required to specify both permutations given the numbers of men on both sides:

           White men
           16      15      14      13      <13
Black men
       16  51.902  61.626  66.375  67.868  <=67.009
       15  --      67.000  68.613  67.534  <=65.243
       14  --      --      67.734  65.480  <=63.055
       13  --      --      --      63.102  <=60.676

If it's not in this section of the table, we can just assume that both sides have up to 8 promotions and we're still doing better than the worst case, which is 68.613 bits when one has 14 men and the other has 15 men.

Note that this is still a long way from being a perfect representation, because it allows many many illegal positions.

Code for calculating the permutation table:

import java.util.*;

public class ChessCombinatorics {
    public static void main(String[] args) {
        long[] f = new long[17];
        f[0] = 1;
        for (int i = 1; i < 17; i++) f[i] = i * f[i-1];

        // Indexed by num promotions, then total num men.
        long[][] distribs = new long[9][17];
        long[][] perms = new long[9][17];

        for (int promotedPawns = 0; promotedPawns < 9; promotedPawns++) {
            Map<Integer, Map<String, Long>> numCases = new HashMap<Integer, Map<String, Long>>();
            for (int i = 1; i < 17; i++) numCases.put(i, new HashMap<String, Long>());

            for (int extraQ = 0; extraQ <= promotedPawns; extraQ++) {
                for (int extraR = 0; extraR + extraQ <= promotedPawns; extraR++) {
                    for (int extraN = 0; extraN + extraR + extraQ <= promotedPawns; extraN++) {
                        int extraB = promotedPawns - extraN - extraR - extraQ;
                        int unpromotedPawns = 8 - promotedPawns;

                        // Promoted pawns should only count towards their new type if the existing ones are alive.
                        // Otherwise we double-count some cases.
                        int minQ, maxQ, minR, maxR, minN, maxN, minB, maxB;
                        if (extraQ == 0) {minQ = 0; maxQ = 1;} else {minQ = maxQ = 1 + extraQ;}
                        if (extraR == 0) {minR = 0; maxR = 2;} else {minR = maxR = 2 + extraR;}
                        if (extraN == 0) {minN = 0; maxN = 2;} else {minN = maxN = 2 + extraN;}
                        if (extraB == 0) {minB = 0; maxB = 2;} else {minB = maxB = 2 + extraB;}

                        for (int numQ = minQ; numQ <= maxQ; numQ++) {
                            for (int numR = minR; numR <= maxR; numR++) {
                                for (int numN = minN; numN <= maxN; numN++) {
                                    for (int numB = minB; numB <= maxB; numB++) {
                                        for (int numP = 0; numP <= unpromotedPawns; numP++) {
                                            // The number of possibilities at these values is (numK + numQ + numR + numN + numB + numP)! / (numK! numQ! numR! numN! numB! numP!)
                                            numCases.get(1+numQ+numR+numN+numB+numP).put(numQ+","+numR+","+numN+","+numB+","+numP, f[1 + numQ + numR + numN + numB + numP] / f[numQ] / f[numR] / f[numN] / f[numB] / f[numP]);
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }

            for (int numMen = 1; numMen < 17; numMen++) {
                distribs[promotedPawns][numMen] = numCases.get(numMen).size();
                if (distribs[promotedPawns][numMen] > 0) {
                    for (Long l : numCases.get(numMen).values()) perms[promotedPawns][numMen] += l;
                }
            }
        }

        System.out.println("Num of permutations (cumulative):");
        System.out.println("    max promotions: 0              1              2              3              4              5              6              7              8");
        for (int numMen = 1; numMen < 17; numMen++) {
            System.out.print(String.format("%2d men", numMen));
            long cumul = 0;
            for (int promotedPawns = 0; promotedPawns < 9; promotedPawns++) {
                cumul += perms[promotedPawns][numMen];
                System.out.print(String.format("%15d", cumul));
            }
            System.out.println();
        }

        System.out.println("Entropy of permutations:");
        System.out.println("    max promotions: 0              1              2              3              4              5              6              7              8");
        for (int numMen = 1; numMen < 17; numMen++) {
            System.out.print(String.format("%2d men", numMen));
            long cumul = 0;
            for (int promotedPawns = 0; promotedPawns < 9; promotedPawns++) {
                cumul += perms[promotedPawns][numMen];
                System.out.print(String.format("  %6.3f", Math.log(cumul) / Math.log(2)));
            }
            System.out.println();
        }

    }
}

Peter Taylor

Posted 2014-01-25T23:06:56.263

Reputation: 41 901

How do you deduce the positions of the kings? You use 15 men in your computation and no special bits for king positions. – Alin Stoian – 2014-01-28T21:43:11.273

@AlinStoian, oops. I had a < rather than <= in the output loop of my program. Thanks for pointing it out. I could still recover the previous score by special-casing all 32 men being on the board, but I won't do that right now. – Peter Taylor – 2014-01-28T22:42:45.833

Interesting data! Theoretical worst case with 3 men captured – Level River St – 2014-01-31T15:40:47.873

@steveverrill, what I would really like to do is encode the pawn positions and number of promotions in one "block" and then the piece positions and values. However, there are at least 2^38 pawn positions without taking into account promotions, and enumerating them efficiently has so far escaped me. – Peter Taylor – 2014-01-31T15:45:21.100

@petertaylor If you only have 16 pawns on the board, restricted to 48 squares, you have 48!/32!/8!/8!= 29019905518636890 possibilities. Slightly more than 2^54! Of these, some are illegal, you can't have all the pawns of one colour on one side of the board. – Level River St – 2014-01-31T16:11:20.820

@petertaylor Do you have a way of encoding / decoding in mind? I have some ideas for encoding/decoding permutations but they're complicated. – Level River St – 2014-01-31T16:15:00.967

@steveverrill, to encode a value from n! / (k_1! ... k_j! (n-k_1-...-k_j)!) you rewrite as a product of binomial coefficients and then treat it as a variable-radix number. Binomial coefficients are easily encoded in lexicographic order. – Peter Taylor – 2014-01-31T16:27:15.387

@petertaylor based on your answer I just posted another one for 160 bits (which when I saw your answer seemed a good goal to aim for.) It's a long one (not as long as Adam Speight's) and I haven't included the bits of code that I have yet. As I see you're on here a lot, I was just wondering what the polite way of referring to someone else's post is. I mention your name about 4 times in the text I have written, which is slightly embarrasing to me, but if I don´t refer to your answer mine will be even longer. – Level River St – 2014-02-12T17:55:56.983

Also, binomial coefficients and lexicographic order I get, but I'm not sure about variable radix. If you have 5 different objects you know you have 5x4x3x2x1 possiblilities, and once you now what the 1st one is you know there are 1x2x3x4 possibilities left. However if you have 5 squares and you know that 2 have a man on them, after you have inspected the first one you know that either a) it has a man on it and there are 4 possibilities for the other squares or b) it doesn't have a man on it and there are 6 possibilities for the other squares. The variation complicates, but it's still doable. – Level River St – 2014-02-12T18:08:45.953

@steveverrill, when I refer to a user I generally link their name to their user page (e.g. [Peter Taylor](http://codegolf.stackexchange.com/users/194). To refer to a specific answer, you can get a URL from the "share" link at the bottom-left of the answer. I'll read your answer in detail later. – Peter Taylor – 2014-02-12T18:15:17.640

5

178 bits (174 at a pinch!) worst case

Hi, just getting back into coding, which I haven't really done since college. I saw this site and thought this looked interesting. I did a little theroretical check and it appears at least 146 bits are needed for a perfect algorithm, probably quite a few more (I will explain in the comments when I have a moment.)

Anyway, this is the way I structure the data. The basic concept comes in at 178 bits, but with some jiggery pokery it can brought down to 174 (that's 21 3/4 bytes). 175 is slight easier to program, is more human readable, and still within 22 bytes.

A) Position of both kings: 6 bits each for white and black 12 BITS

B) Of the remaining 62 squares, which are occupied? A matrix of 62 BITS

C) Whose turn is it? 1 BIT

TOTAL SO FAR: 75 BITS

D) En Passant. If it is white’s turn to move, up to 5 black pawns may look like they could be captured En Passant. The black pawn has to be on row 5 (from bottom to top starting at zero), and have a white pawn next to it. One situation with maximum number of possible captures looks like this:

B W B B W B B W

If there were 6 black pawns on row 5, white would only have 2 squares to stand on and could only threaten 4 black pawns, so it is not possible to have more than 5 black pawns apparently under threat from En passant at the same time. So we need a number 1 to 5 indicating which of the (up to 5) pawns on row 5 that has a hostile (in this case white) pawn next to it was advanced 2 squares on the last turn (or, zero if no pawn in this situation was moved in this way on the last turn.)

E) Of the up to 30 occupied squares (not including kings) what do they contain?

There are 10 possibilities, each represented by a decimal number.

The least significant bit represents the colour.

Hence even numbers are white, odd numbers are black.

White/Black

Pawn 0/1 (or Rook that is allowed to castle*)

Knight 2/3

Bishop 4/5

Rook 6/7

Queen 8/9

*A rook that is allowed to castle (and therefore has never been moved from the first or last row) is represented by 0 or 1 instead of 6 or 7. It cannot be confused with a pawn, because pawns cannot be found on the first or last row.

This gives a decimal number of up to 30 digits, which we can multiply by 6 and then add the code for En passant. The resulting number will fit into 103 bits, which when added to the 75 mentioned above is 103+75=178 bits. Actually, if we simply multiply by 10 instead of 6, it makes no difference to the number of bits used, and decoding is easier.

This is just 2 bits more than 22 bytes. However we can push it down to 174 bits, as explained below.

If no piece has been captured, then it is impossible for a pawn to be promoted.

The proof is as follows. Imagine white is obsessed with promoting his pawn on (for example) column E, from the very start of the game. There is a black pawn opposite this pawn that is in the way. Therefore to promote this pawn, one of the following must happen:

1) The black pawn is captured.

2) The black pawn captures another piece and therefore moves out of the way.

3) the white pawn captures a pawn on an adjacent column such as column D.

4) the white pawn passes (or is passed by) a black pawn on an adjacent column and then captures a piece on that same adjacent column, causing the white pawn to change column.

Case 4 is the most interesting, because it is not just the white pawn that started on column E that now has a clear path to promotion. The black pawn on column that remains on column E can also promote. Therefore a single capture can clear the way for one pawn of each colour to promote.

Anyway, the fact that no pawn can promote until a piece is captured means that we do not have to store the 30th piece. We can work it out by elimination (or by subtraction, because the complete set of piece codes at the start of the game always adds up to the same amount =80.) One minor point is that we must ensure that the squares where the rooks stand at the start of the game are amongst the first scanned (because if they were last, we would not know if the rook could castle or not.) This is easily done by scanning row 0 and then rows 7 to 1: For r= 8 to 1 scan row [r mod 8].

So, the matrix of bits in (B) will tell us how many pieces there are (excluding kings.) If there are a full 30, ignore the last piece when encoding, the decoder will work out what it was. We now have an up to 29 digit decimal number, which we multiply by 6 and add to the En Passant code. The resulting number will just squeeze into 99 bits, giving a total of 99+75=174 bits.

As an example Here’s an actual position. White has just made his first move (advanced king’s pawn) and it is Black`s turn.

rnbqkbnr
pppppppp


    P

PPPP PPP
RNBQKBNR

A) Position of kings (White / Black in octal, 12 bits): 03 73 = 000011 111011

B) Which squares are occupied? Start with row zero (bottom row) then all other rows from top to bottom, skipping the kings:

1111 111

1111 111
11111111
00000000
00000000
00001000
00000000
11110111 

C) Black’s turn: Turn Bit = 1

D) En Passant. There is no white pawn next to next to a black pawn, therefore there is no pawn that can be taken en passant (even though this pawn did advance last move) so D=0. If, instead of considering only pawns which have a hostile pawn beside them, we consider all pawns that do not have friendly pieces beside them on both sides, then D would be 1, as there is one such pawn in this situation, and this particular pawn was indeed moved in the last turn.

E) Again, bottom row first, then all other rows from top to bottom, skipping the kings, and with the four uncastled rooks referred to as 0 or 1 (numbers normally reserved for pawns.)

RNBQ BNR =   0248 420
rnbq bnr =   1359 531
pppppppp =   11111111
PPPPPPPP = (0)0000000

The 30th digit (in brackets) can be discarded.

Though it is not very apparent here, the pawn that White has advanced is actually at one end of the list of pawns, because we scan row by row.

Our data now looks like this, with 29 codes for the content of squares, plus the En Passant code:

 (0 discarded) 0000000 11111111 1359531 0248420 (0 en passant)

It is best to scan from right to left when decoding and from left to right (reverse order) when encoding. This means that when there are fewer pieces we will have a smaller number, while retaining maximum consistency (i.e we want the blank space / zeroes to be leading, not trailing, to enable compression of sparsely occupied boards.) When we have only 2 kings on the board, we will have the 75 bits mentioned above, plus 3 bits to store en passant data = 78 bits in the best case. Each additional piece comes in slightly under 3.5 bits (2 pieces can be stored in 7 bits, because 100<128.)

There is a practical problem in that a 99 bit integer is too big to fit in a 64 bit integer variable, which means many programming languages do not provide support for it (you can't simply convert a string representation of a 29-30 digit number into an integer.) As an easy way of encoding for 22 bytes, we can break a 30 digit number (29 piece codes + en passant code) into two 15 digit numbers each of which will fit in 50 bits each (total 100 bits plus the 75 mentioned above makes 175 bits worst case.)

For maximum compression, as stated above, 29 decimal digits plus the En Passant code (6 possible values), will just about fit into 99 bits (for a total of 174 bits) but without support from the language for integers of this size it is complicated to program. It may be easier to separate out the 29 colour bits and work with the piece-type codes (5 possibilities) and En passant code (6 possibilities) separately from the colours (70 bits, almost fits into a 64 bit variable.)

Level River St

Posted 2014-01-25T23:06:56.263

Reputation: 22 049

Nice trick with the last man. – Peter Taylor – 2014-01-28T08:37:39.113

5

Here is a full solution, actual worst case 181 bits

The focus here is a simple program you can easily understand

Input is FEN, here is opening position, it has six fields (5 & 6 ignored):

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

First field (piece placement) is parsed

perl -pe 's/\d/"_"x$&/ge;s/\s.*//;s|/||g'

To produce:

rnbqkbnrpppppppp________________________________PPPPPPPPRNBQKBNR

Field one: encode the location of kings (12 bits):

printf("%b",index('k',$_))
printf("%b",index('K',$_))

Field two: encode pieces (up to 5 bits per piece):

s/_/0/g     Blank
s/P/100/g   From here, as normal chess meaning
s/p/101/g
s/Q/11000/g
s/q/11001/g
s/R/11010/g
s/r/11011/g
s/B/11100/g
s/b/11101/g
s/N/11110/g
s/n/11111/g
s/K//
s/k//

Field three: active color (1 bit)

s/w/0/
s/b/1/

Field four: castling availability (4 bits)

m/K/?1:0
m/k/?1:0
m/Q/?1:0
m/q/?1:0
  • FIQ shows this field could be removed altogether.

Field five: en passant (zero or 3 bits)

printf("%b",ord($1)-ord("a")) unless m/-/
// The EP's rank is 3 or 6 based on active color, only need to encode file

Naive worst case 200 bits

  • Two kings placements - 12 bits
  • Board
    • QRRBBNN QQQQQQQQ - 75 bits
    • qrrbbnn qqqqqqqq - 75 bits
    • Empty squares - 30 bits
  • Active color - 1 bit
  • Castling - 4 bits
  • En Passant - 3 bits

Actual worst case

Each player cannot promote all pawns without capturing other pieces. Here is the entropy effect of piece capture:

  • PpR (3+3+5 = 11 bits) => Qq_ (5+5+1 = 11 bits)
  • PPpp (3+3+3+3 = 12 bits) => QQq_ (5+5+5+1 = 16 bits)

So actually the worst case board is:

  • QRRBBNN QQQQQQQQ - 75 bits
  • qrrbbnn qqqq - 55 bits
  • Empty squares - 34 bits

Worst case is to promote all pieces rather than leave pawn for en passant.

TOTAL ACTUAL WORST CASE WITH SHOWN CODE 12+75+55+34+1+4 = 181 bits

FIQ shows two improvements to this simple scheme, but they are harder to code:

  • Remove bit 2 from piece encoding on rows 1 and 8 since pawns can't go there (up to 16 bit savings)
  • Use pawns to encode castable rooks (4 bit savings)

The only remaining code not shown in this answer (for brevity) is: breaking the input FEN in fields (split /\s/) and variable assignment.

William Entriken

Posted 2014-01-25T23:06:56.263

Reputation: 175

A player can promote all his pawns (Given that he is able to capture enemy pawns); Qn4QQ/Qb6/Qq1k4/Qr6/Qb6/Qr6/Qn4NK/RNB2B1R b - - 0 84 – Krzysztof Szewczyk – 2019-10-13T13:05:42.133

@KrzysztofSzewczyk, yes that is noted above at PPpp => QQq_ – William Entriken – 2019-10-15T19:11:01.550

4

Total data needs 33 bytes

(Thanks to someone in the comments I realised this doesn't work for pawn promotion. Will update this when I can solve it)

for the first byte we use five bits:

  • first bit: player's turn, 1=white
  • second bit: black king side castle, 1=can castle
  • third bit: black queen side castle, 1=can castle
  • fourth bit: white king side castle, 1=can castle
  • fifth bit: white queen side castle, 1=can castle

the next 32 bytes are used to represent each chess piece, in a predefined order

  • 3-bits: represent row
  • 3-bits: represent column
  • 1-bit: represent en-passant, 1=can en-passant
  • 1-bit: if "can en-passant" bit is 1: indicates which side, 0=left
    else represent whether it has been captured. 0=not captured
    (If it can en-passant then it is definitely not captured)

Some C code to represent this idea (which doesn't actually work)

int main() {
    char b, c[32], i;

    //decode:

    FILE *p=fopen("/path/to/file.csv","r");
    fscanf(p,"%d,",&b);
    for(i=0;i<31;i++) fscanf(p,"%d,",&c[i]);
    fscanf(p,"%d",&c[31]);
    fclose(p);
    if(b&16) /* white's turn */
    else /* black's turn */
    if(b&8) /* black king side can castle */
    if(b&4) /* black queen side can castle */
    if(b&2) /* white king side can castle */
    if(b&1) /* white queen side can castle */

    for(i=0;i<32;i++) {
        int row, column;
        row=c[i]&7;
        column=c[i]&56;
        if(c[i]&64 && isPawn(c[i])) { //can en-passant
            if(c[i]&128) //can en-passant to the right
            else //can en-passant to the left
        }
        if(!(c[i]&64)) {
            if(c[i]&128) //captured
            else //not captured
        }
    }

    //encode:

    p=fopen("/path/to/file.csv","w");

    if(b&16) b&=239;
    else b|=16;
    if(black_king_side_cannot_castle) b&=247;
    if(black_queen_side_cannot_castle) b&=251;
    if(white_king_side_cannot_castle) b&=253;
    if(white_queen_side_cannot_castle) b&=254;

    for(i=0;i<32;i++) {
        c[i]=row;
        c[i]+=column*8;
        if(isPawn(c[i]) && can_en_Passant) {
            c[i]|=64;
            if(can_en_Passant_left) c[i]&=127;
            else c[i]|=128;
        }
        if(!(c[i]&64)) {
            if(isCaptured(c[i])) c[i]|=128;
            else c[i]&=127;
        }
    }
    fprintf(p,"%d,",b);
    for(i=0;i<31;i++) fprintf(p,"%d,",c[i]);
    fprintf(p,"%d",c[31]);
    fclose(p);
    return 0;
}

user12205

Posted 2014-01-25T23:06:56.263

Reputation: 8 752

-1, this is not an answer... – Doorknob – 2014-01-25T23:43:19.943

1

Your predefined order won't be of much use when a pawn reaches the eighth rank and is promoted to a knight, bishop, rook or queen.

– r3mainer – 2014-01-25T23:43:37.410

@Doorknob of Snow ok I'm working on the algorithm, it's just a bit late at night and I'm tired, so I'm doing it slightly slower – user12205 – 2014-01-25T23:50:42.610

Well then why did you post this if you don't even have a solution? – Doorknob – 2014-01-25T23:52:01.697

@Doorknob of Snow Sorry it's because my roommate is turning off the lights and I can hardly see my keyboard, so I'm just posting it here, and hoping to continue some time later. if you insist then i'll continue now – user12205 – 2014-01-26T00:01:18.267

@ace save it on your computer, delete this, then come back later. (That is the best solution to this problem that I have come up with) – Justin – 2014-01-26T00:14:41.160

@Quincunx fine i've done what i need to. (except for the pawn promotion part. sorry i wasn't quite familiar with chess rules) – user12205 – 2014-01-26T00:23:45.757

@Doorknob of Snow i've fixed it now, are you willing to cancel the down vote? – user12205 – 2014-01-26T00:28:59.210

Why would you even post code that doesn't work.... :/ Wait until you're finished, and then post it. Why do you want to post an incomplete answer? – Doorknob – 2014-01-26T00:29:35.930

@Doorknob of Snow because the question says "algorithm or program"... I just wrote the encode and decode part, and i did not define the isPawn() and isCaptured() functions. and I don't want to write all the parts to ask for inputs and move the chess pieces and present possible moves to the user, because the question says "algorithm or program that can encode and decode a chess board", it only encodes/decodes. i mean if i write it in pseudocode to present the algorithm then it would definitely not work, right? – user12205 – 2014-01-26T00:41:49.987

3if you people still think this answer is crap and has no value here then go ahead and vote to delete it and downvote it and you may as well delete my account here. i just want to share what i could think of, why do you guys have to be so mean? – user12205 – 2014-01-26T00:51:54.530

Well, at least it was a try. I'll upvote for an attempt, not that bad finally. – VisioN – 2014-01-26T01:26:47.223

4

256 242 bits

Here's a basic compression algorithm that can probably be improved on because it doesn't exclude certain illegal positions from being represented.

The board starts with 5 bits of header info, as follows:

0 1 1 1 1
---------
1 2 3 4 5

1: Turn (black = 1, white = 0)
2: Black can castle queen-side
3: Black can castle king-side
4: White can castle queen-side
5: White can castle king-side

Then, a string of 12 bits representing the positions of the kings.

0 0 0 1 0 0 1 1 1 1 0 0
-----------------------
0 0 0 0 0 0 0 0 0 1 1 1
1 2 3 4 5 6 7 8 9 0 1 2

01 - 03: white king's rank
04 - 06: white king's file
07 - 09: white king's rank
10 - 12: white king's file

Then, a huge 64-digit number in base 11, which is then multiplied by 9 to add another digit at the end representing the state of en-passant. Each digit in base 11 represents a square on the board, with the following possible values:

0: empty

1: white pawn
2: white knight
3: white bishop
4: white rook
5: white queen

For the black equivalent of each white piece, add 5.

And the digits in base 9:

0: no en-passant possible
1 - 8: en-passant on rank 1 - 8

1164 × 9 is about 2224.57, which requires 225 bits to encode. Plus the 17 header bits at the top, is 242 bits total.


Thanks to ugoren for the improvements.

Joe Z.

Posted 2014-01-25T23:06:56.263

Reputation: 30 589

This is very similar to ace's algorithm, but it uses board positions instead of pieces in a predetermined order, which both excludes it from having the pawn promotion problem and allows the data to be chopped off a bit. – Joe Z. – 2014-01-26T02:48:03.527

You can save en-passant as one number between 0 and 8 (on which column can the current player capture en-passant?). 13^64 * 9 is 239.99, saving 11 bits. Save more by encoding king positions separately. – ugoren – 2014-01-26T13:15:05.840

According to Doorknob of Snow who commented on my post, this kind of answer is "not an answer". Just saying. FYI his comment was posted before I added the C code to my answer. – user12205 – 2014-01-26T16:03:49.247

@ugoren: I forgot about that. You're right, I forgot that only one pawn can be en-passant at the same time. – Joe Z. – 2014-01-26T18:34:36.603

@ace: Your answer isn't invalid because it doesn't include encoding and decoding code; it's invalid because it doesn't account for the pawn-promotion case (in which case your predefined order of pieces doesn't do anything). The problem, at its core, asks for a data-encoding scheme. The program is just something to interface with that. – Joe Z. – 2014-01-26T18:59:47.700

3

? bits

(≥ 217 worst-case, 17 best-case, 179 for initial board)


Encoding description

Extra metadata consists of whose turn it is (one bit) and castling (four bits, i.e. may white castle on kings' side? on queens' side? and similarly for black).

For the board position itself, we encode it as a set of active pieces. Well, actually, we make sure to enumerate the pieces in a particular order for reasons that I'll explain in a bit. For each piece we store its colour (one bit), its kind (three bits to encompass 6 kinds, plus one extra kind for "pawn that may be taken by en passant"), as well as its position.

Here's the interesting part: to encode the position of a piece, instead of storing it as a coordinate, we store its relative distance from the last piece when enumerating the pieces in left-to-right, top-to-down order (i.e. A8, B8, ..., G1, H1). In addition, we store the distance as a variable-length number, using 1 to mean that this piece is right next to the previous, 0xx for skipping 1-3 pieces, 000xxx for skipping 4-10 pieces, 000000xxxx for 11-25, 0000000000xxxxx for 26-56 and finally 000000000000000xxx for 57-62.

Examples

I made a gist of some positions coded with this encoding, and annotated the one for the initial position with some comments.

I don't know how to analyse the worst-case bit size, but going by the examples in the gist, I believe that the algorithm should be somewhat efficient at least.


Implementation of decoder

Below is a quick-and-dirty decoder for this encoding (taking as input the binary data encoded in text, as in the above annotated example, and skipping over things that aren't '0' or '1'). Produces a unicode chess board to stdout.

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

char buf[1024];
int wi = 0, ri = 0;

int read_n(int n) {
  int res = 0;
  for (int i = 0; i < n; i++) {
    res = res << 1 | (buf[ri++] == '1');
  }
  return res;
}

int read_varnum() {
  int v, c = 0;

  for (int i = 1; i <= 5; i++) {
    v = read_n(i);
    if (v != 0) return c + v;
    c += (1 << i) - 1;
  }

  assert(false); /* Shouldn't happen */
}

char *piece_to_str(int piece, int color) {       /* ↓ pawn that may be taken with en passant */
  char *pieces[] = { "♙", "♘", "♗", "♖", "♕", "♔", "♙",
                     "♟", "♞", "♝", "♜", "♛", "♚", "♟" };
  return pieces[color * 7 + piece];
}

int main(void) {
  int ch;
  while (ch = getchar(), ch != EOF) {
    if (ch == '0' || ch == '1') buf[wi++] = ch;
  }

  int board[64];
  memset(board, -1, 64 * sizeof(int));

  /* Read metadata */
  int is_white = read_n(1);
  int castling = read_n(4);

  /* Read the board state */
  int bi = -1;
  while (ri != wi) {
    int color = read_n(1);
    int kind  = read_n(3);
    int delta = read_varnum();
    board[bi + delta] = color << 8 | kind;
    bi += delta;
  }

  /* Print metadata */
  printf("  Current turn: %s's turn to play.\n", is_white? "white" : "black");
  printf("  Castling: White may castle? %s %s\n",
         castling & 0x8? "left" : "", castling & 0x4? "right" : "");
  printf("            Black may castle? %s %s\n",
         castling & 0x2? "left" : "", castling & 0x1? "right" : "");
  printf("\n");

  /* Print the board out */
  printf("+");
  for (int x = 0; x < 8; x++) printf("--");
  printf("-+\n");

  for (int y = 0; y < 8; y++) {
    printf("|");
    for (int x = 0; x < 8; x++) {
      int piece = board[y*8 + x],
          color = piece >> 8,
          kind  = piece & 0xFF;

      if (piece == -1) printf("  ");
      else printf(" %s", piece_to_str(kind, color));
    }
    printf(" |\n");
  }

  printf("+");
  for (int x = 0; x < 8; x++) printf("--");
  printf("-+\n");

  return 0;
}

FireFly

Posted 2014-01-25T23:06:56.263

Reputation: 7 107

I have no idea what your worst-case bit count is, but I suspect it's considerably more than 179 bits. For example, how would your algorithm handle this layout? (It is valid, by the way; I made it using the editor at chess.com)

– r3mainer – 2014-01-26T21:16:57.393

@squeamishossifrage that one seems to require 217 bits: https://gist.github.com/FireyFly/8639791 (thanks for the test-case, I wanted to try it with a trickier one!)

– FireFly – 2014-01-26T21:37:37.940

Hey, that's not bad. Keep going! – r3mainer – 2014-01-26T21:41:40.147

2

Max: 184 bits, Min: 75 bits

I was inspired by @AdamSpeight's Huffman coding for pieces to try crafting my own scheme. I doubt this will win, but it does have calculable limits.

This scheme treats the chess pieces as though there are 11 different types of pieces. I approximately followed the Huffman coding algorithm to group these classes by their frequencies and real types to generate the following encodings:

 Piece Class                | Frequency Per Team | Encoding
============================+====================+==========
 Pawn, normal               | 0 - 8              | 0
 Pawn, jumped two last turn | 0 - 1              | 1111
 Knight                     | 0 - 2              | 1000
 Bishop                     | 0 - 2              | 1001
 Queen-side rook, unmoved   | 0 - 1              | 10100
 Queen-side rook, moved     | 0 - 1              | 10101
 King-side rook, unmoved    | 0 - 1              | 10110
 King-side rook, moved      | 0 - 1              | 10111
 Queen                      | 0 - 1              | 1110
 King, unmoved              | 0 - 1              | 1100
 King, moved                | 0 - 1              | 1101

Each piece's code can be preceded by two bits to show to which team it belongs (10 for White, 11 for Black). 0 can be used to encode empty spaces. These ideas allow us to build a square-by-square encoding for the whole board using whatever procedure we want. I will assume the iteration order a1, b1, c1, ... f8, g8, h8. This means that just listing these codes as shown above encodes all the information except whose turn it is. A very simple chess engine can use the "classes" for the pawns, rooks, and kings to determine whether castling and en passant are legal. Furthermore, this scheme easily handles pawn promotions. If a player promotes a pawn to a rook, either the king or queen-side codes may be used so long as the "moved" variant is chosen.

Excepting pathological positions that I do not think could ever happen, the worst-case scenario for this encoding is when all pieces are still on the board and the previous player moved a pawn forward two spaces. As an example, the board below encodes 1. e4:

1101010010100010100110111010110010100110100010101101001001001100100100100000000000000101111000000000000000000011011011011011011011011011101001110001110011111101111001110011110001110110
===========================================================================
                              Black's move
1110100 111000 111001 111110 111100 111001 111000 1110110 | r n b q k b n r
    110    110    110    110    110    110    110     110 | p p p p p p p p
      0      0      0      0      0      0      0       0 | . . . . . . . .
      0      0      0      0      0      0      0       0 | . . . . . . . .
      0      0      0      0 101111      0      0       0 | . . . . P . . .
      0      0      0      0      0      0      0       0 | . . . . . . . .
    100    100    100    110      0    100    100     100 | P P P P . P P P
1010100 101000 101001 101110 101100 101001 101000 1010110 | R N B Q K B N R

This means that the worst-case encoding has 184 bits: 1 for indicating the player, 32 for the empty spaces, 45 for the unmoved pawns, 6 for the two-space-jumping pawn, 24 for the knights, 24 for the bishops, 28 for the rooks, 12 for the queens, and 12 for the kings.

As pieces moved about and are captured, the size of the encoding will drop. The best case scenario is represented by two kings alone on the board: 1 bit for indicating the player + 62 bits for indicating the empty squares + 12 bits for indicating the kings gives a minimum length of 75 bits.

I will come back and edit this response with some code that demonstrates this encoding in action later today or tomorrow. For now, I am curious to see what other people think of this encoding.

sadakatsu

Posted 2014-01-25T23:06:56.263

Reputation: 619

1You can save bits on the rooks. You can determine queen- and king-side based on position, and you don't need to know which side a moved-rook came from. so you just need one bit of information on the rook - moved or unmoved. – Not that Charles – 2015-07-08T15:35:04.350

... And actually, storing that data on a piece level is easier to encode/decode, but if you just stored a marker at the start, you may save bits overall worst case (e.g., the marker 11001 means B'S MOVE W CAN KSC W CANT QSC B CANT KSC B CAN QSC). That is 4 bits instead of the 5 bits per side in your encoding, or 3 bits per side if you eliminate the side marker on the rooks. (KSC = King's-side castle. QSC = Queen's-side castle) – Not that Charles – 2015-07-08T15:40:49.050

Also, due to promotions, it is quite possible to have up to 9 Queens or 10 Knights (or any other non-regal, non-pawn piece) – Not that Charles – 2015-07-08T15:43:04.697

1

184 bits = 23 bytes worst case, and not too complicated:

A. Which squares occupied: 64 bits = 8 bytes B. Which colors for the <=32 occupied squares: 32 bits = 4 bytes And now using only the <=30 squares occupied by non-kings: B. use PNBRQ encoded in radix 5, for 5^30 possibilities; and 32*31*2*4*4*9 for king positions, mover-color, white & black castling rights, en passant square (among 8 possibilities plus none, a 9th); this number 5^30 * 32 * 31 * 2 * 4*4*9 = 266075134277343750000000000 = 2^87.782 fits in 88 bits for a total of 64+32+88=184 bits for the whole encoding.

This can be reduced, e.g. 32*31*2*4*4*9=285696 can be reduced to 2*(32*31+31*3+31*3+3*3)*6=14244, by using fact at most 6 en passant victim-candidates (including none), and encoding castling rights and king positions inside same set using fact castling rights only matter when king on initial square. This saves 4 bits.

I have reason to believe that it is not possible to achieve 18 bytes, i.e. the total number of legal chess positions exceeds 2^144.

Your 160-bit scheme is very ingenious but I think it needs to be given as encode/decode programs before I'll be completely confident about it.

Warren D. Smith

Posted 2014-01-25T23:06:56.263

Reputation: 11

1

171 bits worst case:

I've combined a couple of ideas, as well as some of my own thoughts.

We are going to start with a 64 bit board. Each bit represents an occupied space on the board. They fill along rows. So the start looks like: 1111 1111 1111 1111 0000 0000 0000 0000 0000 0000 0000 0000 1111 1111 1111 1111

Now, each piece will be represented by 4 bits. 1st bit: color (0=white, 1=black) 2nd-4th bits: One of 8 types.

0=king, 1=queen, 2=bishop0, 3=knight, 4=rook, 5=pawn0, 6=pawn1, 7=bishop1

At the end we will include a bit designating the turn. 0=white, 1=black.

4bits*32 pieces=128 bits and I've already got 64+1 from turn and board. That gives a total of 128+64+1=193 I haven't even started with en passant or castling. Well over my limit with nothing -- not even turns. This is where the tricks start.

Okay -- you see those types above? Bishop0 and Bishop1? Pawn0 and pawn1? Those types are so designated, because they tell us a bit to insert after this piece. So Bishop0 means that after it, there will be a 0 -- i.e that the next piece is white. Bishop1 tells us the next piece is black, and the pawn0 and pawn1 work the same. (If this piece is the last piece being enumerated, then it tells us about the next turn instead).

My worst case involves all the pieces on the board, so with 16 pawns and 4 bishops, this saves me 20 bits. I'm down to 173.

Okay. For another bit in my worst case -- once there are 16 of a color encoded, we stop encoding color -- as we know it going forwards. My worst case now involves a white piece making it to the far corner with no captures. There, I save only one bit. 172.

I'm going to now change the order in which I name the pieces. We will name them along columns starting at the outside moving in. The board named at the beginning will stay the same, but when I place pieces on it, i start from whites bottom right, and go up that column. Then I jump to black's bottom right, and go up that column. I jump to white's bottom right unknown cell, and so forth -- this means my worst case is again the start. My reason has to do with my en passant trick, the next two bits I lose, and castling.

Now, for a pawn to be promoted (as has been discussed at length) a piece must be captured. Thus, when we know there are 32 pieces, we only need to denote 31 of them. The last piece is uniquely identified. As it turns out, for me, this only saves 2 bits -- because my last piece might be a pawn/bishop (which normally costs me 3 bits because I save one on the next piece) who's color is determined already and so was only 2 bits. I'm down to 170 bits.

When pawns get promoted, they simply change their type. For each piece that goes off the board I rid myself of (minimum) 3 bits, and two pawn promotions cost me 2 bits, so I'm declining (slowly) in promotions.

Castling. For castling to happen, neither king nor relevant rook may have moved. Thus, when a rook is capable of castling both it and the king will be in their original places. So, rooks capable of castling will be listed in the same place as kings of that color. As this is illegal (two kings or three kings of the same color on the board) and only can happen in three possible setups for each color (left, right, both rooks are listed as kings) -- decoding is utterly possible. So, for no bits we have encoded castling.

En Passant Here we will also use illegal positions. Only one pawn can be in danger of en passant at a time. It must be in its fourth row. The pawn that is vulnerable will be 'flipped' back to its home row -- switched with whatever is there. As that pawn is in its own first row -- an illegal position, the situation will be obvious to the decoder -- it will reverse the positions, and mark the pawn as vulnerable to en passant.

We needed to muck about with the order because the last piece has to be 'uniquely identified'. In a standard order, we wouldn't be able to tell if the rook in the back corner could castle -- that's not known. When we work from the outside in, we guarantee that wherever that rook is 'listed' be it in its own corner, or in the middle of the board because it was switched with an en passant vulnerable pawn, there will be a piece listed after it -- so we are told the rook's type. We know there will be a piece after it because, for a pawn to be vulnerable there must be a pawn to its inside (which will crucially be encoded later as per the instructions above).

Oh, and to help make sure that the worst case involves all pieces on the board, once we have less than 32 pieces, we can use the 64 bit board to identify turn as well as occupied positions by using 0's to represent pieces when its white's turn and 1's when it is blacks turn.

So we got en passant and castling for free. We picked up the last piece for free, though it took some finagling with order to make that play nice with the en passant and castling rules. We ditched 20 bits on the standard pieces. I believe the worst case here involves a midde white pawn moved forward and a black piece in between it and its queen while all pieces are on the board. Quick double check: first piece is captured -- call it a pawn, 3 bits off the board in the pawn, 3 bits on the board in the form of a last piece, one bit off in the turn marker disappearing. Promote two pawns 2 bits back on the board. Ah damn, I'm at 171.

EDIT I've added code for a (working?) decoder -- in R -- below. It takes vectors of booleans as input -- (sorry -- I'm not capable of coding well in anything that would let me actually use the bits) I've also included the start position.

separate = function(vec){
    #Using a boolean vector (sorry R doesn't handle bits well and this will build quickest)
    board = matrix(vec[1:64],8,8,byrow=T)
    npieces = min(sum(board),64-sum(board))
    n = length(vec)
    a = vec[65:n]
    counter = 0
    pieces = list()
    white = 0
    Letters=c(letters,LETTERS)
    for (i in 1:npieces){
        col = ifelse(a[1],"black",{white=white+1;"white"})
        typ = a[2:4]
        a=a[-(1:4)]
        num = 4*typ[1] + 2*typ[2] + typ[3]
        type = switch(letters[num+1],a="king",b="queen",c="knight",d="rook",e="bishop0",f="bishop1",g="pawn0",h="pawn1")
        if (num > 3) {
            if(num%%2){
                a = c(T,a)
            } else {
                a = c(F,a)
            }
            type = substr(type,1,nchar(type)-1)
        }
        pieces[[Letters[i]]] = list(color=col,type=type)
        if (length(pieces)==31&&npieces==32) {
            col = ifelse(16-white,{white=white+1;"white"},"black")
            type = "TBD"
            pieces[[Letters[i+1]]] = list(color=col,type=type)
            break
        }
    }

    if (npieces==32) {
        f=function(y){sum(sapply(pieces,function(x)x$type==y))}
        if (f("pawn")<16) {pieces[[32]]$type="pawn"}
        if (f("bishop")<4) {pieces[[32]]$type="bishop"}
        if (f("knight")<4) {pieces[[32]]$type="knight"}
        if (f("queen")<2)  {pieces[[32]]$type="queen"}
        if (f("king")<2)   {pieces[[32]]$type="king"}
        if (f("rook")<(6-f("king"))) {pieces[[32]]$type="rook"}
    }
    return(list(board,pieces,turn=ifelse(a[length(a)],"black","white")))
}


fillboard = function(out) {
    board = out[[1]]
    pieces = out[[2]]
    turn = out[[3]]
    lpieces = lapply(pieces,function(x) paste(substr(x$color,1,1),x$type))
    game = matrix("     ",8,8)
    #Start with corners.
    a = c(1,57,8,64)
    #Then kings
    b = c(25,32)
    #Then rooks in en passant
    c = c(4,60,5,61)
    #Then kings in en passant
    d = 28:29
    exceptions = list(a,b,c,d)
    for (places in exceptions) {
        c= which(board[places])
        if (length(c)) {
            repl = lpieces[1:length(c)]
            game[places[c]] = unlist(repl)
            board[places] = F
            lpieces = lpieces[-(1:length(c))]
        }
    }
    #Loop through rows.
    for (i in c(1:4,8:5)) {
        j = which(board[i,])
        if (length(j)) {
            repl = lpieces[1:length(j)]
            game[i,j] = unlist(repl)
            board[i,j] = F
            lpieces = lpieces[-(1:length(j))]
        }
    }
    return(matrix(unlist(game),8,8,F))
}

swapillegal = function(matr) {
    mat = matr
    if (any(mat[8,]=="b pawn")) {
        j = which(mat[8,]=="b pawn")
        mat[8,j] = mat[5,j]
        mat[5,j] = "b pawn-e"
    }
    if (any(mat[1,]=="w pawn")) {
        j = which(mat[1,]=="w pawn")
        mat[1,j] = mat[4,j]
        mat[4,j] = "w pawn-e"
    }

    if (sum(mat[8,]=="b king") > 1) {
        j = which(mat[8,-4]=="b king")
        j[j==7] = 8
        mat[8,j] = "b rook-c"
    }
    if (sum(mat[1,]=="w king") >1) {
        j = which(mat[1,-4]=="w king")
        j[j==7] = 8
        mat[1,j] = "w rook-c"
    }
    return(mat)
}

decode = function(vec) {
    a = separate(vec)
    b = fillboard(a)
    c = swapillegal(b)
    list(board=c,turn=a[[3]])
}


startboard = c(rep(T,16),rep(F,32),rep(T,16))
#Re-ordering -- first spots will be the corners. Then kings. then en passant positions of those spots
pieces = c(F,F,T,T,F,F,T,T,T,F,T,T,T,F,T,T,F,F,F,F,T,F,F,F,F,F,T,F,F,T,F,F,F,F,T,F,T,F,F,F,T,F,F,T,T,F,T,T,F,T,T,F,T,T,F,T,T,F,T,T,F,T,T,F,T,T,T,F,T,F,T,T,F,T,F,F,T,T,T,F,T,F,T,F,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,T,F)
########## w rook -w rook -B rook -B rook -W king -B king -w kni  -w bi 0 -w Q  -w Bi 0 -w kni-w p0   - 2   -   3 -  4  - 5   -  6  -  7  -w p1 -b kni-b bi1  -b q  -b bi1  -b kni-b p1   -2    - 3   - 4   - 5   - 6   - b p0- implicit b p0.
#After the kings come all the prior en passant positions. which are empty at start. Then we start at whites bottom right, and move across rows to middle. Then we go to blacks bottom left, and fill across to the middle.
#Changing start to properly encode rooks for castling
newpieces= c(F,F,F,F,F,F,F,F,T,F,F,F,T,F,F,F ,pieces[-(1:16)])
test2 = decode(c(startboard,newpieces))

This code builds 4 functions. One that does some basic separation of the pieces and the board structure, as well as figuring out piece type and any 'implicit' pieces. The next function fills in the board structure with those pieces in a slightly bizzarre (and different from my initial algorithm's) order [explained in code comments]. The next function takes the filled in board and detects any illegal positions -- it then fixes them and renames pawns that are vulnerable to en passant "x pawn-e" and any rooks that can castle "x rook-c". The final function is a wrapper that runs those functions in order and gives an output which is the current board as well as the turn.

I've also included the encoding of the start position (though to see it you will have to call c(startboard,newpieces) and the code calls the wrapper function on that position.

user5957401

Posted 2014-01-25T23:06:56.263

Reputation: 699

This is interesting. I'd love to see a working implementation as a proof of concept. – Mego – 2016-08-09T08:59:59.060

Implementation is usually a few steps beyond proof of concept, but I've added a decoder. It is in R and thus uses Booleans rather than bits (sorry -- didn't want to use too much time). But I believe it should be proof of concept. – user5957401 – 2016-08-09T19:41:10.033

0

229 / 226 bits

This turns out not to be very successful, but might save other people going down the same path.

The simple version:

  • 1 bit for whose turn it is
  • 4 bits for the four castling possibilities
  • 3 bits for the en passant possibilities. This has more depth that I at first understood. En passant must be done by a pawn moving from the same rank (row) as the pawn which is captured. Case analysis indicates that once we know how many pawns of the colour which last moved have advanced exactly two squares, there will be at most 6 en passant cases (including the case that there is no pawn vulnerable to en passant). The worse case is 5 pawns (potentially all vulnerable: e.g. PpPPpPPp has five vulnerable Ps). With 6 pawns there are at most 2 enemy pawns in the same rank, each of which can threaten at most 2 pawns en passant. Therefore we need ceil(lg 6) = 3 bits here.

Then the board. The board has 64 squares, so a square index can be encoded in 6 bits. We list the men by rank, alternating colours, starting with the king.

  • 6 bits: position of white king. (Guaranteed to be on the board).
  • 6 bits: position of black king. (Guaranteed to be on the board. In the worst case that the white king is in a corner, there are 60 possible places he could be; in the best case that the white isn't on an edge, there are 55).
  • 6 bits: position of white queen. If there is no white queen, repeat the white king's position as a signal.
  • For each additional white queen, a 1 bit followed by 6 bits for the position.
  • A 0 bit.
  • Ditto for black queen(s).
  • Similar process for rooks, bishops, knights, and pawns, although we can skip the pawns for a colour if we already have 16 men of that colour accounted for.
  • Delete the final 0 bit.

This costs a certain 12 bits for the kings, and 2*7*5-1 = 69 bits for the other men. In the worst case that all 32 men are on the board, it costs 7 bits for each man other than the kings, for a total of 12 + 7*30 - 1 = 221 bits. So with the initial 8 bits for global state we have 229 bits.


The advanced version:

By using arithmetic coding we can operate with lg num_possibilities rather than ceil(lg num_possibilities) and just take one ceil at the end.

  • 1 bit for whose turn it is
  • 4 bits for the four castling possibilities
  • lg 6 bits for the en passant possibilities.
  • 6 bits for the white king
  • lg 60 bits (worst case) for the black king
  • lg 63 bits (because I don't want to complicate it to the level of looking at checks) for the white queen, using the white king's position if there is none
  • For each additional white queen, a 1 bit followed by lg 62, lg 61, etc. bits for her position.
  • A 0 bit.
  • lg 63 bits (or fewer, if there were any white queens) for the black queen.
  • etc.

The nth man who's actually present has 66-n possible values. If a type is absent for a colour, we spent 66-#men so far bits in recording that (plus one bit for the separator). The extreme cases are:

  1. All men present, including at least one unpromoted pawn from each side. We spend 5+lg 6 on global state, 6+lg 60 on the kings, 29 on separator bits, and SUM_{n=32}^{63} lg n bits on positions. Grand total: ceil(225.82) bits. Disappointing.
  2. Only unpromoted pawns left. We spend 5+lg 6 on global state, 6+lg 60 on the kings, 29 on separator bits, 8*lg 63 saying that there are no other pieces, and SUM_{n=48}^{63} lg n on positions of the pawns. Grand total: ceil(188.94) bits. Better - by saving the second rook, knight, and bishop for each side we've pulled ahead a bit.

So the worst case seems likely to be 226 bits, for a measly saving of 3.

We can definitely do better in the average case by encoding pawns before pieces, since they're restricted to 48 squares rather than the full 64. However, in the worst case that all men are on the board and all pawns have been promoted I think this would end up costing 2 bits more because we would need a "no pawns" flag rather than being able to count the men.

Peter Taylor

Posted 2014-01-25T23:06:56.263

Reputation: 41 901

0

This is a discussion topic in Chess circles.

Here is one very simple proof with 164 bits https://groups.google.com/forum/#!topic/rec.games.chess.computer/vmvI0ePH2kI 155 is shown here http://homepages.cwi.nl/~tromp/chess/chess.html

Over simplified strategy:

  • Limit pawns to the place where pawns may be found
  • Limit armies to consider original pieces and possible pawn promotions
  • Think hard about promotions and situations where promotions are not possible

William Entriken

Posted 2014-01-25T23:06:56.263

Reputation: 175

1

This is an example of why you include the content in your answer. The 2nd link is dead. Is this it? http://tromp.github.io/chess/chess.html

– mbomb007 – 2016-10-04T19:54:54.170

2Link-only answers are not good answers. You should provide the complete answer within your post, not just some links to the answer. And if your answer is not your own work you should probably make your post a community wiki. – user12205 – 2014-04-19T18:56:51.337

Post is original. Adding details to answer. – William Entriken – 2014-04-23T15:38:11.043

-2

Min: 0 bits

Max: 1734 243 bits (4.335 4.401 bits/board amortized)

Expected: 351 177 bits (4.376 4.430 bits/board amortized)

Since I can determine the input and output however I want I decided to go with encoding the history of the game up until this point. One advantage is that the additional information of who's turn it is, en-passant, and who has the ability to castle where can be derived and not encoded.

Attempt 1:

Naively I thought that I can encode each move in 12 bits, 4 triplets of the form (start x, start y, end x, end y) where each is 3 bits.

We would assume the starting position and move pieces from there with white going first. The board is arranged such that ( 0 , 0 ) is white's lower left corner.

For example the game:

  e4    e5
 Nf3    f6
Nxe5  fxe5
...    ...

Would be encoded as:

100001 100010 100110 100100
110000 101010 101110 101101
101010 100100 101101 100100
...

This leads to an encoding of 12 m bits where m is the number of moves made

On one hand this could get really big, on the other hand you can consider each move to be it's own game so each encoding really encodes m "chess boards". If you amortized this you get that each "chess board" is 12 bits. But I feel this is a bit cheating...

Attempt 2:

I realized that each move in the previous attempt encodes many illegal moves. So I decided to only encode legal moves. We enumerate possible moves as follows, number each square such that ( 0 , 0 ) → 0 , ( 1 , 0 ) → 1 , ( x , y ) → x + 8 y . Iterate through the tiles and check if a piece is there and if it can move. If so add the positions it can go to to a list. Choose the list index that is the move you want to make. Add that number to the running total of moves weighted by 1 plus the number of possible moves.

Example as above: From the starting position the first piece that can move is the knight on square 1, it can move to square 16 or 18, so add those to the list [(1,16),(1,18)]. Next is the knight on square 6, add it's moves. Overall we get:

[(1,16),(1,18),(6,21),(6,23),(8,16),(8,24),(9,17),(9,25),(10,18),(10,26),(11,19),(11,27),(12,20),(12,28),(13,21),(13,29),(14,22),(14,30),(15,23),(15,31)]

Since we want the move ( 12 , 28 ) , we encode this as 13 in base 20 since there are 20 possible moves.

So now we get the game number g0 = 13

Next we do the same for black except we number the tiles in reverse (to make it easier, not required) to get the list of moves:

[(1,16),(1,18),(6,21),(6,23),(8,16),(8,24),(9,17),(9,25),(10,18),(10,26),(11,19),(11,27),(12,20),(12,28),(13,21),(13,29),(14,22),(14,30),(15,23),(15,31)]

Since we want the move ( 11 , 27 ) , we encode this as 11 in base 20 since there are 20 possible moves.

So now we get the game number g1 = ( 11 ⋅ 20 ) + 13 = 233

Next we get the following list of moves for white:

[(1,16),(1,18),(3,12),(3,21),(3,30),(3,39),(4,12),(5,12),(5,19),(5,26),(5,33),(5,40),(6,12),(6,21),(6,23),(8,16),(8,24),(9,17),(9,25),(10,18),(10,26),(11,19),(11,27)(13,21),(13,29),(14,22),(14,30),(15,23),(15,31)]

Since we want the move ( 6 , 21 ) , we encode this as 13 in base 29 since there are 29 possible moves.

So now we get the game number g2 = ( ( 13 ⋅ 20 ) + 11 ) 20 + 13 = 5433

Next we get the following list of moves for black: [(1,11),(1,16),(1,18),(2,11),(2,20),(2,29),(2,38),(2,47),(3,11),(4,11),(4,18),(4,25),(4,32),(6,21),(6,23),(8,16),(8,24),(9,17),(9,25),(10,18),(10,26),(12,20),(12,28),(13,21),(13,29),(14,22),(14,30),(15,23),(15,31)]

Since we want the move $(10, 18)$ ( 10 , 18 )

So now we get the game number g3 = ( ( ( 19 ⋅ 29 + 13 ) 20 ) + 11 ) 20 + 13 = 225833

And continue this process for all the remaining moves. You can think of g as the function g ( x , y , z ) = x y + z . Thus g0 = g ( 1 , 1 , 13 ) , g1 = g ( g ( 1 , 1 , 11 ) , 20 , 13 ) , g2 = g ( g ( g ( 1 , 1 , 13 ) , 20 , 11 ) , 20 , 13 ) , g3 = g ( g ( g ( g ( 1 , 1 , 19 ) , 29 , 13 ) , 20 , 11 ) , 20 , 13 )

To decode a game number g0, we start at the initial position and enumerate all possible moves. Then we compute g1 = g0 // l, m0 = g0 % l, where l is the number of possible moves, '//' is the integer division operator and '%' is the modulus operator. It should hold that g0 = g1 + m0. Next we make the move m0 and repeat.

From the example above if g0 = 225833 then g1 = 225833 // 20 = 11291 and m0 = 225833 % 20= 13. Next g2 = 11291 // 20 = 564 and m1 = 11291 % 20 = 11. Then g3 = 11291 // 20 = 564 and m2 = 11291 % 20 = 11. Therefore g4 = 564 // 29 = 19 and_m_3 = 564 % 29 = 13. Finally g5= 19 // 29 = 0 and m4 = 19 % 29 = 19.

So how many bits are used to encode a game this way?

For simplicity, let's say there are always 20 moves each turn and for the worst case scenario we always pick the biggest one, 19. The number we will get is 19 ⋅ 20m

+ 19 ⋅ 20m-1 + 19 ⋅ 20m-2 + ⋯ + 19 ⋅ 20 + 19 = 20m+1 − 1 where _m is the number of moves. To encode 20m+1 − 1 we need about log2 ( 20m+1 ) bits which is about ( m + 1 ) ∗ log2 ( 20 ) = 4.3219 ∗ ( m + 1 )

On average m = 80 (40 moves per player) so this would take 351 bits to encode. If we were recording many games we would need a universal encoding since we don't know how many bits each number will need

Worst case when m = 400 (200 moves per player) so this would take 1734 bits to encode.

Note that the position we want to encode must be given to us via the shortest path to get there by following the rules. For example the game theorized here doesn't need m = 11741 to encode the final position. Instead we run a Breadth-First search to find the shortest path to that position and encode that instead. I don't know how deep we would need to go to enumerate all chess positions, but I suspect that 400 is an overestimate.

Quick calculation:

There are 12 unique pieces or the square can be empty so to position them on a chessboard is 1364. This is a vast overestimate since it includes many invalid positions. When we are m moves into the game we have created about 20m positions. So we are looking for when 20m = 1364. Log both sides to get m = 64 * log20(13) = 54.797. This shows that we should be able to get to any position in 55 moves.

Now that I calculated the worst case to be m = 55 not m = 400 I will edit my results. To encode a position where m = 55 takes 243 bits. I'm going to also say that the average case is around m = 40 which takes 177 bits to encode.

If we use the amortization argument from earlier we are encoding 400 "chess boards" in 1734 bits so we get that each "chess board" takes up 4.335 bits in the worst case.

Note that g = 0 denotes a valid game, the one where the piece on the lowest square moves to the lowest square it can.

Additional Notes:

If you want to refer to a specific position in the game you may need to encode the index. This can be added either manually e.g concatenate the index to the game, or add an additional "end" move as the last possible move each turn. This can now account for players conceding, or 2 in a row to denote the players agreed to a draw. This is only necessary if the game did not end in a checkmate or stalemate based on the position, in this case it is implied. In this case it brings the number of bits needed on average to 356 and in the worst case 1762.

edggy

Posted 2014-01-25T23:06:56.263

Reputation: 227

1

Your “amortization” argument does not work. The goal is to encode a board given by the user. You do not get to divide the cost of encoding this board among the 400 auxiliary boards you might happen to generate along the way. (This would only be okay if those 400 boards were allowed to be independently chosen by the user, and not constrained by the requirement to form a chess game.) Also, a chess game can theoretically take many thousands of moves, and the OP is clear about being interested in the worst case.

– Anders Kaseorg – 2017-07-16T18:32:16.603

@AndersKaseorg: Very true. It really depends on the objective. If you are trying to store an entire game with all of the other algorithms it would take m*c bytes where m is the number of moves and c is the size of their output. So if you are trying to store an entire 80 move game using the 160 bit solution it would take 12800 bits while mine would only take 351. For the sake of the competition I admit that you are correct, but I thought it was a nice property to point out since it is very common to want to store entire games and not just boards. – edggy – 2017-07-16T18:44:01.490

The objective is unambiguous. “The goal is to make the smallest representation of a chessboard that could be used (once decoded) to determine all movement possibilities for a player on that turn. … Your score is determined worst-case scenario - maximum possible size in bits.” – Anders Kaseorg – 2017-07-16T18:49:52.393

@AndersKaseorg: I never claimed it was ambiguous. I just decided to take a different approach. One thing to note is in order to find the smallest board using my algorithm I need the smallest path to get to this position. For example in the 11741 turn game that you linked to get to the same board position I don't need to follow that path if all we care about is the board. So in order to encode the linked game I just find the shortest game which is left with 2 kings on those squares which may only be 200 turns or less. This can be done with a depth-first search. – edggy – 2017-07-16T19:01:10.890

Using a shorter equivalent game is fine, if you can actually prove that every position is reachable in 200 turns or less (right now this looks like just a guess). You’ll also need to account for chess positions with more than 100 legal moves, unless you can prove that they don’t arise within your construction. Dividing your result by the number of moves in the game is still not allowed by the rules of this challenge.

– Anders Kaseorg – 2017-07-16T19:56:29.460

@AndersKaseorg: I edited my answer to clarify the amortized result and I did some approximations on how many moves it would take to cover all possible positions. Take a look at the "Quick calculation" section to see my thoughts. – edggy – 2017-07-16T20:06:54.790

Now you have me wondering what position actually requires the largest number of moves to minimally reach. A candidate is this one, which purportedly requires 183 moves (by each player). Anyway, remember that we’re looking for a worst case upper bound, not an approximation, and you still haven’t addressed the comments above.

– Anders Kaseorg – 2017-07-16T20:40:18.640