I've read tons of web hits related to this issue, and I still haven't come across any definitive answer.

What I'd like to do is to make a database of chess positions, capable of identifying transpositions (generally which pieces are on which squares).

EDIT: it should also be capable to identify similar (but not exactly identical) positions.

This is a discussion almost 20 years ago (when space was an issue): https://groups.google.com/forum/#!topic/rec.games.chess.computer/wVyS3tftZAA

One of the discussants talk about encoding pieces on a square matrix, using 4 x 64 bits plus some bits more for the additional information (castling, en passant etc): there are six pieces (Pawn, Rook, Knight, Bishop, Queen, King) plus an empty square, that would be 3 bits (2^3), and one more bit for the color of the piece.

In total, there would be 4 numbers of 64bits each, plus some additional information.

Question: is there any other, more efficient way of storing a chess position?

I should probably mention this question is database centric, not game centric (i.e. my sole interest is to efficiently store and retrieve, not to create any AI or to generate any moves).

Thanks, Adrian

标签： database position chess

7条回答

唯我独甜

2楼-- · 2019-04-27 18:43

Take a loot at the Forsyth–Edwards Notation (FEN). It is described here. It is also well known and supported by many engines and chess programs.

Here is the FEN for the starting position:

rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1

Fen is seperated in 6 segments.

Segment 1 contains the pieces. black pieces are in lower case, white pieces are in upper case.

Segment 2 states, who's turn is it. (w or b)

Segment 3 is for castling. KQkq means both can castle on both sides. K = King side white q = queen side black

Segment 4 En passant target square in algebraic notation. If there's no en passant target square, this is "-". If a pawn has just made a two-square move, this is the position "behind" the pawn. This is recorded regardless of whether there is a pawn in position to make an en passant capture

Segment 5 Halfmove clock: This is the number of halfmoves since the last capture or pawn advance. This is used to determine if a draw can be claimed under the fifty-move rule.

Segment 6 Fullmove number: The number of the full move. It starts at 1, and is incremented after Black's move.

0人赞添加讨论(0) 举报

孤傲高冷的网名

3楼-- · 2019-04-27 18:48

If you do not need a decodable position representation for comparisons then you could look at Zobrist hashing. This is used by chess engines to produce a 64 bit oneway hash of a position for spotting transpositions in search trees. As it is a oneway hash you obviously cannot reverse the position from the hash. The size of the hash is tunable, but 64 bits seems to be the accepted minimum size that results in few collisions. It would be ideal as a database index key with a fixed length of just 8 bytes. As collisions, though infrequent, are possible you could do a second pass comparing the actual positions to filter out any positions that have hashed to the same value if it is a concern. I use Zobrist hashes in one of my own applications (using SQLite) that I use to manage my openings and it has no trouble in finding transpositions.

0人赞添加讨论(0) 举报

We Are One

4楼-- · 2019-04-27 18:48

Consider up to 32 pieces on the board. Each piece can be on one of the 64 squares. Representing the piece positions independendly in predetermined order requires 32*6=192 bits.

In addition to that each pawn can be promoted to a rook, bishop, knight or queen, so for each pawn we need to encode its state in 3 additional bits (4 possible piecetypes and normal pawn).

32*6+16*3 = 240 bit/ 30 byte

In many cases you will need additional information about the state of the game/variant the position arises in:

EnPassent File: 4 bits (8 files and none)

Castlerights: 4 bits (short/long white/black)

sideToMove : 1 bit (white/black)

which adds up to 249 bit/32 bytes.

This might not be the most compact representation, but it is easy to en-/decode.

0人赞添加讨论(0) 举报

萌系小妹纸

5楼-- · 2019-04-27 18:50

You could use a modified run length encoding where each piece is encoded as a piece number (3 bits), with 0y111 used to skip ahead spaces. As there are many situations where pieces are next to each other, you end up omitting the positional information:

         All pieces are followed by color bit
0y000c 0 Pawn
0y001c 1 Rook
0y010c 2 Knight
0y011c 3 Bishop
0y100c 4 Queen
0y101c 5 King
0y110 6 Empty space
0y111 7 Repeat next symbol (count is next 6 bits, then symbol)

The decoder starts off at a1 and proceed to the right, moving up at the end of a row, so the encoding for a starting board would be:

12354321      Literal white encoding from a1 to h1    32 bits
7 8 0         repeat white pawn 8 times               13 bits
7 32 6        repeat 32 empty spaces                  12 bits
7 8 8         repeat black pawn 8 times               13 bits
9abcdba9      Literal encoding of black               32 bits
                                                    ---------
                                                     102 bits total

That being said, the additional complexity and uncertainty of a variable length encoding is probably not worth the space savings. Further, it may be worse than a constant width format in certain plays.

0人赞添加讨论(0) 举报

\"骚年 ilove

6楼-- · 2019-04-27 18:51

There are 32 pieces on the board, and 64 squares. Square index can be represented with a 6-bit number, so to represent the locations of each piece you need 32 six-bit numbers, or a total of 192 bits, which is less than 4x64.

You can do a bit better by realizing that not all positions are possible (e.g. a pawn cannot reach the end row of its own color) and using less than six bits for the position in these cases. Also, a position already occupied by another piece makes that position unavailable for other pieces.

As a piece may also be totally missing from the board, you should start with the kings' positions, as they are always there - and then, encoding another piece's position as the same of a king would mean that the piece has been taken.

Edit:

A short analysis of the pieces' possible positions:

Kings, queens, knights and rooks can be anywhere on the board (64 positions)
Bishops are restricted to 32 positions each
Pawns are restricted to 21, 26, 30, 32, 32, 30, 26, and 21 positions (columns A-H).

Thus, this set of legal chess positions can be described trivially with an integer from zero up to (64^12 * 32^4 * 21^4 * 26^4 * 30^4 * 32^8)-1, or 391935874857773690005106949814449284944862535808450559999, which fits into 188 bits. Encoding and decoding a position to and from this is very straightforward - however, there are multiple numbers that decode into the same position (e.g. white knight 1 at B1 and white knight 2 at G1; and white knight 1 at G1 and white knight 2 at B1).

Due to the fact that no two pieces can occupy the same square, there is a tighter limit but it is a bit difficult to both encode and decode, so probably not useful in a real application. Also, the number shown above is pretty close to 2^188, so I don't think even this tighter encoding would fit into 187 bits.

0人赞添加讨论(0) 举报

走好不送

7楼-- · 2019-04-27 18:58

My two cents

Short version

Choose data format that makes it easy to count the similarity of two positions.
Store positional data near the searching program (possibly in memory).
Brute force the search through all the positions when searching similar positions.
Possibly divide the search to multiple threads/processes.

Longer version

32 bytes (4*64 bits) is quite small amount of data. 1000 million chess positions could fit in to 30 gigabytes. 192 bits is 24 bytes this would make in to 23 gigabytes. Probably database use some kind of compression and thus the in disk might be less than these figures. I don't know what kind of limits there are for storage, but because these seems quite tight encodings it might not be worth the effort try to minimize more.

Because ability to find similar positions was required I think the encoding should make it easy to compare different positions. Preferably this could be counted without decoding. For this to work the encoding should probably be constant length (can't think easy way to do this with variable length coding).

Indexing might speed up similarity search. Naive approach would be index all the positions by piece locations in database. This would make 32 indexes (and maybe for additional information also). It would make the search lightning fast at least in theory.

Indexes are going to take quite much space. Probably more than the actual positional data. And still they might not help that much. For example finding positions where black king is in, or near e4 required 9 searches using the index and then hopping around the 30 gigabytes of positional information which is likely need disk access in random locations. And probably finding similar positions is done for more than one piece...

If the storage format is efficient it might just be enough to brute force (like this)through all positional data and check the similarity position by position. This will use the CPU caches efficiently. Also because of the constant length record it is easy to divide the work to multiple processors or machines.

Whether to use piece-centric or board-based storage format depends on how you are going to calculate the similarity of two positions compared to each others. Piece-centric gives easy way to calculate distance of one piece in two different positions. However in piece-centric approach every piece is identified separately thus it is not so easy to find a pawn in certain location. One has to check every pawns location. If the piece identity is not so important, then board-based storage makes it easy to just check if a pawn is in wanted location. On the other hand it is not possible to check which exact pawn there is.

0人赞添加讨论(0) 举报

1 2 下一页

efficient storage of a chess position

Short version

Longer version

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间