How to create a good evaluation function for a gam

2019-01-31 09:25发布

站内文章 / 后端开发

39 0

戒情不戒烟

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I write programs to play board game variants sometimes. The basic strategy is standard alpha-beta pruning or similar searches, sometimes augmented by the usual approaches to endgames or openings. I've mostly played around with chess variants, so when it comes time to pick my evaluation function, I use a basic chess evaluation function.

However, now I am writing a program to play a completely new board game. How do I choose a good or even decent evaluation function?

The main challenges are that the same pieces are always on the board, so a usual material function won't change based on position, and the game has been played less than a thousand times or so, so humans don't necessarily play it enough well yet to give insight. (PS. I considered a MoGo approach, but random games aren't likely to terminate.)

Game details: The game is played on a 10-by-10 board with a fixed six pieces per side. The pieces have certain movement rules, and interact in certain ways, but no piece is ever captured. The goal of the game is to have enough of your pieces in certain special squares on the board. The goal of the computer program is to provide a player which is competitive with or better than current human players.

回答1:

Find a few candidates for your evaluation function, like mobility (# of possible moves) minus opponent's mobility, then try to find the optimal weight for each metric. Genetic algorithms seem to work pretty well for optimizing weights in an evaluation function.

Create a population with random weights, fight them against each other with a limited depth and turns, replace the losers with random combinations from the winners, shuffle, and repeat, printing out the population average after every generation. Let it run until you're satisfied with the result, or until you see a need to adjust the range for some of the metrics and try again, if it appears that the optimal value for one metric might be outside your initial range.

Late edit: A more accepted, studied, understood approach that I didn't know at the time is something called "Differential Evolution". Offspring are created from 3 parents instead of 2, in such a way that avoids the problem of premature convergence towards the average.

回答2:

I will start with some basics and move to harder stuff later.

Basic agent and a testing framework

No matter what approach you take you need to start with something really simple and dumb. The best approach for a dumb agent is a random one (generate all possible moves, select one at random). This will serve as a starting point to compare all your other agents. You need a strong framework for comparison. Something that takes various agents, allows to play some number of games between them and returns the matrix of the performance. Based on the results, you calculate the fitness for each agent. For example your function tournament(agent1, agent2, agent3, 500) will play 500 games between each pair of agent (playing the first/second) and returns you something like:

  x         -0.01       -1.484   |  -1.485
0.01          x         -1.29    |  -1.483
1.484       1.29          x      |  2.774

Here for example I use 2 points for a win, 1 point for draw scoring function, and at the end just summing everything to find the fitness. This table immediately tells me that agent3 is the best, and agent1 is not really different from agent2.

So once these two important things are set up you are ready to experiment with your evaluation functions.

Let's start with selecting features

First of all you need to create not a terrible evaluation function. By this I mean that this function should correctly identify 3 important aspects (win/draw/loss). This sounds obvious, but I have seen significant amount of bots, where the creators were not able to correctly set up these 3 aspects.
Then you use your human ingenuity to find some features of the game state. The first thing to do is to speak with a game expert and ask him how he access the position.
If you do not have the expert, or you even just created the rules of your game 5 minutes ago, do not underestimate the human's ability to search for patters. Even after playing a couple of games, a smart person can give you ideas how he should have played (it does not mean that he can implement the ideas). Use these ideas as features.
At this point you do not really need to know how these features affect the game. Example of features: value of the pieces, pieces mobility, control of important positions, safety, total number of possible moves, closeness to a finish.
After you coded up these features and used them separately to see what works best (do not hurry up to discard features that do not perform reasonable by itself, they might be helpful in conjunction with others), you are ready to experiment with combinations.

Building better evaluations by combining and weighting simple features. There are a couple of standard approaches.

Create an uber function based on various combinations of your features. It can be linear eval = f_1 * a_1 + ... f_n * a_n (f_i features, a_i coefficients), but it can be anything. Then instantiate many agents with absolutely random weights for this evaluation function and use genetic algorithm to play them agains each other. Compare the results using the testing framework, discard a couple of clear losers and mutate a couple of winners. Continue the same process. (This is a rough outline, read more about GA)
Use the back-propagation idea from a neural networks to back propagate the error from the end of the game to update the weights of your network. You can read more how it was done with backgammon (I have not written anything similar, so sorry for the shortness).

You can work without evaluation function! This might sound insane for a person who only heard about minimax/alpha-beta, but there are methods which do not require an evaluation at all. One of them is called Monte Carlo Tree Search and as a Monte Carlo in a name suggests it uses a lot of random (it should not be random, it can use your previous good agents) game plays to generate a tree. This is a huge topic by itself, so I will give you mine really high-level explanation. You start with a root, create your frontier, which you try to expand. Once you expand something, you just randomly go to the leaf. Getting the result from the leaf, you backpropagate the result. Do this many many times, and collect the statistics about each child of the current frontier. Select the best one. There is significant theory there which relates to how do you balance between exploration and exploitation and a good thing to read there is UCT (Upper Confidence Bound algorithm)

回答3:

I would look at a supervised machine learning algorithm such as reinforcement learning. Check out Reinforcement learning in board games. I think that will give you some good directions to look into.

Also, check out Strategy Acquisition for the Game Othello Based on Reinforcement Learning (PDF link) where given the rules of the game, a good "payoff function" can be learned. This is closely related to TD-Gammon ...

During training, the neural network itself is used to select moves for both sides ... The rather surprising finding was that a substantial amount of learning actually took place, even in the zero initial knowledge experiments utilizing a raw board encoding.

回答4:

If nobody understands the game yet, there's no way you can get a decent evaluation function. Don't tell me that standard alpha-beta with material count is good or even decent for chess or its variants (maybe losers' chess is an exception).

You could try neural networks with feedback or similar machine learning algorithms but they usually suck until they have tons of training, which in this case is probably not available. And even then, if they don't suck, you can't gain knowledge from them.

I think there's no way short of understanding the game the best you can and, for starters, leave the unknowns as random on the evaluation function (or just out of the picture until the unknowns become better known).

Of course, if you'd share more info about the game you could get better ideas from the community.

回答5:

As I understand it, you want a good static evaluation function to use at the leaves of your min-max tree. If so, it is best to remember that the purpose of this static evaluation function is to provide a rating as to how good that board is for the computer player. So is

f(board1) > f(board2)

then it must be true that board1 is better for the computer (it is more likely to eventually win) than in board2. Of course, no static function is ever completely correct for all boards.

So, you say that "The goal of the game is to have enough of your pieces in certain special squares on the board", so a first stab at f(board) would simply be to count the number of pieces the computer has on those special squares. You can then finesse it more.

Without knowing the specifics of the game its impossible to give better guesses. If you gave us the game rules I am sure the stackoverflow users would be able to come with tons of original ideas for such functions.

回答6:

While you could use various machine learning methods to come up with an evaluation function (TD-Learning, used in such projects such as gnubackgammon, is one such example), the results are definitely dependent on the game itself. For backgammon, it works really well, because the stochastic nature of the game (rolling dice) forces the learner to explore territory it may not want to do. Without such a crucial component, you will probably end up with an evaluation function which is good against itself, but not against others.

Since material difference may not be applicable, is the concept of mobility important -- i.e. how many possible moves you have available? Is controlling a certain area of the board usually better than not? Talk to the people who play the game to find out some clues.

While it's preferable to have as good of an evaluation function as you can, you also need to tune your search algorithm so you can search as deeply as possible. Sometimes, this is actually more of a concern, since a deep searcher with a medicore evaluation function can outplay shallow searches with a good evaluation function. It all depends on the domain. (gnubackgammon plays an expert game with a 1-ply search, for example)

There are other techniques you can use to improve the quality of your search, most importantly, to have a transposition table to cache search results to have sound forward pruning.

I highly recommend looking over these slides.

回答7:

You also need to be careful on your choice. If your algorithm does not have a known relation to the actual value, the standard AI functions will not work properly. To be valid, your evaluation function, or heuristic has to be the same as, or below the actual value consistently or it will guide your decisions in an odd way (which one could argue for chess, even though I think the standard points are fine).

What I typically do is find out what is capable and what is required. For some games, like sokoban, I have used the minimum number of box moves required to get one box (in isolation) from its current location to any of the goal locations. This is not an accurate answer for the number of required moves, but I think it is a pretty good heuristic since it can never overestimate and it can be pre-calculated for the entire board. When summing the score for a board it is just the sum of the values for each current box location.

In an artificial life simulation that I wrote to evolve pack hunting and pack defense, the scoring system that I used was only to guide evolution and not to perform any pruning. I gave each creature one point for being born. For each point of energy that they consumed in their life, I gave them one additional point. I then used the sum of the points of their generation to determine how likely each was to reproduce. In my case, I simply used the proportion of the total points of their generation that they had acquired. If I had wanted to evolve creatures that were great at evading, I would have scored down for getting points eaten off of them.

You should also be careful that your function is not too hard of a goal to hit. If you are trying to evolve something, you want to make sure the solution space has a decent slope. You want to guide the evolution in a direction, not just declare a victory if it happens to randomly hit.

Without knowing more about your game I would be hard pressed to tell you how to build a function. Are there clear values of something that indicate a win or a loss? Do you have a way of estimating a minimum cost to close the gap?

If you provide more information, I would be happy to try and provide more insight. There are lots of excellent books on the topic as well.

Jacob

回答8:

Take in mind that it's not nescessarily true that a decent evaluation function even exists. For this statement I assume that, an evaluation function has to be of low complexity (P).