Finding the best move using MinMax with Alpha-Beta

2019-03-20 14:47发布

问题:

I'm working on an AI for a game and I want to use the MinMax algorithm with the Alpha-Beta pruning.

I have a rough idea on how it works but I'm still not able to write the code from scratch, so I've spend the last two days looking for some kind of pseudocode online.

My problem is that every pseudocode I've found online seems to be based on finding the value for the best move while I need to return the best move itself and not a number.

My current code is based on this pseudocode (source)

minimax(level, player, alpha, beta){  // player may be "computer" or "opponent"
    if (gameover || level == 0)
       return score
    children = all valid moves for this "player"
    if (player is computer, i.e., max's turn){
       // Find max and store in alpha
       for each child {
          score = minimax(level - 1, opponent, alpha, beta)
          if (score > alpha) alpha = score
          if (alpha >= beta) break;  // beta cut-off
       }
       return alpha
    } else (player is opponent, i.e., min's turn)
       // Find min and store in beta
       for each child {
          score = minimax(level - 1, computer, alpha, beta)
          if (score < beta) beta = score
          if (alpha >= beta) break;  // alpha cut-off
       }
       return beta
    }
}

// Initial call with alpha=-inf and beta=inf
minimax(2, computer, -inf, +inf)

As you can see, this code returns a number and I guess that this is needed to make everything work (since the returned number is used during the recursion).

So I thought that I may use an external variable to store the best move, and this is how I've changed the previous code:

minimax(level, player, alpha, beta){  // player may be "computer" or "opponent"
    if (gameover || level == 0)
       return score
    children = all valid moves for this "player"
    if (player is computer, i.e., max's turn){
       // Find max and store in alpha
       for each child {
          score = minimax(level - 1, opponent, alpha, beta)
          if (score > alpha) {
              alpha = score
              bestMove = current child // ROW THAT I ADDED TO UPDATE THE BEST MOVE
          }
          if (alpha >= beta) break;  // beta cut-off
       }
       return alpha
    } else (player is opponent, i.e., min's turn)
       // Find min and store in beta
       for each child {
          score = minimax(level - 1, computer, alpha, beta)
          if (score < beta) beta = score
          if (alpha >= beta) break;  // alpha cut-off
       }
       return beta
    }
}

// Initial call with alpha=-inf and beta=inf
minimax(2, computer, -inf, +inf)

Now, this is how it makes sense to me, because we need to update the best move only if it's player's turn and if the move is better than the previous.

So, while I think that this one's correct (even if I'm not 100% sure), the source has also a java implementation which updates the bestMove even in the score < beta case and I don't understand why.

Trying with that implementation led my code to choose as best move a move from the oppositing player, which doesn't seem to be correct (assuming that I'm the black player, I'm looking for the best move that I can make so I'm expecting a "black" move and not a "white" one).

I don't know if my pseudocode (the second one) is the correct way to find the best move using MinMax with alpha-beta pruning or if I need to update the best move even in the score < beta case.

Please feel free to suggest any new and bettere pseudocode if you prefer, I'm not bound to anything and I don't mind rewriting some code if it's better than mine.

EDIT:

Since I can't understand the replies, I guess that maybe the question doesn't ask what I want to know so I'm trying to write it better here.

Provided that I want to get the best move only for one player and that this player, which is the maximizer, is passed to the MinMax function everytime that I need a new move (so that minmax(2, black, a, b) returns the best move for the black player while minmax(2, white, a ,b) returns the best one for the white player), how would you change the first pseudocode (or the java implementation in the source) to store this given best move somewhere?

EDIT 2:

Let's see if we can make it work this way.

This is my implementation, can you please tell me if it's correct?

//PlayerType is an enum with just White and Black values, opponent() returns the opposite player type
protected int minMax(int alpha, int beta, int maxDepth, PlayerType player) {        
    if (!canContinue()) {
        return 0;
    }
    ArrayList<Move> moves = sortMoves(generateLegalMoves(player));
    Iterator<Move> movesIterator = moves.iterator();
    int value = 0;
    boolean isMaximizer = (player.equals(playerType)); // playerType is the player used by the AI        
    if (maxDepth == 0 || board.isGameOver()) {
        value = evaluateBoard();
        return value;
    }
    while (movesIterator.hasNext()) {
        Move currentMove = movesIterator.next();
        board.applyMove(currentMove);
        value = minMax(alpha, beta, maxDepth - 1, player.opponent());
        board.undoLastMove();
        if (isMaximizer) {
            if (value > alpha) {
                selectedMove = currentMove;
                alpha = value;
            }
        } else {
            if (value < beta) {
                beta = value;
            }
        }
        if (alpha >= beta) {
            break;
        }
    }
    return (isMaximizer) ? alpha : beta;
}

EDIT 3:

New implementation based on @Codor's answer/comments

private class MoveValue {
    public Move move;
    public int value;

    public MoveValue() {
        move = null;
        value = 0;
    }

    public MoveValue(Move move, int value) {
        this.move = move;
        this.value = value;
    }

    @Override
    public String toString() {
        return "MoveValue{" + "move=" + move + ", value=" + value + '}';
    }

}

protected MoveValue minMax(int alpha, int beta, int maxDepth, PlayerType player) {
    if (!canContinue()) {
        return new MoveValue();
    }
    ArrayList<Move> moves = sortMoves(generateLegalMoves(player));
    Iterator<Move> movesIterator = moves.iterator();
    MoveValue moveValue = new MoveValue();
    boolean isMaximizer = (player.equals(playerType));
    if (maxDepth == 0 || board.isGameOver()) {            
        moveValue.value = evaluateBoard();
        return moveValue;
    }
    while (movesIterator.hasNext()) {
        Move currentMove = movesIterator.next();
        board.applyMove(currentMove);
        moveValue = minMax(alpha, beta, maxDepth - 1, player.opponent());
        board.undoLastMove();
        if (isMaximizer) {
            if (moveValue.value > alpha) {
                selectedMove = currentMove;
                alpha = moveValue.value;
            }
        } else {
            if (moveValue.value < beta) {
                beta = moveValue.value;
                selectedMove = currentMove;
            }
        }
        if (alpha >= beta) {
            break;
        }
    }
    return (isMaximizer) ? new MoveValue(selectedMove, alpha) : new MoveValue(selectedMove, beta);
}

I don't know if I got it right or if I did something wrong, but I'm back to the problem I had when I posted the question:

calling minMax(Integer.MIN_VALUE, Integer.MAX_VALUE, 1, PlayerType.Black) returns a move that can be done only by the white player and this is not what I need.

I need the best move for the given player, not the best move for the whole board.

回答1:

After some research and a lot of time wasted solving this problem, I came up with this solution that seems to work.

private class MoveValue {

    public double returnValue;
    public Move returnMove;

    public MoveValue() {
        returnValue = 0;
    }

    public MoveValue(double returnValue) {
        this.returnValue = returnValue;
    }

    public MoveValue(double returnValue, Move returnMove) {
        this.returnValue = returnValue;
        this.returnMove = returnMove;
    }

}


protected MoveValue minMax(double alpha, double beta, int maxDepth, MarbleType player) {       
    if (!canContinue()) {
        return new MoveValue();
    }        
    ArrayList<Move> moves = sortMoves(generateLegalMoves(player));
    Iterator<Move> movesIterator = moves.iterator();
    double value = 0;
    boolean isMaximizer = (player.equals(playerType)); 
    if (maxDepth == 0 || board.isGameOver()) {            
        value = evaluateBoard();            
        return new MoveValue(value);
    }
    MoveValue returnMove;
    MoveValue bestMove = null;
    if (isMaximizer) {           
        while (movesIterator.hasNext()) {
            Move currentMove = movesIterator.next();
            board.applyMove(currentMove);
            returnMove = minMax(alpha, beta, maxDepth - 1, player.opponent());
            board.undoLastMove();
            if ((bestMove == null) || (bestMove.returnValue < returnMove.returnValue)) {
                bestMove = returnMove;
                bestMove.returnMove = currentMove;
            }
            if (returnMove.returnValue > alpha) {
                alpha = returnMove.returnValue;
                bestMove = returnMove;
            }
            if (beta <= alpha) {
                bestMove.returnValue = beta;
                bestMove.returnMove = null;
                return bestMove; // pruning
            }
        }
        return bestMove;
    } else {
        while (movesIterator.hasNext()) {
            Move currentMove = movesIterator.next();
            board.applyMove(currentMove);
            returnMove = minMax(alpha, beta, maxDepth - 1, player.opponent());
            board.undoLastMove();
            if ((bestMove == null) || (bestMove.returnValue > returnMove.returnValue)) {
                bestMove = returnMove;
                bestMove.returnMove = currentMove;
            }
            if (returnMove.returnValue < beta) {
                beta = returnMove.returnValue;
                bestMove = returnMove;
            }
            if (beta <= alpha) {
                bestMove.returnValue = alpha;
                bestMove.returnMove = null;
                return bestMove; // pruning
            }
        }
        return bestMove;
    }   
}


回答2:

This is a bit diffuclt as the given code is not an actual Java implementation; in order to achieve what you want, there must be concrete types to represent a move and position in the game tree. Usually the the game tree is not explicitly encoded but navigated in a sparse representation where the implementation would actually perform the move in question, evaluate the resulting smaller problem recursively and undo the move, thus using depth-first search by using the call stack so represent the current path.

To obtain the actual best move, simply return the instance from your method which maximizes the subsequent evaluation. It might be helpful to first implement the Minimax algorithm without alpha-beta-pruning, which is added in a subsequent steps after the basic structure works.

The implementation from the link in the question (Section 1.5) actually returns the best move, as indicated in the following comment taken from there.

/** Recursive minimax at level of depth for either
    maximizing or minimizing player.
    Return int[3] of {score, row, col}  */

Here no user-defined type is used to represent the move, but the method returns three values, which are the evaluated best score and the coordinates to which the player would move to actually perform the best move (which the implementation already has done to obtain the score), which are a representation of the actual move.