TicTacToe minimax algorithm returns unexpected res

2019-06-18 15:12发布

问题:

In my method newminimax499 I have a minimax algorithm that utilizes memoization and alpha beta pruning. The method works normally for 3x3 games, however when I play 4x4 games I get strange, unexpected position choices for the computer. He still never loses, but he doesn't seem to be playing to win. To illustrate the problem here is a scenario from 2 games in 3x3 and 4x4. First here is a scenario from a 3x3 game where the player is X and makes the first move:

This isn't bad, in fact it's what one would expect the computer to do. Now take a look at a scenario from a 4x4 game. Again O is the computer and X starts:

As you can see, the computer is simply placing Os in a systematic order one after the other and only breaking that order to block X when it has a potential win. This is very defensive play, unlike what was seen in the 3x3 game. So why is the method behaving differently for 3x3 and 4x4?

Here is the code:

//This method returns a 2 element int array containing the position of the best possible 
//next move and the score it yields. Utilizes memoization and  alpha beta 
//pruning to achieve better performance. 
public int[] newminimax499(int a, int b){
    //int bestScore = (turn == 'O') ? +9 : -9;  //X is minimizer, O is maximizer
    int bestPos=-1;
    int alpha= a;
    int beta= b;
    int currentScore;
    //boardShow();
    String stateString = "";                                                
    for (int i=0; i<state.length; i++) 
        stateString += state[i];                        
    int[] oldAnswer = oldAnswers.get(stateString);                          
    if (oldAnswer != null) 
        return oldAnswer;
    if(isGameOver()!='N'){
        int[] answer = {score(), bestPos};                                    
        oldAnswers.put (stateString, answer);                                   
        return answer;
    }
    else{
        for(int x:getAvailableMoves()){
            if(turn=='X'){  //X is minimizer
                setX(x);
                //System.out.println(stateID++);
                currentScore = newminimax499(alpha, beta)[0];
                revert(x);
                if(currentScore<beta){
                    beta=currentScore;
                    bestPos=x;
                }
                if(alpha>=beta){
                    break;
                }
            }
            else {  //O is maximizer
                setO(x);
                //System.out.println(stateID++);
                currentScore = newminimax499(alpha, beta)[0];
                revert(x);
                if(currentScore>alpha){
                    alpha=currentScore;
                    bestPos=x;
                }
                if(alpha>=beta){
                    break;
                }
            }
        }
    }
    if(turn=='X'){
        int[] answer = {beta, bestPos};                                    
        oldAnswers.put (stateString, answer);                                   
        return answer;
    }
    else { 
        int[] answer = {alpha, bestPos};                                    
        oldAnswers.put (stateString, answer);                                   
        return answer;
    }
}

Following are the other components and complementary methods needed for you guys to run the code. The fields and constructor used in my class State2:

private char [] state;  //Actual content of the board
private char turn;  //Whose turn it is
private Map<String,int[]> oldAnswers; //Used for memoization. It saves every state along with the score it yielded which allows us to stop exploring the children of a certain node if a similar node's score has been previously calculated. The key is the board state(i.e OX------X for example), the int array is a 2 element array containing the score and position of last placed seed of the state.  
private Map<Integer, int []> RowCol; //A mapping of positions from a board represented as a normal array to a board represented as a 2d array. For example: The position 0 maps to 0,0 on a 2d array board, 1 maps to 0,1 and so on.
private static int n;   //Size of the board
private static int stateID; //An simple incrementer used to show number of recursive calls in the newminiax49 method. 
private static int countX, countO; //Number of placed Xs and Os
private static int lastAdded; //Position of last placed seed
private char [][] DDState; //A 2d array representing the board. Contains the same values as state[]. Used for simplicity in functions that check the state of the board.

public State2(int n){
    int a=0;
    State2.n=n;
    state=new char[n*n];
    RowCol=new HashMap<Integer, int []>();
    countX=0;
    countO=0;
    //Initializing the board with empty slots
    for(int i = 0; i<state.length; i++){
        state[i]='-';
    }
    //Mapping
    for(int i=0; i<n; i++){
        for(int j=0; j<n; j++){
            RowCol.put(a, new int[]{i, j});
            a++;
        }
    }
    a=0;
    DDState=new char[n][n];
    //Initializing the 2d array with the values from state[](empty slots)
    for(int i=0; i<n; i++){
        for(int j=0; j<n; j++){
            DDState[i][j]=state[a];
            a++;
        }
    }
    oldAnswers = new HashMap<String,int[]>();
}

Complementary methods:

getAvailableMoves, returns an array with the empty slots on the board(i.e. the possible next moves).

public int[] getAvailableMoves(){
    int count=0;
    int i=0;
    for(int j=0; j<state.length; j++){
        if(state[j]=='-')
            count++;
    }
    int [] availableSlots = new int[count];
    for(int j=0; j<state.length; j++){
        if(state[j]=='-')
            availableSlots[i++]=j;      
    }
    return availableSlots;
}

isGameOver2(), simply checks the current state of the board for whether the game is over. returns a char 'X', 'O', 'D' and 'N' which stand for X won, O won, Draw, and Not gameover respectively.

public char isGameOver2(){
    char turnOpp;
    int count;
    if(turn=='X'){
        count=countO;
        turnOpp='O';
    }
    else {
        count=countX;
        turnOpp='X';
    }
    if(count>=n){ 
        for(int i=0; i<n; i++){
            if(DDState[i][RowCol.get(lastAdded)[1]]!=turnOpp)
                break;
            if(i==(n-1)){
                return turnOpp;
            }
        }

        //Check row for win
        for(int i=0; i<n; i++){
            if(DDState[RowCol.get(lastAdded)[0]][i]!=turnOpp)
                break;
            if(i==(n-1)){
                return turnOpp;
            }
        }

        //Check diagonal for win
        if(RowCol.get(lastAdded)[0] == RowCol.get(lastAdded)[1]){

            //we're on a diagonal
            for(int i = 0; i < n; i++){
                if(DDState[i][i] != turnOpp)
                    break;
                if(i == n-1){
                    return turnOpp;
                }
            }
        }

        //check anti diagonal 

        for(int i = 0; i<n; i++){
            if(DDState[i][(n-1)-i] != turnOpp)
                break;
            if(i == n-1){
                return turnOpp;
            }
        }

        //check for draw
        if((countX+countO)==(n*n))
            return 'D';

            }
    return 'N';
}

boardShow, returns a matrix display of the current state of the board:

public void boardShow(){
    if(n==3){
        System.out.println(stateID);
        for(int i=0; i<=6;i+=3)
            System.out.println("["+state[i]+"]"+" ["+state[i+1]+"]"+" ["+state[i+2]+"]");
        System.out.println("***********");
    }
    else {
        System.out.println(stateID);
        for(int i=0; i<=12;i+=4)
            System.out.println("["+state[i]+"]"+" ["+state[i+1]+"]"+" ["+state[i+2]+"]"+" ["+state[i+3]+"]");
        System.out.println("***********");
    }   
}

score, is a simple evaluation function which returns +10 for an O win, -10 for an X win and 0 for a draw:

public int score(){
    if(isGameOver2()=='X')
        return -10;
    else if(isGameOver2()=='O')
        return +10;
    else 
        return 0;
}

The seed setters:

//Sets an X at a certain location and updates the turn, countX and lastAdded variables
public void setX(int i){
    state[i]='X';
    DDState[RowCol.get(i)[0]][RowCol.get(i)[1]]='X';
    turn='O';
    countX++;
    lastAdded=i;
}

//Sets an O at a certain location and updates the turn, countO and lastAdded variables
public void setO(int i){
    state[i]='O';
    DDState[RowCol.get(i)[0]][RowCol.get(i)[1]]='O';
    turn='X';
    countO++;
    lastAdded=i;
}

Revert, simply reverts a move. For example if an X has been placed in position 0 revert(0) sets a '-' in it's place and updates the variables changed by setX:

public void revert(int i){
    state[i]='-';
    DDState[RowCol.get(i)[0]][RowCol.get(i)[1]]='-';
    if(turn=='X'){
        turn = 'O';
        countO--;
    }
    else {
        turn = 'X';
        countX--;
    }
}

How the main method might look like:

public static void main(String[] args) {
    State2 s=new State2(4);
    int [] results=new int[2];
    s.setX(0);
    long startTime = System.currentTimeMillis();
    results=s.newminimax499(Integer.MIN_VALUE,Integer.MAX_VALUE);
    long endTime = System.currentTimeMillis();
    System.out.println("Score: "+results[0]+" Position: "+ results[1]);
    System.out.println("Run time: " + (endTime-startTime));
    s.boardShow();

}

回答1:

I'm not convinced that there's a bug here -- if O plays in one of the earlier positions, it gets forked, whereas if it plays in the center, it forces a draw. Presumably the 4x4 game is even harder to win/lose, so the computer indifferently chooses the first open square.

Below, 1 denotes the forced response by O, 2 denotes the forking move by X, and ? denotes a possible win location.

X|O|
-+-+-
2|X|?
-+-+-
?| |1

X| |O
-+-+-
X|2|?
-+-+-
1| |?

X|2|?
-+-+-
O|X|
-+-+-
 |?|1


回答2:

I gave up trying to get my head around the code - the room began to get way too noisy for my collegues.

However - I did put together my own solution - which I feel may be worth posting. It is not designed to be especially efficient - in fact I am still waiting for it to make it's first move in the 4X4 test but it does seem to do the right thing on a 3X3 (i.e. draw against itself and win against a stupid strategy).

It should be able to handle boards of unequal sizes. A diagonal is any line of squares reaching from one side of the board to the other using diagonal steps.

As soon as it finishes it's 4X4 run I'll post it's moves for comparison.

public class TicTacToe {

    /**
     * Strategies for players.
     */
    interface Strategy {

        /**
         * Think up a place to move.
         *
         * @param board - The current board position.
         * @return Where to move.
         */
        Optional<Board.Square> think(Board board, Player player);
    }

    /**
     * Stupid strategy just finds the first empty square and plays it.
     */
    static final Strategy stupid = (Board board, Player player) -> {
        List<Board.Square> empty = board.getEmpties();
        if (empty.size() > 0) {
            return Optional.of(empty.get(0));
        }
        return Optional.empty();
    };

    /**
     * Minimax strategy.
     */
    static final Strategy minimax = new Strategy() {
        // Divide by this at each depth step.
        private static final double FACTOR = 3;

        // Memoize each discovered best move.
        Map<Player, Map<String, Board.Place>> best = new HashMap<>();

        {
            // One map for each player.
            for (Player p : Player.values()) {
                best.put(p, new HashMap<>());
            }
        }

        @Override
        public Optional<Board.Square> think(Board board, Player player) {
            Optional<Board.Square> win = Optional.empty();
            // Seen this one before?
            Board.Place seen = best.get(player).get(board.toString());
            if (seen == null) {
                double winValue = Double.NEGATIVE_INFINITY;
                for (Board.Square play : board.getEmpties()) {
                    double value = value(board, player, play, 1);
                    if (value > winValue) {
                        win = Optional.of(play);
                        winValue = value;
                    }
                }
                if (win.isPresent()) {
                    // Record my result.
                    best.get(player).put(board.toString(), win.get().place);
                } else {
                    System.out.println("No best found for " + board);
                }
            } else {
                // Reuse it.
                win = Optional.of(board.square(seen));
            }
            return win;
        }

        private double value(Board board, Player player, Board.Square move, double scale) {
            // My value.
            double value = 0;
            // Make the move.
            board.mark(player, move);
            try {
                // A win?
                if (!board.win()) {
                    // Remaining empties.
                    List<Board.Square> empties = board.getEmpties();
                    if (!empties.isEmpty()) {
                        // Record it's value.
                        double[] values = new double[empties.size()];
                        for (int i = 0; i < values.length; i++) {
                            // Try each further move negating and downscaling at each level.
                            values[i] = value(board, player.next(), empties.get(i), (scale / FACTOR) * -1);
                        }
                        // Add them up.
                        value += DoubleStream.of(values).sum();
                    }
                } else {
                    // Somebody already won.
                    value = scale;
                }
            } finally {
                //System.out.println("Value of " + board + " to player " + player + " = " + value);
                // Unroll the move.
                board.unmark(player, move);
            }
            return value;
        }
    };

    /**
     * There are exactly two players.
     */
    enum Player {

        O, X;

        /**
         * Each player has a strategy.
         *
         * Defaults to stupid.
         */
        private Strategy strategy = stupid;

        /**
         * What mark they make on the board.
         *
         * @return String representing the mark they make.
         */
        public char mark() {
            // The mark they make is the first character of their name.
            return name().charAt(0);
        }

        /**
         * Who is the next player.
         *
         * @return The other player.
         */
        public Player next() {
            return other(this);
        }

        /**
         * Who is the other player.
         *
         * @return The other player.
         */
        static Player other(Player it) {
            return it == O ? X : O;
        }

        public Strategy getStrategy() {
            return strategy;
        }

        public Player setStrategy(Strategy strategy) {
            this.strategy = strategy;
            return this;
        }
    }

    /**
     * The board.
     */
    static class Board {

        /**
         * Top left corner is 0,0. Access is [y][x].
         */
        final Square[][] board;
        // The board as columns.
        final List<List<Square>> columns;
        // The dagonals.
        final List<List<Square>> diagonals;
        // For sense checking.
        final int width;
        final int height;
        // Counts for each player - useful optimisation.
        final int[] counts = new int[Player.values().length];

        // A clear square.
        private static final char Clear = ' ';

        public Board(int width, int height) {
            this.width = width;
            this.height = height;
            board = new Square[height][];
            for (int y = 0; y < height; y++) {
                board[y] = new Square[width];
                // Fill it.
                for (int x = 0; x < width; x++) {
                    board[y][x] = new Square(x, y);
                }
            }
            // Build my columns lists.
            columns = new ArrayList<>();
            for (int x = 0; x < width; x++) {
                List<Square> column = new ArrayList<>();
                for (int y = 0; y < height; y++) {
                    column.add(board[y][x]);
                }
                columns.add(column);
            }
            // And the diagonals.
            diagonals = new ArrayList<>();
            for (int y = 0; y <= height - width; y++) {
                List<Square> diagonal = new ArrayList<>();
                // Left to right.
                for (int x = 0; x < width; x++) {
                    diagonal.add(board[y + x][x]);
                }
                diagonals.add(diagonal);
                // Right to left.
                diagonal = new ArrayList<>();
                for (int x = 0; x < width; x++) {
                    diagonal.add(board[y + x][width - 1 - x]);
                }
                diagonals.add(diagonal);
            }
        }

        public Board(int size) {
            this(size, size);
        }

        private Stream<List<Square>> asRows() {
            // Map each row to a list.
            return Arrays.stream(board).map(r -> Arrays.asList(r));
        }

        private Stream<List<Square>> asCols() {
            // Walk the columns.
            return columns.stream();
        }

        private Stream<List<Square>> asDiagonals() {
            // Walk the diagonals.
            return diagonals.stream();
        }

        private boolean winning(List<Square> row) {
            //System.out.println("winner(" + row + ")");
            if (row.isEmpty()) {
                return false;
            }
            Optional<Player> owner = row.get(0).owner;
            // First square owned.
            if (!owner.isPresent()) {
                return false;
            }
            return !row.stream()
                    //.peek(s -> System.out.println(s))
                    .anyMatch((s) -> (!s.owner.isPresent() || s.owner.get() != owner.get()));
        }

        public Square square(Place place) {
            return board[place.y][place.x];
        }

        private Optional<Player> getWinner() {
            Optional<Player> winner = Optional.empty();
            // Must be at least enough plays by one player to reach across/down the board.
            OptionalInt min = IntStream.of(counts).min();
            if (!min.isPresent() || min.getAsInt() < Math.min(width, height)) {
                // Nobody can posibly have won.
                return winner;
            }
            List<List<Square>> winners = asRows()
                    .filter(r -> winning(r))
                    .collect(Collectors.toList());
            if (winners.isEmpty()) {
                winners = asCols()
                        .filter(r -> winning(r))
                        .collect(Collectors.toList());
            }
            if (winners.isEmpty()) {
                winners = asDiagonals()
                        .filter(r -> winning(r))
                        .collect(Collectors.toList());
            }
            if (!winners.isEmpty()) {
                winner = winners.get(0).get(0).owner;
            }
            return winner;
        }

        private boolean isDrawn() {
            // All square occupied - counts add up to width*height.
            return IntStream.of(counts).sum() == width * height;
        }

        private boolean win() {
            return getWinner().isPresent();
        }

        /**
         * One square of the board.
         *
         * Remember that a Square is ON a board - i.e. it is a sub-object of a Board. Do not attempt to memoize Squares as they will hold a reference to their parent board.
         */
        class Square {

            // The owner of it.
            Optional<Player> owner = Optional.empty();
            // where it is on the board.
            final Place place;

            public Square(Place place) {
                this.place = place;
            }

            public Square(int x, int y) {
                this(new Place(x, y));
            }

            public char getMark() {
                return owner.isPresent() ? owner.get().mark() : Clear;
            }

            public boolean isTaken() {
                return owner.isPresent();
            }

            @Override
            public String toString() {
                return place + "(" + (owner.isPresent() ? owner.get().toString() : "") + ")";
            }

            private void take(Player player) {
                if (!isTaken()) {
                    owner = Optional.of(player);
                    // Keep track of the counts.
                    counts[player.ordinal()] += 1;
                } else {
                    throw new IllegalStateException("Attempt to take taken square!"
                            + " Square=" + this
                            + " Player=" + player
                            + " board=" + Board.this.toString()
                    );
                }
            }

            private void release(Player player) {
                if (isTaken()) {
                    if (owner.get() == player) {
                        owner = Optional.empty();
                        // Keep track of the counts.
                        counts[player.ordinal()] -= 1;
                    } else {
                        throw new IllegalStateException("Attempt to release other player's square!"
                                + " Square=" + this
                                + " Player=" + player
                                + " board=" + Board.this.toString()
                        );
                    }
                } else {
                    throw new IllegalStateException("Attempt to release untaken square!"
                            + " Square=" + this
                            + " Player=" + player
                            + " board=" + Board.this.toString()
                    );
                }
            }

        }

        private List<Square> getEmpties() {
            // Walk all squares.
            return Arrays.stream(board)
                    // Unroll each row.
                    .flatMap(l -> Arrays.stream(l))
                    // All not taken.
                    .filter(s -> !s.isTaken())
                    // As a list.
                    .collect(Collectors.toList());
        }

        /**
         * A place on the board.
         */
        class Place {

            final int x;
            final int y;

            public Place(int x, int y) {
                // Sense check.
                if (x < 0 || x >= width) {
                    throw new IllegalArgumentException("Off board: x = " + x);
                }
                if (y < 0 || y >= height) {
                    throw new IllegalArgumentException("Off board: x = " + x);
                }
                this.x = x;
                this.y = y;
            }

            @Override
            public String toString() {
                return "{" + x + "," + y + '}';
            }

        }

        /**
         * Mark a place on the board.
         *
         * @param player
         */
        public void mark(Player player, Square square) {
            // Take the square.
            square.take(player);
        }

        /**
         * UnMark a place on the board.
         *
         * Confirms the mark first.
         */
        public void unmark(Player player, Square square) {
            square.release(player);
        }

        @Override
        public String toString() {
            return Arrays.stream(board)
                    .map(r -> Arrays.stream(r)
                            .map(s -> String.valueOf(s.getMark()))
                            .collect(Collectors.joining("", "[", "]")))
                    .collect(Collectors.joining(""));
        }

    }

    class Game {

        // The current board state.
        final Board board;

        public Game(Board board) {
            this.board = board;
        }

        public Game(int size) {
            this(new Board(size));
        }

        public Game(int width, int height) {
            this(new Board(width, height));
        }

        private void play(Player firstPlayer) {
            boolean gameFinished = false;
            Player player = firstPlayer;
            do {
                Optional<Board.Square> move = player.getStrategy().think(board, player);
                if (move.isPresent()) {
                    board.mark(player, move.get());
                    System.out.println("Moved: " + move.get() + " -> " + board);
                } else {
                    System.out.println("No move!");
                }
                // Has anyone won?
                Optional<Player> winner = board.getWinner();
                if (winner.isPresent()) {
                    System.out.println("Player " + winner.get() + " won! " + board);
                    gameFinished = true;
                } else {
                    if (board.isDrawn()) {
                        System.out.println("Draw! " + board);
                        gameFinished = true;
                    }
                }
                // Next player.
                player = player.next();
            } while (!gameFinished);

        }

    }

    private void testStupid() {
        // Both start off stupid.
        new Game(3).play(Player.X);
    }

    private void testMinimax() {
        // Test minimax strategy.
        Board testBoard = new Board(3);
        // Set it up like http://neverstopbuilding.com/minimax
        testBoard.mark(Player.O, testBoard.board[0][0]);
        testBoard.mark(Player.X, testBoard.board[0][2]);
        testBoard.mark(Player.X, testBoard.board[1][0]);
        testBoard.mark(Player.X, testBoard.board[2][0]);
        testBoard.mark(Player.O, testBoard.board[2][1]);
        testBoard.mark(Player.O, testBoard.board[2][2]);
        // What does the strategy do?
        Optional<Board.Square> play = minimax.think(testBoard, Player.X);
        System.out.println("minimax(" + testBoard + ") = " + play);
    }

    private void test3X3Game() {
        // Play a game.
        // Change strategies.
        new Game(3).play(Player.X);
    }

    private void test4X4Game() {
        // Play a game.
        new Game(4).play(Player.X);
    }

    public void test() {
        testStupid();
        testMinimax();
        System.out.println("Test minmax vs stupid");
        Player.X.setStrategy(minimax);
        test3X3Game();
        System.out.println("Test minmax vs minmax");
        Player.O.setStrategy(minimax);
        test3X3Game();
        System.out.println("Test 4 X 4");
        test4X4Game();
    }

    public static void main(String args[]) {
        try {
            new TicTacToe().test();
        } catch (Throwable t) {
            t.printStackTrace(System.err);
        }
    }

}

When playing minimax against itself it moves:

Moved: {1,1}(X) -> [   ][ X ][   ]
Moved: {0,0}(O) -> [O  ][ X ][   ]
Moved: {2,2}(X) -> [O  ][ X ][  X]
Moved: {0,2}(O) -> [O  ][ X ][O X]
Moved: {0,1}(X) -> [O  ][XX ][O X]
Moved: {2,1}(O) -> [O  ][XXO][O X]
Moved: {1,0}(X) -> [OX ][XXO][O X]
Moved: {1,2}(O) -> [OX ][XXO][OOX]
Moved: {2,0}(X) -> [OXX][XXO][OOX]
Draw! [OXX][XXO][OOX]

Note that this os not the moves OP posted so there is clearly something wrong with OP's algorithm.