Needed advice for an enjoyable AI in Buraco card game

Posted 2019-08-02 17:53

Question:

I'm trying to build an effective AI for the Buraco card game (2 and 4 players).

I want to avoid the heuristic approach: I'm not an expert in the game, and the last few games I developed that way gave mediocre results.

I know the Monte Carlo tree search algorithm; I've used it for a checkers game with decent results, but I'm really confused by the recent success of other machine learning options.

For example, I found this answer on Stack Overflow that really puzzles me. It says: "So again: build a bot which can play against itself. One common basis is a function Q(S,a) which assigns to any game state and possible action of the player a value -- this is called Q-learning. And this function is often implemented as a neural network ... although I would think it does not need to be that sophisticated here."

I'm very new to machine learning (this should be reinforcement learning, right?) and I only know a little about Q-learning, but it sounds like a great idea: I take my bot, make it play against itself, and then it learns from its results... The problem is that I have no idea how to start! (And I don't know whether this approach is even a good one.)

Could you help me find the right direction? Is Q-learning a good strategy for my domain? Is Monte Carlo tree search still the best option for me? Would it work well in a 4-player game like Buraco (2 opponents and 1 teammate)? Is there any other method that I'm overlooking?

PS: My goal is to develop an enjoyable AI for a casual application; I could even let the AI cheat, for example by looking at the players' hands or at the deck. Even with this, ehm, permission, I don't think I would be able to build a good heuristic :/

Thank you guys for your help!

Answer 1:

EDIT: thinking again about your knowledge of MCTS and your implementation of checkers, I think my answer is formulated too pedagogically -- you probably already know much of this if you know MCTS. Nevertheless, I hope it helps. Feel free to ask specific questions in the comments.


I was the one who suggested you open a new question, so I feel I should at least give an answer here. However, right at the beginning, I want to clarify what should be expected. The things you are asking for are quite complex and require solid experience to realize. I have had the occasion to implement optimal strategies for quite a few games, and I'd still consider this a heavy challenge. So I think the goal here is to get some overview -- not a proper solution (which I couldn't give anyhow, as I don't know the game of Buraco).

As stated in the other answer, the theory of reinforcement learning provides a solid foundation. A good introduction is the book by Sutton and Barto of the same title. It's really readable, and I'd suggest working through the first five chapters or so (these cover dynamic programming and Monte Carlo methods). Two points on which this book is not sufficient: (i) it does not explicitly treat two-player games, and (ii) it mainly relies on tabular representations of the value functions (no neural networks and the like are involved).

An elementary part of the implementation is the state S of the game. This state should capture the complete current status of the game as compactly and non-redundantly as possible. Further, for each state S, you need to be able to enumerate the available actions A (like taking a card). Moreover, depending on the method you want to apply, it helps to know the probability distribution p(S'|S,A), which gives the probability of ending up in state S' when you are in state S and take action A. Finally, you need to specify the reward r(S'|S,A) received when you are in state S, take action A, and end up in S'. For two-player zero-sum games the reward can be chosen quite easily: when S' is a state where you win, you get +1 as reward, if you lose -1, otherwise 0 -- this, however, does not carry over to games with more players or to non-zero-sum games.
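To make this concrete, here is a minimal sketch of the S / A / r(S'|S,A) bookkeeping for a drastically simplified draw-and-discard game. All names (CardGameState, legal_actions, step) are made up for illustration and have nothing to do with Buraco's real rules:

```python
# Hypothetical, heavily simplified card-game model for illustration only.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CardGameState:
    hands: Tuple[Tuple[int, ...], ...]  # one tuple of card ids per player
    deck: Tuple[int, ...]               # remaining cards, top of deck last
    discard: Tuple[int, ...]            # discard pile, top last
    to_move: int                        # index of the player whose turn it is


def legal_actions(state: CardGameState) -> List[str]:
    """Available actions A in state S (toy rules: draw from deck or take the discard)."""
    actions = []
    if state.deck:
        actions.append("draw_deck")
    if state.discard:
        actions.append("take_discard")
    return actions


def step(state: CardGameState, action: str) -> Tuple[CardGameState, float]:
    """Apply action A in state S; return the successor S' and the reward r(S'|S,A).
    With the zero-sum convention, the reward is +1/-1 only at terminal states, else 0."""
    hands = [list(h) for h in state.hands]
    deck, discard = list(state.deck), list(state.discard)
    if action == "draw_deck":
        hands[state.to_move].append(deck.pop())
    elif action == "take_discard":
        hands[state.to_move].append(discard.pop())
    next_state = CardGameState(
        hands=tuple(tuple(h) for h in hands),
        deck=tuple(deck),
        discard=tuple(discard),
        to_move=(state.to_move + 1) % len(state.hands),
    )
    reward = 0.0  # a real game would check for a winning combination here
    return next_state, reward
```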

From a given state S you can further derive the reduced states S_P1, S_P2, etc. which the individual players see. These states capture the information available to a specific player (which is of course less than the complete state -- a player does not know the opponents' cards, nor the order of cards in the deck, for example). These reduced states form the basis on which the players make their decisions. So the goal for player 1 is to obtain a function Q_1(S_P1, A) which tells him: when I am in state S_P1, the best action to take is A (this is the idea of Q-learning). The same holds for the other players. These functions have to be trained so that optimal play comes out.
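A sketch of how the reduced state S_Pi and a per-player tabular Q-function could look, again with illustrative names and building on the hypothetical CardGameState above:

```python
# Hypothetical projection of the full state onto one player's view, plus a tabular Q.
from collections import defaultdict
from typing import Dict, Tuple


def observed_state(state: CardGameState, player: int) -> Tuple:
    """Project the full state S onto what `player` can see: the own hand,
    the discard pile, and only the *number* of cards left in the deck --
    not the opponents' hands nor the deck order."""
    return (
        tuple(sorted(state.hands[player])),
        state.discard,
        len(state.deck),
        state.to_move,
    )


class TabularQ:
    """Q_i(S_Pi, A): estimated return of taking action A in observed state S_Pi."""

    def __init__(self):
        self.values: Dict[Tuple, float] = defaultdict(float)

    def get(self, obs: Tuple, action: str) -> float:
        return self.values[(obs, action)]

    def best_action(self, obs: Tuple, actions) -> str:
        return max(actions, key=lambda a: self.get(obs, a))

    def update(self, obs: Tuple, action: str, target: float, alpha: float = 0.1) -> None:
        key = (obs, action)
        self.values[key] += alpha * (target - self.values[key])
```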

The way to do this is through the central equation of reinforcement learning, the Bellman equation. There are several solution methods, such as value iteration, Monte Carlo methods (this is basically the approach referenced in your link), temporal-difference methods, and so on. Some of these methods can be interpreted as your digital players repeatedly playing against each other and, in the process, hopefully getting better. However, this is not guaranteed; it is easy to get stuck in local minima, as in any optimization problem. At best you already have a good digital player at hand (from somewhere) and you train only one player to beat it -- in this way you reduce the two-player problem to a single-player problem, which is much easier.
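As a rough sketch (not a proper two-player treatment), a self-play loop with one-step Q-learning would apply the Bellman-style update Q(S,A) <- Q(S,A) + alpha * (r + gamma * max_a' Q(S',a') - Q(S,A)) after every move. This assumes the toy step(), legal_actions() and observed_state() helpers above; a correct treatment would also account for the fact that in S' it is the opponent's turn:

```python
# Hedged sketch of tabular Q-learning via self-play on the toy game above.
import random


def self_play_episode(initial_state: CardGameState, q_tables, epsilon=0.1, gamma=1.0):
    """Play one episode; each player owns a TabularQ in `q_tables` and updates it."""
    state = initial_state
    while True:
        player = state.to_move
        actions = legal_actions(state)
        if not actions:
            break  # toy terminal condition: nothing left to draw
        obs = observed_state(state, player)
        # epsilon-greedy: explore sometimes, otherwise act greedily w.r.t. Q
        if random.random() < epsilon:
            action = random.choice(actions)
        else:
            action = q_tables[player].best_action(obs, actions)
        next_state, reward = step(state, action)
        # one-step Q-learning target; simplified, ignoring that S' is the opponent's turn
        next_obs = observed_state(next_state, player)
        next_actions = legal_actions(next_state)
        bootstrap = max((q_tables[player].get(next_obs, a) for a in next_actions), default=0.0)
        q_tables[player].update(obs, action, reward + gamma * bootstrap)
        state = next_state
```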

I'll stop here, because this material is really better grasped from the book, and because I know this answer wouldn't help you practically even if it were ten pages long. One last suggestion on how to proceed: (i) get the foundations from the book, (ii) implement some of the toy problems in it, (iii) extend the approach to two-player games -- see also the minimax theorem for this, (iv) solve some easier games (like tic-tac-toe; even this takes a while), (v) get acquainted with neural networks and all that, and (vi) tackle Buraco -- that's a tough project, however.