Playing poker poses a challenge that the games mentioned above don't: hidden information. Not knowing the opponents' cards introduces the element of bluffing, which is generally not something algorithms are good at.

In many ways, multiplayer poker resembles real-life situations better than any other board game. In real life there are usually more than two actors, some information is hidden from one or more of the participants, and the situation is not a zero-sum game. Many of the games in which AI has beaten elite human players in the past, like chess, checkers, or StarCraft II, are two-player zero-sum games. In such games, calculating and playing the Nash equilibrium guarantees that, statistically and in the long run, the player cannot lose, no matter what the opponent does. In six-player hold'em, it's not generally possible to calculate the Nash equilibrium.

Hidden information was addressed in Pluribus's predecessor, Libratus, by combining a search procedure for imperfect-information games with self-play algorithms based on Counterfactual Regret Minimization (CFR). This technique worked for the two-player variant of the game, but could not scale to six players, even with 10,000 times or more computing power. In contrast, Pluribus plays many games against itself, improving its strategies against earlier versions of itself, in order to compute a blueprint strategy.

Using iterative Monte Carlo CFR, the algorithm improves the decisions of one player, called the 'traverser', at each round by examining all the actions the traverser could have taken and their hypothetical outcomes. This is possible because the AI plays against itself, so it can ask (itself) what the non-traverser players would have done in response to a different action taken by the traverser. At the end of each learning iteration, the counterfactual regrets for the traverser's strategy are updated.
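The regret update described above can be sketched in a few lines. This is a minimal, illustrative sketch of the regret-matching step at the heart of CFR, not Pluribus's actual implementation: the per-action values below are a made-up stand-in for the hypothetical outcomes the traverser would obtain by exploring each action in self-play.

```python
ACTIONS = ["fold", "call", "raise"]

def regret_matching(regrets):
    """Turn accumulated positive regrets into a probability distribution."""
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    if total > 0:
        return [p / total for p in positives]
    # No positive regret accumulated yet: play uniformly at random.
    return [1.0 / len(regrets)] * len(regrets)

def cfr_iteration(regrets, strategy, action_values):
    """One traverser update: examine every available action, compare its
    hypothetical outcome to the expected value of the current strategy,
    and accumulate the difference as counterfactual regret."""
    expected = sum(p * v for p, v in zip(strategy, action_values))
    for i, value in enumerate(action_values):
        regrets[i] += value - expected
    return regret_matching(regrets)

# Hypothetical values of each action at one decision point: raising is best.
values = [-1.0, 0.5, 1.0]
regrets = [0.0, 0.0, 0.0]
strategy = regret_matching(regrets)
for _ in range(100):
    strategy = cfr_iteration(regrets, strategy, values)

print([round(p, 2) for p in strategy])  # probability shifts toward "raise"
```

With these fixed toy values the strategy converges to always raising; in real self-play the values change every iteration because the other players' strategies are evolving too.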
To reduce complexity and storage requirements, similar actions and decision points are bucketed together and treated as identical; for the same reason, the blueprint strategy is deliberately coarse-grained, to keep it at a manageable size. The blueprint strategy calculated during training is further refined during play by a search procedure, but it is not adapted in real time to the observed tendencies of adversaries.

Training the algorithm took eight days on a 64-core server with less than 512 GB of RAM and no GPUs, which amounts to less than $150 in cloud-computing costs at the time of publication. In contrast, training algorithms for other two-player zero-sum games has cost in the range of thousands to millions of dollars.
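The bucketing idea can be illustrated with a toy example. The bucket sizes below are hypothetical, chosen only for illustration and not Pluribus's actual betting abstraction: arbitrary bet sizes are mapped to a few representative fractions of the pot, so the blueprint strategy only needs to store decisions for those.

```python
# Illustrative betting abstraction: bets expressed as fractions of the pot.
BET_BUCKETS = [0.5, 1.0, 2.0]

def bucket_bet(bet, pot):
    """Map an arbitrary bet to the nearest abstracted bet size."""
    fraction = bet / pot
    return min(BET_BUCKETS, key=lambda b: abs(b - fraction))

print(bucket_bet(130, 100))  # a 130%-pot bet is treated as a pot-sized bet: 1.0
print(bucket_bet(40, 100))   # a 40%-pot bet is treated as a half-pot bet: 0.5
```

Treating nearby bet sizes as identical loses a little precision but shrinks the strategy enormously, which is what makes storing a six-player blueprint feasible at all.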