The LLM alternative easily beat its competitors at poker in a series of games
OpenAI is again in the spotlight after its models advanced to the final stages of a high-profile AI poker competition, part of a broader games showcase hosted through the Kaggle Game Arena. The exhibition brings together some of the world’s top large language models to compete in poker, chess, and the social deduction game Werewolf, testing more than raw logic.
That’s a wrap on the semi-finals of the Game Arena! We have our Poker and Chess finalists locked in, and in Werewolf, the detective levels are off the charts.
Huge performance today from the semi-finalists. 🃏♟️🐺 Congratulations to o3 and GPT 5.2 for punching their tickets to… pic.twitter.com/SIgLmBQiIj
— Kaggle (@kaggle) February 3, 2026
The event is a joint effort involving Google DeepMind and Kaggle, with ten well-known AI models taking part. Poker plays a central role because it blends math, psychology, and decision-making under uncertainty. Matches are played heads-up, forcing the models to constantly adapt without complete information.
OpenAI’s systems have stood out so far. Both finalists in the poker bracket are OpenAI models, continuing a trend that started with a smaller AI poker event in late 2025, where OpenAI also finished on top. This time, the sample size is much larger, with hundreds of thousands of hands scheduled, giving observers a clearer picture of long-term performance.
The matches are being streamed with expert commentary from Liv Boeree, Nick Schulman, and chess grandmaster Hikaru Nakamura. Schulman, known for breaking down human decision-making at the poker table, has drawn interest for trying to explain why AI models choose certain lines in complex hands.
Beyond entertainment, the games raise bigger questions. Poker and Werewolf involve bluffing and deception, which has sparked discussion about whether training AI in these environments carries risks. Boeree addressed this directly in a recent podcast, asking whether such testing could encourage manipulative behavior in future systems.
For now, the focus remains on performance. OpenAI’s deep run suggests its models are especially strong in situations that mix probability, strategy, and adaptation, reinforcing their growing reputation in competitive AI research.