Mastering Cooperative, Incomplete Information Board Games by Self-Play

  • W. van der Weij

Student thesis: Master's Thesis


The goal of this research is to determine how Reinforcement Learning (RL) can best be applied to board-card games. Board-card games are marble racing games with imperfect information and team play. These characteristics add complexity beyond that of the perfect-information games, such as Go and Chess, on which RL research has had its well-known successes.
This research is based on, and part of, broader developments in RL. The study of related work led to the choice to apply two techniques, Deep Q-Networks (DQN) and Deep Monte Carlo (DMC), to a board-card games case study. The agents learn from self-play only. RLCard was chosen as the RL research framework.
The results with DMC are much better than those with DQN. DQN reaches a maximum win rate of 68% against randomly playing agents, and manual analysis shows that the DQN agent learns the game poorly. The variance caused by the characteristics of board-card games (large action space, multiple agents, imperfect information) is the likely source of the problems with DQN's incremental updates, ε-greedy exploration and artificial neural network (NN). DMC reaches a win rate of 99% against randomly playing agents and 49% against rule-based agents after 49 days of training. Manual analysis shows that DMC learns gradually. Experiments with the NN size, the reward function and the observation model indicate that even better results are likely with more training.
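The core difference between the two techniques can be illustrated by their learning targets: DQN regresses toward a bootstrapped one-step target, while DMC regresses toward the full observed episode return. The sketch below is illustrative only (the function names and values are hypothetical, not the thesis code), but it shows why DMC targets carry the sparse terminal win signal of a board-card game directly back to every move of the episode:

```python
# Illustrative sketch (not the thesis implementation): the learning targets
# of DQN versus Deep Monte Carlo (DMC).

def dqn_target(reward, next_state_q_values, gamma=0.99, done=False):
    """One-step bootstrapped DQN target: r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_state_q_values)

def dmc_targets(rewards, gamma=0.99):
    """Full Monte Carlo returns G_t = sum_k gamma^k * r_{t+k},
    computed backwards over one finished episode."""
    targets, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        targets.append(g)
    return list(reversed(targets))

# Hypothetical episode with a single terminal win reward of +1:
episode_rewards = [0.0, 0.0, 0.0, 1.0]
print(dmc_targets(episode_rewards, gamma=1.0))  # [1.0, 1.0, 1.0, 1.0]
print(dqn_target(0.0, [0.2, 0.5], gamma=1.0))   # 0.5
```

With an undiscounted terminal reward, every state in the winning episode receives the same target of 1.0 under DMC, whereas the DQN target depends on the network's own (initially noisy) estimate of the next state, which is one plausible reason for the variance problems observed with DQN in this setting.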
DMC appears to be the better technique for learning board-card games. Its playing strength, together with the other appealing properties of RL techniques, makes it interesting to apply to board-card games in practice. Future work on DMC would be worthwhile to determine how strong the agent can become.
Date of Award: 2 Jun 2022
Original language: English
Supervisors: Martijn van Otterlo (Examiner), Twan van Laarhoven (Co-assessor) & Frank Tempelman (Co-assessor)

Master's Degree

  • Master Software Engineering
