Q-learning For Action Selection

  • S. Ordelman

Student thesis: Master's Thesis


By testing, bugs can be detected before they are shipped to the end user. Most systems are tested manually or through scripted testing, but testing can also be automated so that tests run without the intervention of a tester. This study is about an automated testing tool called TESTAR. TESTAR tests systems through their graphical user interface (GUI), and to do this it needs to select actions. This is where one of the problems of automated testing tools lies: which actions should be selected to reach a high test-effectiveness? By default, TESTAR selects actions at random, but there has been prior research on using Q-learning for action selection. Q-learning is a model-free reinforcement learning approach in which a reward is given for each selected action. Based on these rewards, the algorithm learns which actions are worth selecting in which states. This study builds upon prior research on action selection in TESTAR by looking further into Q-learning algorithms; previously, TESTAR only had a basic Q-learning algorithm.
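The basic Q-learning idea can be illustrated with a minimal tabular sketch. It assumes that GUI states and actions are identified by plain strings and uses an epsilon-greedy policy with illustrative values for the learning rate `alpha`, the discount factor `gamma` and the exploration rate `epsilon`. The class name `QLearningSelector` and the example states and actions are hypothetical; neither the state abstraction nor the parameter values are taken from TESTAR.

```python
import random
from collections import defaultdict


class QLearningSelector:
    """Hypothetical tabular Q-learning sketch for choosing GUI actions.

    States and actions are plain strings; alpha, gamma and epsilon are
    illustrative values, not TESTAR's actual configuration.
    """

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1, seed=None):
        self.alpha = alpha            # learning rate
        self.gamma = gamma            # discount factor for future rewards
        self.epsilon = epsilon        # probability of exploring a random action
        self.q = defaultdict(float)   # (state, action) -> estimated value, default 0.0
        self.rng = random.Random(seed)

    def select(self, state, actions):
        # Epsilon-greedy: sometimes explore at random, otherwise exploit the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(actions)
        return max(actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, next_actions):
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        best_next = max((self.q[(next_state, a)] for a in next_actions), default=0.0)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# Usage with hypothetical GUI states and actions.
selector = QLearningSelector(epsilon=0.0)
selector.update("login", "click_ok", 1.0, "home", ["open_menu"])
print(selector.q[("login", "click_ok")])                    # 0.5
print(selector.select("login", ["click_ok", "click_cancel"]))  # click_ok
```

With `epsilon=0.0` the selector always exploits, so the rewarded action is chosen; a positive `epsilon` keeps some of the random exploration that TESTAR's default strategy relies on.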
The main research question of the study is "How can the test-effectiveness of TESTAR be increased using Q-learning in the action selection step?", which is answered using three sub-questions:
• RQ1 - How can different Q-learning algorithms in TESTAR be implemented?
• RQ2 - What reward functions can be used for the implemented Q-learning algorithms in TESTAR based on the available reward metrics?
• RQ3 - How can the performance (test-effectiveness) of a combination of Q-learning algorithm and reward function in web-based applications be measured?
For the first research question, two new Q-learning algorithms are implemented: Double Q-learning and QV-learning. The algorithms were tested on the Java application Rachota and showed promising results compared to basic Q-learning. This leads to the second research question, in which new rewards are introduced. TESTAR already had three rewards, and this study introduces three new ones. All six rewards are tested in combination with each of the three Q-learning algorithms on Rachota. For this research question, Rachota was also tested with purely random action selection, and comparing the results was promising: the new QV-learning algorithm combined with one of the new rewards showed a 10% improvement over purely random action selection. For the third research question, these results were confirmed on the web-based application Moneybird, showing that the Q-learning algorithms are applicable not only to Java applications but also to web-based applications. This part of the study also introduces a way to measure test-effectiveness on web-based applications using a coverage API.
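The two added algorithms differ from basic Q-learning mainly in their update rules. The following sketch uses the textbook formulations: Double Q-learning (van Hasselt) keeps two value tables and lets one evaluate the action the other considers best, and QV-learning (Wiering) bootstraps Q-values from a separately learned state value V. Whether TESTAR's implementation matches these formulations in every detail is not stated in this abstract; the class names, the parameters `alpha`, `beta` and `gamma`, and the example states are hypothetical.

```python
import random
from collections import defaultdict


class DoubleQLearning:
    """Hypothetical Double Q-learning sketch: two tables, each evaluated by the other."""

    def __init__(self, alpha=0.5, gamma=0.9, seed=None):
        self.alpha, self.gamma = alpha, gamma
        self.qa = defaultdict(float)
        self.qb = defaultdict(float)
        self.rng = random.Random(seed)

    def value(self, state, action):
        # Action selection commonly uses the sum (or mean) of both tables.
        return self.qa[(state, action)] + self.qb[(state, action)]

    def update(self, state, action, reward, next_state, next_actions):
        # Randomly choose which table learns; the other evaluates its greedy action.
        if self.rng.random() < 0.5:
            learn, judge = self.qa, self.qb
        else:
            learn, judge = self.qb, self.qa
        if next_actions:
            best = max(next_actions, key=lambda a: learn[(next_state, a)])
            future = judge[(next_state, best)]
        else:
            future = 0.0  # no follow-up actions: treat as terminal
        key = (state, action)
        learn[key] += self.alpha * (reward + self.gamma * future - learn[key])


class QVLearning:
    """Hypothetical QV-learning sketch: Q and V share the TD target r + gamma * V(s')."""

    def __init__(self, alpha=0.5, beta=0.5, gamma=0.9):
        self.alpha, self.beta, self.gamma = alpha, beta, gamma
        self.q = defaultdict(float)
        self.v = defaultdict(float)

    def update(self, state, action, reward, next_state):
        # Compute the target once, before V(state) changes.
        target = reward + self.gamma * self.v[next_state]
        self.v[state] += self.beta * (target - self.v[state])
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


qv = QVLearning()
qv.update("form", "submit", 1.0, "done")
print(qv.v["form"], qv.q[("form", "submit")])  # 0.5 0.5
```

Decoupling action choice from action evaluation is what lets Double Q-learning reduce the overestimation caused by the max operator in basic Q-learning, while QV-learning lets all actions leading to the same state benefit from a single learned state value.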
This study answers the main research question positively and shows improvements in the action selection step using Q-learning. Possible improvements for future work are giving TESTAR a better understanding of forms, to prevent it from reaching a blocking state, and testing other Q-learning algorithms and configurations that were out of scope for this study.
Date of Award: 2 Jun 2022
Original language: English
Supervisor: Tanja Vos (Examiner), Pekka Aho (Co-assessor) & Olivia Rodríguez Valdés (Co-assessor)

Master's Degree

  • Master Software Engineering
