Maximally Permissive Reward Machines

Giovanni Varricchione*, Natasha Alechina, Mehdi Dastani, Brian Logan

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding; Conference article in proceeding; Academic; peer-review

Abstract

Reward machines (RMs) allow the definition of rewards for temporally extended tasks and behaviors. Specifying “informative” reward machines can be challenging. One way to address this is to generate reward machines from a high-level abstract description of the learning environment, using techniques such as AI planning. However, previous planning-based approaches generate a reward machine from a single (sequential or partial-order) plan, and so do not allow maximum flexibility to the learning agent. In this paper we propose a new approach to synthesising reward machines that is based on the set of partial-order plans for a goal. We prove that learning using such “maximally permissive” reward machines results in higher rewards than learning using RMs based on a single plan. We present experimental results that support our theoretical claims by showing that our approach obtains higher rewards than the single-plan approach in practice.
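To make the contrast in the abstract concrete, the following is a minimal sketch of a reward machine as a finite-state machine over observed events. It is not the paper's construction: the task (collect wood and iron, then craft), the event names, the state names, the reward values, and the `RewardMachine` and `run` helpers are all hypothetical, and both machines are written by hand rather than synthesised from plans. It only illustrates why a machine that admits several orderings can reward behaviour that a machine built from one sequential plan ignores.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Toy reward machine: a finite-state machine that maps events to rewards."""
    initial: str
    # Transition table: (state, event) -> (next_state, reward)
    delta: dict
    state: str = field(init=False)

    def __post_init__(self):
        self.state = self.initial

    def reset(self):
        self.state = self.initial

    def step(self, event):
        # An event with no matching transition leaves the state unchanged
        # and yields zero reward (an assumption of this sketch).
        next_state, reward = self.delta.get((self.state, event), (self.state, 0.0))
        self.state = next_state
        return reward


def run(rm, trace):
    """Reset the machine and return the total reward collected along a trace."""
    rm.reset()
    return sum(rm.step(event) for event in trace)


# Hypothetical machine built from one sequential plan: wood, then iron, then craft.
single_plan = RewardMachine("u0", {
    ("u0", "wood"): ("u1", 0.0),
    ("u1", "iron"): ("u2", 0.0),
    ("u2", "craft"): ("done", 1.0),
})

# Hypothetical machine that accepts wood and iron in either order before crafting.
permissive = RewardMachine("u0", {
    ("u0", "wood"): ("w", 0.0),
    ("u0", "iron"): ("i", 0.0),
    ("w", "iron"): ("wi", 0.0),
    ("i", "wood"): ("wi", 0.0),
    ("wi", "craft"): ("done", 1.0),
})

# The agent happens to find iron first.
trace = ["iron", "wood", "craft"]
print(run(single_plan, trace), run(permissive, trace))
```

On the trace above, the single-plan machine never leaves the path it expects and pays nothing, while the more permissive machine pays the final reward. On the ordering `["wood", "iron", "craft"]` both machines pay the same.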
Original language: English
Title of host publication: ECAI 2024 - 27th European Conference on Artificial Intelligence, Including 13th Conference on Prestigious Applications of Intelligent Systems, PAIS 2024, Proceedings
Subtitle of host publication: 27th European Conference on Artificial Intelligence 19–24 October 2024, Santiago de Compostela, Spain
Editors: Ulle Endriss, Francisco S. Melo, Kerstin Bach, Alberto Bugarín-Diz, José M. Alonso-Moral, Senén Barro, Fredrik Heintz
Publisher: IOS Press BV
Pages: 1181-1188
Number of pages: 8
Volume: 392
ISBN (Electronic): 9781643685489
Publication status: Published - 16 Oct 2024
Event: 27th European Conference on Artificial Intelligence - Santiago de Compostela, Spain
Duration: 19 Oct 2024 to 24 Oct 2024
Conference number: 27

Publication series

Series: Frontiers in Artificial Intelligence and Applications
Volume: 392
ISSN: 0922-6389

Conference

Conference: 27th European Conference on Artificial Intelligence
Abbreviated title: ECAI
Country/Territory: Spain
City: Santiago de Compostela
Period: 19/10/24 to 24/10/24
