Synthesising Reward Machines for Cooperative Multi-Agent Reinforcement Learning

Giovanni Varricchione*, Natasha Alechina, Mehdi Dastani, Brian Logan

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceeding › Academic › peer-review

Abstract

Reward machines have recently been proposed as a means of encoding team tasks in cooperative multi-agent reinforcement learning. The resulting multi-agent reward machine is then decomposed into individual reward machines, one for each member of the team, allowing agents to learn in a decentralised manner while still achieving the team task. However, current work assumes the multi-agent reward machine to be given. In this paper, we show how reward machines for team tasks can be synthesised automatically from an Alternating-Time Temporal Logic specification of the desired team behaviour and a high-level abstraction of the agents’ environment. We present results suggesting that our automated approach has comparable, if not better, sample efficiency than reward machines generated by hand for multi-agent tasks.
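To make the notion of a reward machine concrete, the sketch below implements one as a small Mealy machine over high-level events: each transition consumes an abstract event label and emits a reward. The state names, event labels, and the two-agent "press both buttons, then open the door" task are illustrative assumptions, not the construction or synthesis procedure described in the paper.

```python
class RewardMachine:
    """A minimal reward machine: a finite-state machine whose
    transitions are labelled with high-level events and rewards."""

    def __init__(self, initial, transitions, terminal):
        # transitions maps (state, event) -> (next_state, reward)
        self.initial = initial
        self.transitions = transitions
        self.terminal = terminal
        self.state = initial

    def step(self, event):
        # Events with no outgoing transition leave the machine
        # in place and yield zero reward.
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0))
        self.state = next_state
        return reward

    def done(self):
        return self.state in self.terminal


# Hypothetical team task: agent A presses button_a, then agent B
# presses button_b, then either agent opens the door.
rm = RewardMachine(
    initial="u0",
    transitions={
        ("u0", "button_a"): ("u1", 0.0),
        ("u1", "button_b"): ("u2", 0.0),
        ("u2", "door"):     ("u3", 1.0),  # team reward on completion
    },
    terminal={"u3"},
)
```

In the decentralised setting the paper builds on, such a team-level machine would be decomposed into per-agent machines, each tracking only the events that agent can observe or cause.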
Original language: English
Title of host publication: Multi-Agent Systems - 20th European Conference, EUMAS 2023, Proceedings
Editors: Vadim Malvone, Aniello Murano
Pages: 328–344
ISBN (Electronic): 9783031432644
DOIs
Publication status: Published - 7 Sept 2023

Publication series

Series: Lecture Notes in Computer Science
Volume: 14282
ISSN: 0302-9743

