Pure-Past Action Masking

Giovanni Varricchione*, N.A. Alechina, Mehdi Dastani, Giuseppe De Giacomo, Brian Logan, Giuseppe Perelli

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference Article in proceeding, Academic, peer-review

Abstract

We present Pure-Past Action Masking (PPAM), a lightweight approach to action masking for safe reinforcement learning. In PPAM, actions are disallowed ("masked") according to specifications expressed in Pure-Past Linear Temporal Logic (PPLTL). PPAM can enforce non-Markovian constraints, i.e., constraints based on the history of the system, rather than just the current state of the (possibly hidden) MDP. The features used in the safety constraint need not be the same as those used by the learning agent, allowing a clear separation of concerns between the safety constraints and reward specifications of the (learning) agent. We prove formally that an agent trained with PPAM can learn any optimal policy that satisfies the safety constraints, and that PPAM is as expressive as shields, another approach to enforcing non-Markovian constraints in RL. Finally, we provide empirical results showing how PPAM can guarantee constraint satisfaction in practice.
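The masking idea in the abstract can be illustrated with a minimal Python sketch. It assumes, as a simplification, that each action is paired with one PPLTL formula over propositional labels of the trace, and that the action is allowed exactly when that formula holds on the history observed so far. The tuple encoding, the names `PPLTLMonitor`, `PastActionMask` and `masked_greedy`, and the example constraints are all hypothetical and not taken from the paper. The sketch relies on the standard property that a pure-past formula can be evaluated step by step from the previous step's truth values of its subformulas, so the full history never has to be stored.

```python
from typing import Dict, Hashable, Iterable, Optional, Tuple

# Hypothetical formula encoding (nested tuples, so they are hashable):
#   ("true",) | ("atom", name) | ("not", f) | ("and", f, g) | ("or", f, g)
#   ("Y", f)     yesterday: f held at the previous step (false at the first step)
#   ("S", f, g)  f since g: g held at some step, and f at every step after it
#   ("O", f)     once: f held at some step so far
#   ("H", f)     historically: f held at every step so far
Formula = Tuple


class PPLTLMonitor:
    """Evaluates one PPLTL formula incrementally, keeping only the previous
    step's truth value of each subformula instead of the whole trace."""

    def __init__(self, formula: Formula):
        self.formula = formula
        self.prev: Optional[Dict[Formula, bool]] = None  # None before the first step
        self.value: Optional[bool] = None

    def _eval(self, f: Formula, labels: frozenset, cur: Dict[Formula, bool]) -> bool:
        if f in cur:
            return cur[f]
        op = f[0]
        # Every operand is evaluated (no short-circuiting) so that each
        # subformula gets a value in `cur`, which Y/S/O/H read at the next step.
        if op == "true":
            v = True
        elif op == "atom":
            v = f[1] in labels
        elif op == "not":
            v = not self._eval(f[1], labels, cur)
        elif op in ("and", "or"):
            a = self._eval(f[1], labels, cur)
            b = self._eval(f[2], labels, cur)
            v = (a and b) if op == "and" else (a or b)
        elif op == "Y":
            self._eval(f[1], labels, cur)
            v = self.prev is not None and self.prev[f[1]]
        elif op == "S":
            a = self._eval(f[1], labels, cur)
            b = self._eval(f[2], labels, cur)
            v = b or (a and self.prev is not None and self.prev[f])
        elif op == "O":
            x = self._eval(f[1], labels, cur)
            v = x or (self.prev is not None and self.prev[f])
        elif op == "H":
            x = self._eval(f[1], labels, cur)
            v = x and (self.prev is None or self.prev[f])
        else:
            raise ValueError(f"unknown operator {op!r}")
        cur[f] = v
        return v

    def step(self, labels: Iterable[str]) -> bool:
        """Extend the trace by one step with the given true propositions."""
        cur: Dict[Formula, bool] = {}
        self.value = self._eval(self.formula, frozenset(labels), cur)
        self.prev = cur
        return self.value


class PastActionMask:
    """Hypothetical mask: an action is allowed iff its PPLTL formula holds on
    the history so far. Actions without a formula are always allowed."""

    def __init__(self, constraints: Dict[Hashable, Formula]):
        self.monitors = {a: PPLTLMonitor(f) for a, f in constraints.items()}

    def observe(self, labels: Iterable[str]) -> None:
        labels = frozenset(labels)
        for m in self.monitors.values():
            m.step(labels)

    def allowed(self, action: Hashable) -> bool:
        m = self.monitors.get(action)
        return m is None or bool(m.value)


def masked_greedy(q_values: Dict[Hashable, float], mask: PastActionMask) -> Hashable:
    """Greedy action choice restricted to the actions the mask allows."""
    legal = [a for a in q_values if mask.allowed(a)]
    if not legal:
        raise ValueError("every action is masked at this step")
    return max(legal, key=lambda a: q_values[a])


# Usage with made-up constraints and Q-values:
constraints = {
    "open_door": ("O", ("atom", "has_key")),         # only once the key has been held
    "charge": ("not", ("Y", ("atom", "charging"))),  # masked right after a "charging" step
}
mask = PastActionMask(constraints)
q = {"open_door": 5.0, "charge": 1.0, "wait": 0.0}
mask.observe(set())            # labels of the initial state
print(masked_greedy(q, mask))  # charge (open_door is masked: no key yet)
```

Because the monitors read only their own labels, the constraint propositions (`has_key`, `charging`) can differ from whatever features the learner uses, in line with the separation of concerns between safety constraints and reward specifications described in the abstract.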
Original language: English
Title of host publication: Proceedings of the 38th AAAI International Conference on Artificial Intelligence
Editors: Michael Wooldridge, Jennifer Dy, Sriraam Natarajan
Publisher: Association for the Advancement of Artificial Intelligence
Pages: 21646-21655
Number of pages: 10
Volume: 38
Edition: 19
ISBN (Print): 1-57735-887-2, 978-1-57735-887-9
Publication status: Published - 25 Mar 2024
Event: The 38th AAAI Conference on Artificial Intelligence - Vancouver, Canada
Duration: 20 Feb 2024 - 27 Feb 2024
https://aaai.org/aaai-conference/

Conference

Conference: The 38th AAAI Conference on Artificial Intelligence
Abbreviated title: AAAI-24
Country/Territory: Canada
City: Vancouver
Period: 20/02/24 - 27/02/24
