Dynamic heuristic acceleration of linearly approximated SARSA(λ)

using ant colony optimization to learn heuristics dynamically

Stefano Bromuri*

*Corresponding author for this work

Research output: Contribution to journal › Article › Academic › peer-review

Abstract

Heuristically accelerated reinforcement learning (HARL) is a new family of algorithms that combines the advantages of reinforcement learning (RL) with the advantages of heuristic algorithms. To achieve this, the action selection strategy of the standard RL algorithm is modified to take into account a heuristic running in parallel with the RL process. This paper presents two approximated HARL algorithms that make use of pheromone trails to improve the behaviour of linearly approximated SARSA(λ) by dynamically learning a heuristic function through the pheromone trails. The proposed dynamic algorithms are evaluated in comparison to linearly approximated SARSA(λ), and heuristically accelerated SARSA(λ) using a static heuristic in three benchmark scenarios: the mountain car, the mountain car 3D and the maze scenarios.
Original language: English
Number of pages: 33
Journal: Journal of Heuristics
DOI: 10.1007/s10732-019-09408-x
Publication status: E-pub ahead of print - 3 May 2019
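
To make the mechanism described in the abstract concrete: in HARL-style methods the heuristic does not replace the value function, it biases action selection towards actions the heuristic favours. The Python sketch below is an illustration only, assuming a tabular pheromone trail as the dynamically learned heuristic H(s, a) and an ε-greedy rule over Q(s, a) + ξ·H(s, a); the names (PheromoneHeuristic, xi, rho, deposit) are illustrative assumptions and do not reproduce the article's exact algorithms.

import numpy as np

# Illustrative sketch only: HARL-style action selection where the greedy
# choice maximizes Q(s, a) + xi * H(s, a), with H derived from pheromone
# trails that evaporate and are reinforced along visited trajectories.
# Parameter names (xi, rho, deposit) are assumptions for illustration,
# not taken from the article.

class PheromoneHeuristic:
    """Tabular pheromone trail used as a dynamic heuristic H(s, a)."""

    def __init__(self, n_states, n_actions, rho=0.1, deposit=1.0):
        self.tau = np.zeros((n_states, n_actions))  # pheromone per (s, a)
        self.rho = rho          # evaporation rate
        self.deposit = deposit  # pheromone laid on visited pairs

    def update(self, trajectory, episode_return):
        # Evaporate everywhere, then reinforce visited (s, a) pairs
        # in proportion to how good the episode was.
        self.tau *= (1.0 - self.rho)
        for s, a in trajectory:
            self.tau[s, a] += self.deposit * episode_return

    def value(self, s, a):
        return self.tau[s, a]


def ha_epsilon_greedy(q, heuristic, s, epsilon=0.1, xi=1.0, rng=None):
    """Epsilon-greedy over Q(s, .) + xi * H(s, .) instead of Q alone."""
    rng = np.random.default_rng() if rng is None else rng
    n_actions = q.shape[1]
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    scores = q[s] + xi * np.array([heuristic.value(s, a)
                                   for a in range(n_actions)])
    return int(np.argmax(scores))


# Tiny usage example on a toy tabular problem.
if __name__ == "__main__":
    n_states, n_actions = 5, 3
    q = np.zeros((n_states, n_actions))
    h = PheromoneHeuristic(n_states, n_actions)
    a = ha_epsilon_greedy(q, h, s=0)
    h.update(trajectory=[(0, a)], episode_return=1.0)
    print("chosen action:", a, "pheromone:", h.tau[0, a])

In this reading, evaporation keeps the heuristic from locking in early trajectories, which is what allows the heuristic to be adapted ("learned dynamically") as better episodes are discovered.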

Fingerprint

Ant colony optimization
Reinforcement learning
Heuristics
Heuristic algorithms
Pheromone trails
Learning algorithms
Dynamic algorithms
Mountain car
Benchmark scenarios

Cite this

@article{125db04b56a541a4bb3a1ce4a883f77f,
  title = "Dynamic heuristic acceleration of linearly approximated SARSA(λ): using ant colony optimization to learn heuristics dynamically",
  abstract = "Heuristically accelerated reinforcement learning (HARL) is a new family of algorithms that combines the advantages of reinforcement learning (RL) with the advantages of heuristic algorithms. To achieve this, the action selection strategy of the standard RL algorithm is modified to take into account a heuristic running in parallel with the RL process. This paper presents two approximated HARL algorithms that make use of pheromone trails to improve the behaviour of linearly approximated SARSA(λ) by dynamically learning a heuristic function through the pheromone trails. The proposed dynamic algorithms are evaluated in comparison to linearly approximated SARSA(λ), and heuristically accelerated SARSA(λ) using a static heuristic in three benchmark scenarios: the mountain car, the mountain car 3D and the maze scenarios.",
  author = "Stefano Bromuri",
  year = "2019",
  month = "5",
  day = "3",
  doi = "10.1007/s10732-019-09408-x",
  language = "English",
  journal = "Journal of Heuristics",
  issn = "1381-1231",
  publisher = "Springer Netherlands",
}

TY - JOUR

T1 - Dynamic heuristic acceleration of linearly approximated SARSA(λ)

T2 - using ant colony optimization to learn heuristics dynamically

AU - Bromuri, Stefano

PY - 2019/5/3

Y1 - 2019/5/3

N2 - Heuristically accelerated reinforcement learning (HARL) is a new family of algorithms that combines the advantages of reinforcement learning (RL) with the advantages of heuristic algorithms. To achieve this, the action selection strategy of the standard RL algorithm is modified to take into account a heuristic running in parallel with the RL process. This paper presents two approximated HARL algorithms that make use of pheromone trails to improve the behaviour of linearly approximated SARSA(λ) by dynamically learning a heuristic function through the pheromone trails. The proposed dynamic algorithms are evaluated in comparison to linearly approximated SARSA(λ), and heuristically accelerated SARSA(λ) using a static heuristic in three benchmark scenarios: the mountain car, the mountain car 3D and the maze scenarios.

AB - Heuristically accelerated reinforcement learning (HARL) is a new family of algorithms that combines the advantages of reinforcement learning (RL) with the advantages of heuristic algorithms. To achieve this, the action selection strategy of the standard RL algorithm is modified to take into account a heuristic running in parallel with the RL process. This paper presents two approximated HARL algorithms that make use of pheromone trails to improve the behaviour of linearly approximated SARSA(λ) by dynamically learning a heuristic function through the pheromone trails. The proposed dynamic algorithms are evaluated in comparison to linearly approximated SARSA(λ), and heuristically accelerated SARSA(λ) using a static heuristic in three benchmark scenarios: the mountain car, the mountain car 3D and the maze scenarios.

U2 - 10.1007/s10732-019-09408-x

DO - 10.1007/s10732-019-09408-x

M3 - Article

JO - Journal of Heuristics

JF - Journal of Heuristics

SN - 1381-1231

ER -