2017
Mehdi Khamassi, George Velentzas, Theodore Tsitsimis, Costas Tzafestas: Active exploration and parameterized reinforcement learning applied to a simulated human-robot interaction task. In: Proceedings of the 1st IEEE International Conference on Robotic Computing (IRC 2017), pp. 28-35, Taichung, Taiwan, April 2017. ISBN: 9781509067237, DOI: 10.1109/IRC.2017.33. PDF: http://robotics.ntua.gr/wp-content/publications/khamassi_IRC2017.pdf

BibTeX:

@conference{BFB95,
  title     = {Active exploration and parameterized reinforcement learning applied to a simulated human-robot interaction task},
  author    = {Mehdi Khamassi and George Velentzas and Theodore Tsitsimis and Costas Tzafestas},
  booktitle = {Proceedings of the 1st IEEE International Conference on Robotic Computing (IRC 2017)},
  address   = {Taichung, Taiwan},
  pages     = {28--35},
  year      = {2017},
  date      = {2017-04-01},
  isbn      = {9781509067237},
  doi       = {10.1109/IRC.2017.33},
  url       = {http://robotics.ntua.gr/wp-content/publications/khamassi_IRC2017.pdf},
  pubstate  = {published},
  tppubtype = {conference}
}

Abstract: © 2017 IEEE. Online model-free reinforcement learning (RL) methods with continuous actions play a prominent role in real-world applications such as robotics. However, when confronted with non-stationary environments, these methods crucially rely on an exploration-exploitation trade-off that is rarely adjusted dynamically and automatically to changes in the environment. Here we propose an active exploration algorithm for RL in a structured (parameterized) continuous action space. This framework deals with a set of discrete actions, each of which is parameterized with continuous variables. Discrete exploration is controlled through a Boltzmann softmax function with an inverse temperature parameter β. In parallel, Gaussian exploration is applied to the continuous action parameters. We apply a meta-learning algorithm, based on the comparison between variations of short-term and long-term reward running averages, to simultaneously tune β and the width of the Gaussian distribution from which continuous action parameters are drawn. We first show that this algorithm reaches state-of-the-art performance in the non-stationary multi-armed bandit paradigm while also generalizing to continuous actions and multi-step tasks. We then apply it to a simulated human-robot interaction task and show that it outperforms continuous parameterized RL both without active exploration and with active exploration based on uncertainty variations measured by a Kalman-Q-learning algorithm.
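For readers who want a concrete picture of the mechanism the abstract describes, the following is a minimal Python sketch of the same idea: a Boltzmann softmax with inverse temperature β over discrete actions, Gaussian noise of width σ on the chosen action's continuous parameter, and a meta-learning rule that tunes β and σ from the gap between short-term and long-term reward running averages. This is an illustration consistent with the abstract, not the authors' implementation: the class name MetaExploringAgent, the learning rates, the reward-weighted parameter update, and the multiplicative adjustment of β and σ are all assumed.

import numpy as np

class MetaExploringAgent:
    # Illustrative sketch (not the authors' code) of the exploration scheme
    # described in the abstract. All names, learning rates, and update rules
    # below are assumptions chosen for clarity.

    def __init__(self, n_actions, alpha=0.1, alpha_short=0.3, alpha_long=0.01,
                 eta=0.5):
        self.q = np.zeros(n_actions)      # value estimate per discrete action
        self.mu = np.zeros(n_actions)     # mean of each continuous parameter
        self.alpha = alpha                # value/parameter learning rate (assumed)
        self.alpha_short = alpha_short    # short-term reward average rate (assumed)
        self.alpha_long = alpha_long      # long-term reward average rate (assumed)
        self.eta = eta                    # meta-learning rate (assumed)
        self.r_short = 0.0                # short-term reward running average
        self.r_long = 0.0                 # long-term reward running average
        self.beta = 1.0                   # softmax inverse temperature (meta-tuned)
        self.sigma = 0.5                  # Gaussian exploration width (meta-tuned)

    def act(self, rng):
        # Discrete exploration: Boltzmann softmax with inverse temperature beta.
        logits = self.beta * (self.q - self.q.max())   # shift for stability
        probs = np.exp(logits)
        probs /= probs.sum()
        a = rng.choice(len(self.q), p=probs)
        # Continuous exploration: Gaussian noise of width sigma around the
        # current mean of the chosen action's continuous parameter.
        theta = rng.normal(self.mu[a], self.sigma)
        return a, theta

    def update(self, a, theta, reward):
        # Value update for the chosen discrete action, plus a simple (assumed)
        # nudge of the parameter mean toward samples that paid off.
        delta = reward - self.q[a]
        self.q[a] += self.alpha * delta
        if delta > 0:
            self.mu[a] += self.alpha * (theta - self.mu[a])
        # Meta-learning: compare short- and long-term reward running averages.
        self.r_short += self.alpha_short * (reward - self.r_short)
        self.r_long += self.alpha_long * (reward - self.r_long)
        trend = self.r_short - self.r_long   # >0 improving, <0 degrading
        # A degrading trend (e.g., after an environment change) lowers beta
        # (flatter softmax, more discrete exploration) and widens sigma
        # (more continuous exploration); an improving trend does the opposite.
        self.beta = float(np.clip(self.beta * np.exp(self.eta * trend), 0.1, 20.0))
        self.sigma = float(np.clip(self.sigma * np.exp(-self.eta * trend), 0.01, 1.0))

# Example usage on a toy 3-action task:
# rng = np.random.default_rng(0)
# agent = MetaExploringAgent(n_actions=3)
# a, theta = agent.act(rng)
# agent.update(a, theta, reward=1.0)

The key design point, matching the abstract, is that a single reward-trend signal drives both exploration knobs at once: when recent rewards fall below the long-term baseline, the agent simultaneously flattens its discrete action distribution and widens its continuous search, rather than tuning either exploration mechanism in isolation.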
Copyright Notice:
Some material presented is available for download to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Work already published by the IEEE remains under IEEE copyright. Personal use of such material is permitted. However, permission to reprint or republish this material for advertising or promotional purposes, to create new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of the work in other works must be obtained from the IEEE.