Publications

2017

G. Velentzas, C. Tzafestas, M. Khamassi

Bio-inspired meta-learning for active exploration during non-stationary multi-armed bandit tasks Conference

Proc. IEEE Intelligent Systems Conference, London, UK, 2017.

@conference{BFB97,
title = {Bio-inspired meta-learning for active exploration during non-stationary multi-armed bandit tasks},
author = {G. Velentzas and C. Tzafestas and M. Khamassi},
url = {http://robotics.ntua.gr/wp-content/publications/Velentzas_Intellisys2017.pdf},
doi = {10.1109/IntelliSys.2017.8324365},
year = {2017},
date = {2017-09-01},
booktitle = {Proc. IEEE Intelligent Systems Conference},
address = {London, UK},
abstract = {Fast adaptation to changes in the environment requires agents (animals, robots and simulated artefacts) to be able to dynamically tune an exploration-exploitation trade-off during learning. This trade-off usually determines a fixed proportion of exploitative choices (i.e. choice of the action that subjectively appears as best at a given moment) relative to exploratory choices (i.e. testing other actions that now appear worst but may turn out promising later). Rather than using a fixed proportion, non-stationary multi-armed bandit methods in the field of machine learning have proven that principles such as exploring actions that have not been tested for a long time can lead to performance closer to optimal - bounded regret. In parallel, researches in active exploration in the fields of robot learning and computational neuroscience of learning and decision-making have proposed alternative solutions such as transiently increasing exploration in response to drops in average performance, or attributing exploration bonuses specifically to actions associated with high uncertainty in order to gain information when choosing them. In this work, we compare different methods from machine learning, computational neuroscience and robot learning on a set of non-stationary stochastic multi-armed bandit tasks: abrupt shifts; best bandit becomes worst one and vice versa; multiple shifting frequencies. We find that different methods are appropriate in different scenarios. We propose a new hybrid method combining bio-inspired meta-learning, kalman filter and exploration bonuses and show that it outperforms other methods in these scenarios.},
keywords = {},
pubstate = {published},
tppubtype = {conference}
}

Copyright Notice:

Some material presented is available for download to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
The work already published by the IEEE is under its copyright. Personal use of such material is permitted. However, permission to reprint/republish the material for advertising or promotional purposes, or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of the work in other works must be obtained from the IEEE.