Interspeech 2018 tutorial

Multimodal Speech and Audio Processing
in Audio-Visual Human-Robot Interaction
Tutorial Title: Multimodal Speech and Audio Processing in Audio-Visual Human-Robot Interaction

Abstract: This tutorial provides a concise overview of ideas, methods and research results in multimodal speech and audio processing, spatio-temporal sensory processing, perception and fusion, with applications in Human-Robot Interaction. Most data today are multimodal, so there is an emerging need for multimodal methodologies that also take the visual modality into account in order to enhance and assist the audio/speech modality. The tutorial presents state-of-the-art work in its major application area, Human-Robot Interaction for social, edutainment and healthcare applications, including, among others, audio-gestural recognition for natural communication with the robotic agent and audio-visual speech synthesis for assisting the interaction and maximizing its naturalness. Established results and recent advances from our research in various EU projects in these areas, as well as in distant-speech interaction for robust home applications, will also be discussed. In addition, the tutorial presents a secondary application area that also relies on audio-visual processing: methodologies for saliency detection and automatic summarization of mono-modal or multimodal data (i.e., audio or video), and for the development of virtual interactive environments in which human body motion or hand gestures are used for audio-gestural music synthesis.

Related papers and current results can be found in and

Date/Time: September 2, 2018; 2:00 PM to 5:30 PM


Petros Maragos
Athanasia Zlatintsi
Primary Contact: Petros Maragos
IRAL-CVSP, National Technical Univ. of Athens,
Zografou campus, Athens 15773
Phone: +30 210772-2360, Fax: +30 210772-3397

Tutorial Slides

Part I: Multimodal Signal Processing, A-V Perception and Fusion
Part II: Audio-Visual HRI: Methodology and Applications in Assistive Robotics
Part III: Audio-Visual Child-Robot Interaction
Part IV: Multimodal Saliency and Video Summarization
Part V: Audio-Gestural Music Synthesis
List of References
Satellite Event:
Summer School on Speech Signal Processing (S4P) 2018
Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT)
Gandhinagar, India, 9-11 Sep. 2018 


Petros Maragos

S4P Lecture Slides

Lecture I: Nonlinear Aspects of Speech Production: Modulations and Energy Operators
Lecture II: Nonlinear Aspects of Speech Production: Fractals and Chaotic Dynamics