in Audio-Visual Human-Robot Interaction |
Tutorial Title: Multimodal Speech and Audio Processing in Audio-Visual Human-Robot Interaction
Abstract: This tutorial gives a concise overview of ideas, methods and research results in multimodal speech and audio processing, spatio-temporal sensory processing, perception and fusion, with applications in Human-Robot Interaction. Most data today are multimodal, so there is an emerging need for multimodal methodologies that also take the visual modality into account to enhance and assist the audio/speech modality. The tutorial will present state-of-the-art work for the major application area, Human-Robot Interaction, in social, edutainment and healthcare applications, including, among others, audio-gestural recognition for natural communication with the robotic agent and audio-visual speech synthesis for assisting the interaction and maximizing its naturalness. Established results and recent advances from our research in various EU projects in these areas, as well as on distant-speech interaction for robust home applications, will also be discussed. Additionally, the tutorial will present a secondary application area that also relies on audio-visual processing: methodologies for saliency detection and automatic summarization of mono-modal or multimodal data (i.e., audio or video), and for the development of virtual interactive environments in which human body motion or hand gestures are used for audio-gestural music synthesis. Related papers and current results can be found at http://cvsp.cs.ntua.gr and http://robotics.ntua.gr.
Date/Time: September 2, 2018; 2 PM to 5:30 PM |
Presenters |
Petros Maragos
Athanasia Zlatintsi
Primary Contact: Petros Maragos
IRAL-CVSP, National Technical Univ. of Athens, Zografou campus, Athens 15773
maragos@cs.ntua.gr
Phone: +30 210772-2360, Fax: +30 210772-3397 |
Tutorial Slides |
Part I: Multimodal Signal Processing, A-V Perception and Fusion
Part II: Audio-Visual HRI: Methodology and Applications in Assistive Robotics
Part III: Audio-Visual Child-Robot Interaction
Part IV: Multimodal Saliency and Video Summarization
Part V: Audio-Gestural Music Synthesis
List of References |
Summer School on Speech Signal Processing (S4P) 2018 |
Gandhinagar, India, 9-11 Sep. 2018 |
Presenter |
Petros Maragos |
S4P Lecture Slides |
Lecture I: Nonlinear Aspects of Speech Production: Modulations and Energy Operators
Lecture II: Nonlinear Aspects of Speech Production: Fractals and Chaotic Dynamics