@conference{ZMP+12,
  title     = {A Saliency-Based Approach to Audio Event Detection and Summarization},
  author    = {A. Zlatintsi and P. Maragos and A. Potamianos and G. Evangelopoulos},
  url       = {http://robotics.ntua.gr/wp-content/publications/ZlatintsiMaragos+_SaliencyBasedAudioSummarization_EUSIPCO2012.pdf},
  year      = {2012},
  date      = {2012-08-01},
  booktitle = {Proc. European Signal Processing Conference},
  address   = {Bucharest, Romania},
  abstract  = {In this paper, we approach the problem of audio summarization by saliency computation of audio streams, exploring the potential of a modulation model for the detection of perceptually important audio events based on saliency models, along with various fusion schemes for their combination. The fusion schemes include linear, adaptive, and nonlinear methods. A machine learning approach, in which the features are trained, was also applied for comparison with the proposed technique. For the evaluation of the algorithm we use audio data taken from movies, and we show that nonlinear fusion schemes perform best. The results are reported on the MovSum database, using objective evaluations (against ground truth denoting the perceptually important audio events). Analysis of the selected audio segments is also performed against a labeled database with respect to audio categories, and a method for fine-tuning the selected audio events is proposed.},
  keywords  = {},
  pubstate  = {published},
  tppubtype = {conference}
}
Copyright Notice:
Some material presented is available for download to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
Work already published by the IEEE remains under IEEE copyright. Personal use of such material is permitted. However, permission to reprint/republish the material for advertising or promotional purposes, or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of the work in other works, must be obtained from the IEEE.