(+30) 210-772-2964
- nzlat@cs.ntua.gr
- Office 2.2.19
Biosketch
I was born in Thessaloniki, Greece, and received my Diploma degree in Media Engineering from the Royal Institute of Technology (KTH), Stockholm, Sweden, in September 2006. My Master’s thesis, conducted at the Department of Speech, Music and Hearing (TMH – Fant Laboratorium) under the supervision of Kjetil Falkenberg Hansen and Prof. Anders Askenfelt, was in music acoustics, specifically the sound of the clarinet. The thesis was an unofficial part of the European project VEMUS.
From January 2007 I was a Ph.D. candidate in the CVSP group – at the School of Electrical and Computer Engineering of the National Technical University of Athens, Greece – under the supervision of Prof. Petros Maragos, working in the general areas of audio and multimedia processing. During this period I also participated in part of the EU MUSCLE project, contributing human movie annotations and human evaluations. My research interests lie in music information retrieval and audio processing, including analysis and recognition.
In December 2013 I received my Ph.D. degree with the thesis “Music Signal Processing and Applications in Recognition”. I currently work as a Postdoctoral Research Associate in the CVSP group on related topics, and I participate in various European and Greek research projects.
I have also studied musicology for four semesters at Stockholm University in Sweden, so it is not hard to guess that, when I am not working, my biggest obsession is music – listening to it, reading and talking about it, and playing it. My other interests include photography and photo editing, books, movies and travelling.
Since February 2011, my doctoral research in the area of music information retrieval (MIR) has been co-financed by the European Union (European Social Fund – ESF) and Greek national funds through the Operational Program “Education and Lifelong Learning” of the National Strategic Reference Framework (NSRF) – Research Funding Program: Heracleitus II (“Investing in knowledge society through the European Social Fund”).
Recent Research Projects
Publications
2018
G. Bouritsas, P. Koutras, A. Zlatintsi, P. Maragos, “Multimodal Visual Concept Learning with Weakly Supervised Techniques,” Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, Utah, USA, 2018. PDF: http://robotics.ntua.gr/wp-content/uploads/sites/2/2018_BKZM_MultimodalVisualConceptLearningWeaklySupervisedTechniques_CVPR.pdf
Abstract: Despite the availability of a huge amount of video data accompanied by descriptive texts, it is not always easy to exploit the information contained in natural language in order to automatically recognize video concepts. Towards this goal, in this paper we use textual cues as means of supervision, introducing two weakly supervised techniques that extend the Multiple Instance Learning (MIL) framework: the Fuzzy Sets Multiple Instance Learning (FSMIL) and the Probabilistic Labels Multiple Instance Learning (PLMIL). The former encodes the spatio-temporal imprecision of the linguistic descriptions with Fuzzy Sets, while the latter models different interpretations of each description’s semantics with Probabilistic Labels, both formulated through a convex optimization algorithm. In addition, we provide a novel technique to extract weak labels in the presence of complex semantics, that consists of semantic similarity computations. We evaluate our methods on two distinct problems, namely face and action recognition, in the challenging and realistic setting of movies accompanied by their screenplays, contained in the COGNIMUSE database. We show that, on both tasks, our method considerably outperforms a state-of-the-art weakly supervised approach, as well as other baselines.
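As a rough illustration of the Multiple Instance Learning setting that the two proposed techniques extend (and not of the paper’s convex FSMIL/PLMIL formulations), the minimal Python sketch below scores a bag of instances by its most confident instance and accumulates a hinge loss over weakly labeled bags; the function names and toy data are hypothetical.

```python
import numpy as np

def bag_score(instance_scores):
    """Score a bag by its most confident instance (standard MIL max-pooling)."""
    return np.max(instance_scores)

def mil_loss(bags, bag_labels, w, b):
    """Hinge-style loss over bags: a positive bag needs at least one
    high-scoring instance, a negative bag needs all instances low."""
    loss = 0.0
    for X, y in zip(bags, bag_labels):          # X: (n_instances, n_features)
        s = bag_score(X @ w + b)                # linear instance scores
        loss += max(0.0, 1.0 - y * s)           # y in {-1, +1}
    return loss / len(bags)

# Toy usage: two bags of 2-D instances, one positive, one negative.
rng = np.random.default_rng(0)
bags = [rng.normal(size=(5, 2)) + [2, 0], rng.normal(size=(4, 2))]
labels = [+1, -1]
print(mil_loss(bags, labels, w=np.array([1.0, 0.0]), b=-1.0))
```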
A. Zlatintsi, I. Rodomagoulakis, P. Koutras, A. C. Dometios, V. Pitsikalis, C. S. Tzafestas, P. Maragos, “Multimodal Signal Processing and Learning Aspects of Human-Robot Interaction for an Assistive Bathing Robot,” Proc. IEEE Int’l Conf. on Acoustics, Speech and Signal Processing (ICASSP), Calgary, Canada, 2018. PDF: http://robotics.ntua.gr/wp-content/publications/Zlatintsi+_I-SUPPORT_ICASSP18.pdf
Abstract: We explore new aspects of assistive living on smart human-robot interaction (HRI) that involve automatic recognition and online validation of speech and gestures in a natural interface, providing social features for HRI. We introduce a whole framework and resources of a real-life scenario for elderly subjects supported by an assistive bathing robot, addressing health and hygiene care issues. We contribute a new dataset and a suite of tools used for data acquisition and a state-of-the-art pipeline for multimodal learning within the framework of the I-Support bathing robot, with emphasis on audio and RGB-D visual streams. We consider privacy issues by evaluating the depth visual stream along with the RGB, using Kinect sensors. The audio-gestural recognition task on this new dataset yields up to 84.5%, while the online validation of the I-Support system on elderly users accomplishes up to 84% when the two modalities are fused together. The results are promising enough to support further research in the area of multimodal recognition for assistive social HRI, considering the difficulties of the specific task.
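The fused audio-gestural results above rely on combining the two modalities; as a generic illustration of weighted late fusion of per-command posteriors (not the actual I-Support pipeline), here is a small sketch where the command list, weights and probabilities are all hypothetical.

```python
import numpy as np

def late_fusion(p_audio, p_gesture, w_audio=0.5):
    """Weighted late fusion of per-command posteriors from two modalities."""
    p = w_audio * np.asarray(p_audio) + (1.0 - w_audio) * np.asarray(p_gesture)
    return p / p.sum()

commands = ["wash back", "scrub legs", "stop"]   # hypothetical command set
p_audio = [0.6, 0.3, 0.1]      # hypothetical speech-recognition posteriors
p_gesture = [0.2, 0.7, 0.1]    # hypothetical gesture-recognition posteriors
fused = late_fusion(p_audio, p_gesture, w_audio=0.6)
print(commands[int(np.argmax(fused))])
```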
2017
A. Zlatintsi, P. Koutras, G. Evangelopoulos, N. Malandrakis, N. Efthymiou, K. Pastra, A. Potamianos, P. Maragos, “COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization,” EURASIP Journal on Image and Video Processing, 54, pp. 1–24, 2017. doi: 10.1186/s13640-017-0194. PDF: http://robotics.ntua.gr/wp-content/publications/Zlatintsi+_COGNIMUSEdb_EURASIP_JIVP-2017.pdf
Abstract: Research related to computational modeling for machine-based understanding requires ground truth data for training, content analysis, and evaluation. In this paper, we present a multimodal video database, namely COGNIMUSE, annotated with sensory and semantic saliency, events, cross-media semantics, and emotion. The purpose of this database is manifold; it can be used for training and evaluation of event detection and summarization algorithms, for classification and recognition of audio-visual and cross-media events, as well as for emotion tracking. In order to enable comparisons with other computational models, we propose state-of-the-art algorithms, specifically a unified energy-based audio-visual framework and a method for text saliency computation, for the detection of perceptually salient events from videos. Additionally, a movie summarization system for the automatic production of summaries is presented. Two kinds of evaluation were performed, an objective based on the saliency annotation of the database and an extensive qualitative human evaluation of the automatically produced summaries, where we investigated what composes high-quality movie summaries, where both methods verified the appropriateness of the proposed methods. The annotation of the database and the code for the summarization system can be found at http://cognimuse.cs.ntua.gr/database.
A. Zlatintsi, I. Rodomagoulakis, V. Pitsikalis, P. Koutras, N. Kardaris, X. Papageorgiou, C. Tzafestas, P. Maragos, “Social Human-Robot Interaction for the Elderly: Two Real-life Use Cases,” Proc. ACM/IEEE International Conference on Human-Robot Interaction (HRI), Vienna, Austria, 2017. PDF: http://robotics.ntua.gr/wp-content/publications/Zlatintsi+_SocialHRIforTheElderly_HRI-17.pdf
Abstract: We explore new aspects on assistive living via smart social human-robot interaction (HRI) involving automatic recognition of multimodal gestures and speech in a natural interface, providing social features in HRI. We discuss a whole framework of resources, including datasets and tools, briefly shown in two real-life use cases for elderly subjects: a multimodal interface of an assistive robotic rollator and an assistive bathing robot. We discuss these domain specific tasks, and open source tools, which can be used to build such HRI systems, as well as indicative results. Sharing such resources can open new perspectives in assistive HRI.
G. Karamanolakis, E. Iosif, A. Zlatintsi, A. Pikrakis, A. Potamianos, “Audio-based Distributional Semantic Models for Music Auto-tagging and Similarity Measurement,” Proc. MultiLearn2017: Multimodal Processing, Modeling and Learning for Human-Computer/Robot Interaction Workshop, in conjunction with the European Signal Processing Conference, Kos, Greece, 2017. PDF: http://robotics.ntua.gr/wp-content/publications/Karamanolakis+_MultiLearn-17_ML7.pdf
Abstract: The recent development of Audio-based Distributional Semantic Models (ADSMs) enables the computation of audio and lexical vector representations in a joint acoustic-semantic space. In this work, these joint representations are applied to the problem of automatic tag generation. The predicted tags together with their corresponding acoustic representation are exploited for the construction of acoustic-semantic clip embeddings. The proposed algorithms are evaluated on the task of similarity measurement between music clips. Acoustic-semantic models are shown to outperform the state-of-the-art for this task and produce high quality tags for audio/music clips.
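To illustrate the kind of acoustic-semantic clip embedding and similarity measurement described above, in heavily simplified form (not the paper’s ADSM construction), the sketch below builds a clip embedding as a tag-weighted average of word vectors and compares clips by cosine similarity; the tag vocabulary and vectors are made up for the example.

```python
import numpy as np

def clip_embedding(tag_weights, word_vectors):
    """Acoustic-semantic clip embedding: weighted average of the vectors of
    the clip's predicted tags (a simplified stand-in for the paper's model)."""
    v = sum(w * word_vectors[t] for t, w in tag_weights.items())
    return v / np.linalg.norm(v)

def clip_similarity(e1, e2):
    """Cosine similarity of two unit-norm clip embeddings."""
    return float(np.dot(e1, e2))

# Hypothetical 3-D "word vectors" for a few tags.
word_vectors = {"rock": np.array([1.0, 0.1, 0.0]),
                "guitar": np.array([0.8, 0.3, 0.1]),
                "ambient": np.array([0.0, 0.2, 1.0])}
e1 = clip_embedding({"rock": 0.7, "guitar": 0.3}, word_vectors)
e2 = clip_embedding({"ambient": 1.0}, word_vectors)
print(clip_similarity(e1, e2))
```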
2016
G. Panagiotaropoulou, P. Koutras, A. Katsamanis, P. Maragos, A. Zlatintsi, A. Protopapas, E. Karavasilis, N. Smyrnis, “fMRI-based Perceptual Validation of a Computational Model for Visual and Auditory Saliency in Videos,” Proc. IEEE Int’l Conf. on Image Processing (ICIP), Phoenix, AZ, USA, pp. 699–703, 2016. doi: 10.1109/ICIP.2016.7532447. PDF: http://robotics.ntua.gr/wp-content/publications/PanagiotaropoulouEtAl_fMRI-Validation-CompAVsaliencyVideos_ICIP2016.pdf
Abstract: In this study, we make use of brain activation data to investigate the perceptual plausibility of a visual and an auditory model for visual and auditory saliency in video processing. These models have already been successfully employed in a number of applications. In addition, we experiment with parameters, modifications and suitable fusion schemes. As part of this work, fMRI data from complex video stimuli were collected, on which we base our analysis and results. The core part of the analysis involves the use of well-established methods for the manipulation of fMRI data and the examination of variability across brain responses of different individuals. Our results indicate a success in confirming the value of these saliency models in terms of perceptual plausibility.
G. Karamanolakis, E. Iosif, A. Zlatintsi, A. Pikrakis, A. Potamianos, “Audio-Based Distributional Representations of Meaning Using a Fusion of Feature Encodings,” Proc. Interspeech, 2016. PDF: http://robotics.ntua.gr/wp-content/uploads/sites/2/karamanolakis16_interspeech.pdf
Abstract: Recently a “Bag-of-Audio-Words” approach was proposed [1] for the combination of lexical features with audio clips in a multimodal semantic representation, i.e., an Audio Distributional Semantic Model (ADSM). An important step towards the creation of ADSMs is the estimation of the semantic distance between clips in the acoustic space, which is especially challenging given the diversity of audio collections. In this work, we investigate the use of different feature encodings in order to address this challenge following a two-step approach. First, an audio clip is categorized with respect to three classes, namely, music, speech and other. Next, the feature encodings are fused according to the posterior probabilities estimated in the previous step. Using a collection of audio clips annotated with tags we derive a mapping between words and audio clips. Based on this mapping and the proposed audio semantic distance, we construct an ADSM model in order to compute the distance between words (lexical semantic similarity task). The proposed model is shown to significantly outperform (23.6% relative improvement in correlation coefficient) the state-of-the-art results reported in the literature.
2015
A. Zlatintsi, E. Iosif, P. Maragos, A. Potamianos, “Audio Salient Event Detection and Summarization using Audio and Text Modalities,” Proc. European Signal Processing Conference (EUSIPCO), Nice, France, 2015. PDF: http://robotics.ntua.gr/wp-content/publications/ZlatintsiEtAl_AudioTextSum-EUSIPCO-2015.pdf
Abstract: This paper investigates the problem of audio event detection and summarization, building on previous work [1, 2] on the detection of perceptually important audio events based on saliency models. We take a synergistic approach to audio summarization where saliency computation of audio streams is assisted by using the text modality as well. Auditory saliency is assessed by auditory and perceptual cues such as Teager energy, loudness and roughness; all known to correlate with attention and human hearing. Text analysis incorporates part-of-speech tagging and affective modeling. A computational method for the automatic correction of the boundaries of the selected audio events is applied creating summaries that consist not only of salient but also meaningful and semantically coherent events. A non-parametric classification technique is employed and results are reported on the MovSum movie database using objective evaluations against ground-truth designating the auditory and semantically salient events.
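Among the auditory cues listed above is the Teager energy; as a small, self-contained illustration (a generic single-band computation, not the paper’s full multi-band front-end), the discrete Teager-Kaiser energy operator can be computed as follows.

```python
import numpy as np

def teager_energy(x):
    """Discrete Teager-Kaiser energy operator:
    Psi[x](n) = x(n)^2 - x(n-1) * x(n+1)."""
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    psi[0], psi[-1] = psi[1], psi[-2]          # simple boundary handling
    return psi

# A 200 Hz tone sampled at 8 kHz: Psi is approximately A^2 * omega^2 (constant).
fs, f = 8000, 200
t = np.arange(0, 0.05, 1.0 / fs)
x = np.sin(2 * np.pi * f * t)
print(teager_energy(x)[1:-1].mean())
```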
P. Koutras, A. Zlatintsi, E. Iosif, A. Katsamanis, P. Maragos, A. Potamianos, “Predicting Audio-Visual Salient Events Based on Visual, Audio and Text Modalities for Movie Summarization,” Proc. IEEE Int’l Conf. on Image Processing (ICIP), Quebec, Canada, pp. 4361–4365, 2015. doi: 10.1109/ICIP.2015.7351630. PDF: http://robotics.ntua.gr/wp-content/publications/KZIKMP_MovieSum2_ICIP-2015.pdf
Abstract: In this paper, we present a new and improved synergistic approach to the problem of audio-visual salient event detection and movie summarization based on visual, audio and text modalities. Spatio-temporal visual saliency is estimated through a perceptually inspired frontend based on 3D (space, time) Gabor filters and frame-wise features are extracted from the saliency volumes. For the auditory salient event detection we extract features based on Teager-Kaiser Energy Operator, while text analysis incorporates part-of-speech tagging and affective modeling of single words on the movie subtitles. For the evaluation of the proposed system, we employ an elementary and non-parametric classification technique like KNN. Detection results are reported on the MovSum database, using objective evaluations against ground-truth denoting the perceptually salient events, and human evaluations of the movie summaries. Our evaluation verifies the appropriateness of the proposed methods compared to our baseline system. Finally, our newly proposed summarization algorithm produces summaries that consist of salient and meaningful events, also improving the comprehension of the semantics.
A. Zlatintsi, P. Koutras, N. Efthymiou, P. Maragos, A. Potamianos, K. Pastra, “Quality Evaluation of Computational Models for Movie Summarization,” Proc. Int’l Workshop on Quality of Multimedia Experience (QoMEX), Costa Navarino, Messinia, Greece, 2015. PDF: http://robotics.ntua.gr/wp-content/publications/ZlatintsiEtAl_MovieSumEval-QoMEX2015.pdf
Abstract: In this paper we present a movie summarization system and we investigate what composes high quality movie summaries in terms of user experience evaluation. We propose state-of-the-art audio, visual and text techniques for the detection of perceptually salient events from movies. The evaluation of such computational models is usually based on the comparison of the similarity between the system-detected events and some ground-truth data. For this reason, we have developed the MovSum movie database, which includes sensory and semantic saliency annotation as well as cross-media relations, for objective evaluations. The automatically produced movie summaries were qualitatively evaluated, in an extensive human evaluation, in terms of informativeness and enjoyability accomplishing very high ratings up to 80% and 90%, respectively, which verifies the appropriateness of the proposed methods.
2014
A. Zlatintsi, P. Maragos, “Comparison of Different Representations Based on Nonlinear Features for Music Genre Classification,” Proc. European Signal Processing Conference (EUSIPCO), Lisbon, Portugal, 2014. PDF: http://robotics.ntua.gr/wp-content/publications/ZlatintsiMaragos_MGC_EUSIPCO14_Lisbon_proc.pdf
Abstract: In this paper, we examine the descriptiveness and recognition properties of different feature representations for the analysis of musical signals, aiming in the exploration of their micro- and macro-structures, for the task of music genre classification. We explore nonlinear methods, such as the AM-FM model and ideas from fractal theory, so as to model the time-varying harmonic structure of musical signals and the geometrical complexity of the music waveform. The different feature representations’ efficacy is compared regarding their recognition properties for the specific task. The proposed features are evaluated against and in combination with Mel frequency cepstral coefficients (MFCC), using both static and dynamic classifiers, accomplishing an error reduction of 28%, illustrating that they can capture important aspects of music.
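For context on the kind of static MFCC baseline the nonlinear features above are compared against and combined with, here is a minimal sketch of an MFCC/GMM genre classifier. It assumes librosa and scikit-learn are available and is a generic baseline, not the paper’s exact experimental setup.

```python
import numpy as np
import librosa                      # assumed available for MFCC extraction
from sklearn.mixture import GaussianMixture

def mfcc_frames(y, sr, n_mfcc=13):
    """Frame-level MFCCs, shape (n_frames, n_mfcc)."""
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_genre_models(clips_by_genre, sr, n_components=8):
    """Fit one GMM per genre on the pooled MFCC frames of its training clips."""
    models = {}
    for genre, clips in clips_by_genre.items():
        X = np.vstack([mfcc_frames(y, sr) for y in clips])
        models[genre] = GaussianMixture(n_components=n_components,
                                        covariance_type="diag").fit(X)
    return models

def classify(y, sr, models):
    """Assign the genre whose GMM gives the highest average frame log-likelihood."""
    X = mfcc_frames(y, sr)
    return max(models, key=lambda g: models[g].score(X))
```

A dynamic classifier (e.g. an HMM per genre) would replace the per-frame GMM scoring with sequence likelihoods, but the train-one-model-per-class structure stays the same.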
2013
G. Evangelopoulos, A. Zlatintsi, A. Potamianos, P. Maragos, K. Rapantzikos, G. Skoumas, Y. Avrithis, “Multimodal saliency and fusion for movie summarization based on aural, visual, and textual attention,” IEEE Transactions on Multimedia, 15(7), pp. 1553–1568, 2013. doi: 10.1109/TMM.2013.2267205. PDF: http://robotics.ntua.gr/wp-content/uploads/publications/EZPMRSA_MultimodalSaliencyFusionMovieSumAVTattention_ieeetMM13.pdf
Abstract: Multimodal streams of sensory information are naturally parsed and integrated by humans using signal-level feature extraction and higher level cognitive processes. Detection of attention-invoking audiovisual segments is formulated in this work on the basis of saliency models for the audio, visual, and textual information conveyed in a video stream. Aural or auditory saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking. Visual saliency is measured through a spatiotemporal attention model driven by intensity, color, and orientation. Textual or linguistic saliency is extracted from part-of-speech tagging on the subtitles information available with most movie distributions. The individual saliency streams, obtained from modality-depended cues, are integrated in a multimodal saliency curve, modeling the time-varying perceptual importance of the composite video stream and signifying prevailing sensory events. The multimodal saliency representation forms the basis of a generic, bottom-up video summarization algorithm. Different fusion schemes are evaluated on a movie database of multimodal saliency annotations with comparative results provided across modalities. The produced summaries, based on low-level features and content-independent fusion and selection, are of subjectively high aesthetic and informative quality.
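The fusion step described above combines per-frame aural, visual and textual saliency curves into one multimodal attention curve; the sketch below shows three generic schemes (weighted linear, min, max) over normalized curves, as an illustration rather than the exact fusion rules evaluated in the paper.

```python
import numpy as np

def fuse_saliency(aural, visual, textual, scheme="linear", weights=(1/3, 1/3, 1/3)):
    """Fuse per-frame saliency curves into one multimodal attention curve."""
    S = np.vstack([aural, visual, textual])          # shape (3, n_frames)
    # Normalize each modality to [0, 1] before fusing.
    S = (S - S.min(axis=1, keepdims=True)) / (np.ptp(S, axis=1, keepdims=True) + 1e-9)
    if scheme == "linear":
        return np.asarray(weights) @ S               # weighted sum
    if scheme == "min":                              # "all modalities agree"
        return S.min(axis=0)
    if scheme == "max":                              # "any modality fires"
        return S.max(axis=0)
    raise ValueError(scheme)
```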
A. Zlatintsi, “Music Signal Processing and Applications in Recognition,” Ph.D. Thesis, School of ECE, National Technical University of Athens, December 2013. PDF: http://robotics.ntua.gr/wp-content/publications/Zlatintsi_PhDThesis_Dec2013_EMP.pdf
Abstract: This thesis lies in the area of signal processing and analysis of music signals using computational methods for the extraction of effective representations for automatic recognition. We explore and develop efficient algorithms using nonlinear methods for the analysis of the structure of music signals, which is of importance for their modeling. Our main research direction deals with the analysis of the structure and the characteristics of musical instruments in order to gain insight about their function and properties. We also study the characteristics of the different genres of music. Finally, we evaluate the effectiveness of the proposed nonlinear models for the detection of perceptually important music and audio events. The approach we follow contributes to state-of-the-art technologies related to automatic computer-based recognition of musical signals and audio summarization, which nowadays are essential in everyday life. Because of the vast amount of music, audio and multimedia data on the web and our personal computers, this study can find use in applications such as automatic genre classification, automatic recognition of music’s basic structures, such as musical instruments, and audio content analysis for music and audio summarization. The above-mentioned applications require robust solutions to information processing problems. Toward this goal, the development of efficient digital signal processing methods and the extraction of relevant features is of importance. In this thesis we propose such methods and algorithms for feature extraction, with interesting results that render the descriptors of direct applicability. The proposed methods are applied in classification experiments, illustrating that they can capture important aspects of music, such as the micro-variations of their structure. Descriptors based on macro-structures may reduce the complexity of the classification system, since satisfactory results can be achieved using simpler statistical models. Finally, the introduction of a “music” filterbank appears to be promising for automatic genre classification.
2012
A. Zlatintsi, P. Maragos, A. Potamianos, G. Evangelopoulos, “A Saliency-Based Approach to Audio Event Detection and Summarization,” Proc. European Signal Processing Conference (EUSIPCO), Bucharest, Romania, 2012. PDF: http://robotics.ntua.gr/wp-content/publications/ZlatintsiMaragos+_SaliencyBasedAudioSummarization_EUSIPCO2012.pdf
Abstract: In this paper, we approach the problem of audio summarization by saliency computation of audio streams, exploring the potential of a modulation model for the detection of perceptually important audio events based on saliency models, along with various fusion schemes for their combination. The fusion schemes include linear, adaptive and nonlinear methods. A machine learning approach, where training of the features is performed, was also applied for the purpose of comparison with the proposed technique. For the evaluation of the algorithm we use audio data taken from movies and we show that nonlinear fusion schemes perform best. The results are reported on the MovSum database, using objective evaluations (against ground-truth denoting the perceptually important audio events). Analysis of the selected audio segments is also performed against a labeled database in respect to audio categories, while a method for fine-tuning of the selected audio events is proposed.
A. Zlatintsi, P. Maragos, “AM-FM Modulation Features for Music Instrument Signal Analysis and Recognition,” Proc. European Signal Processing Conference (EUSIPCO), Bucharest, Romania, 2012. PDF: http://robotics.ntua.gr/wp-content/publications/ZlatintsiMaragos_MusicalInstrumentsAMFM_EUSIPCO2012.pdf
Abstract: In this paper, we explore a nonlinear AM-FM model to extract alternative features for music instrument recognition tasks. Amplitude and frequency micro-modulations are measured in musical signals and are employed to model the existing information. The features used are the multiband mean instantaneous amplitude (mean-IAM) and mean instantaneous frequency (mean-IFM) modulation. The instantaneous features are estimated using the multiband Gabor Energy Separation Algorithm (Gabor-ESA). An alternative method, the iterative-ESA is also explored; and initial experimentation shows that it could be used to estimate the harmonic content of a tone. The Gabor-ESA is evaluated against and in combination with Mel frequency cepstrum coefficients (MFCCs) using both static and dynamic classifiers. The method used in this paper has proven to be able to extract the fine-structured modulations of music signals; further, it has shown to be promising for recognition tasks accomplishing an error rate reduction up to 60% for the best recognition case combined with MFCCs.
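The instantaneous amplitude and frequency features above come from an Energy Separation Algorithm; as a self-contained illustration, the following sketch implements the classic DESA-2 demodulation of a narrowband signal via the Teager-Kaiser operator (a single-band version, not the multiband Gabor-ESA used in the paper).

```python
import numpy as np

def tkeo(x):
    """Teager-Kaiser energy: Psi[x](n) = x(n)^2 - x(n-1)*x(n+1)."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

def desa2(x):
    """DESA-2 energy separation: estimate instantaneous amplitude |a(n)| and
    frequency Omega(n) (rad/sample) of a narrowband (e.g. bandpass-filtered) signal."""
    x = np.asarray(x, dtype=float)
    psi_x = tkeo(x)                        # Psi[x](n), n = 1 .. N-2
    y = x[2:] - x[:-2]                     # y(n) = x(n+1) - x(n-1), same n range
    psi_y = tkeo(y)                        # aligned to n = 2 .. N-3
    psi_x = psi_x[1:-1]                    # align Psi[x] to the same samples
    omega = 0.5 * np.arccos(np.clip(1.0 - psi_y / (2.0 * psi_x), -1.0, 1.0))
    amp = 2.0 * psi_x / np.sqrt(np.maximum(psi_y, 1e-12))
    return amp, omega

# Sanity check on a pure tone: 440 Hz at 16 kHz, amplitude 0.5.
fs, f0, a0 = 16000, 440.0, 0.5
t = np.arange(0, 0.05, 1.0 / fs)
amp, omega = desa2(a0 * np.cos(2 * np.pi * f0 * t))
print(amp.mean(), omega.mean() * fs / (2 * np.pi))   # ~0.5 and ~440 Hz
```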
2011
A. Zlatintsi, P. Maragos, “Musical Instruments Signal Analysis and Recognition Using Fractal Features,” Proc. European Signal Processing Conference (EUSIPCO), Barcelona, Spain, 2011. PDF: http://robotics.ntua.gr/wp-content/publications/ZlatintsiMaragos_MusicalInstrumentsMFD_EUSIPCO2011.pdf
Abstract: Analyzing the structure of music signals at multiple time scales is of importance both for modeling music signals and their automatic computer-based recognition. In this paper we propose the multiscale fractal dimension profile as a descriptor useful to quantify the multiscale complexity of the music waveform. We have experimentally found that this descriptor can discriminate several aspects among different music instruments. We compare the descriptiveness of our features against that of Mel frequency cepstral coefficients (MFCCs) using both static and dynamic classifiers, such as Gaussian mixture models (GMMs) and hidden Markov models (HMMs). The methods and features proposed in this paper are promising for music signal analysis and of direct applicability in large-scale music classification tasks.
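As a loose illustration of the multiscale fractal dimension idea referred to above (a cover of the waveform graph whose area scales with a scale-dependent exponent), the sketch below estimates an MFD-style profile with flat morphological dilations/erosions; it only approximates the published Minkowski-cover computation, and the scale range and slope estimator are assumptions for the example.

```python
import numpy as np
from scipy.ndimage import maximum_filter1d, minimum_filter1d

def multiscale_fractal_dimension(x, max_scale=20):
    """Rough sketch of a multiscale fractal dimension (MFD) profile for a 1-D
    signal: cover the waveform graph with morphological dilations/erosions of
    growing scale s, measure the cover area A(s), and estimate the local slope
    of log(A(s)/s^2) versus log(1/s)."""
    x = np.asarray(x, dtype=float)
    scales = np.arange(1, max_scale + 1)
    areas = []
    for s in scales:
        upper = maximum_filter1d(x, size=2 * s + 1)   # dilation by a flat segment
        lower = minimum_filter1d(x, size=2 * s + 1)   # erosion by a flat segment
        areas.append(np.sum(upper - lower))           # cover "area" at scale s
    logA = np.log(np.asarray(areas) / scales ** 2)
    log_inv_s = np.log(1.0 / scales)
    # Local fractal dimension at each scale: slope of logA w.r.t. log(1/s).
    mfd = np.gradient(logA, log_inv_s)
    return scales, mfd
```

For a smooth waveform the cover area grows roughly linearly with s, giving a profile near 1, while rougher, more fractal-like waveforms push the profile towards 2.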
N. Malandrakis, A. Potamianos, G. Evangelopoulos, A. Zlatintsi, “A Supervised Approach to Movie Emotion Tracking,” Proc. IEEE Int’l Conf. on Acoustics, Speech and Signal Processing (ICASSP), Prague, Czech Republic, 2011. PDF: http://robotics.ntua.gr/wp-content/publications/Malandrakis+_movie_emotion_ICASSP11.pdf
Abstract: In this paper, we present experiments on continuous time, continuous scale affective movie content recognition (emotion tracking). A major obstacle for emotion research has been the lack of appropriately annotated databases, limiting the potential for supervised algorithms. To that end we develop and present a database of movie affect, annotated in continuous time, on a continuous valence-arousal scale. Supervised learning methods are proposed to model the continuous affective response using hidden Markov Models (independent) in each dimension. These models classify each video frame into one of seven discrete categories (in each dimension); the discrete-valued curves are then converted to continuous values via spline interpolation. A variety of audio-visual features are investigated and an optimal feature set is selected. The potential of the method is experimentally verified on twelve 30-minute movie clips with good precision at a macroscopic level.
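The final step described above, converting per-frame discrete affect classes into continuous-valued curves via spline interpolation, can be illustrated as follows; the 7-level-to-[-1, 1] mapping, frame rate and smoothing factor are assumptions for the example, not the paper’s exact settings.

```python
import numpy as np
from scipy.interpolate import UnivariateSpline

def smooth_affect_curve(frame_labels, fps=25.0, smoothing=5.0):
    """Turn per-frame discrete affect labels (seven levels 0..6 mapped to
    [-1, 1]) into a continuous-valued curve via spline interpolation."""
    t = np.arange(len(frame_labels)) / fps
    values = (np.asarray(frame_labels, dtype=float) - 3.0) / 3.0   # 0..6 -> -1..1
    spline = UnivariateSpline(t, values, s=smoothing)
    return t, np.clip(spline(t), -1.0, 1.0)

# Hypothetical classifier output: 10 s of frames switching between levels.
labels = np.repeat([2, 5, 3, 6, 1], 50)
t, curve = smooth_affect_curve(labels)
print(curve.min(), curve.max())
```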
2009 |
G Evangelopoulos, A Zlatintsi, G Skoumas, K Rapantzikos, A Potamianos, P Maragos, Y Avrithis Video Event Detection and Summarization Using Audio, Visual and Text Saliency Conference Proc. {IEEE} Int'l Conf. Acous., Speech, and Signal Processing, Taipei, Taiwan, 2009. Abstract | BibTeX | Links: [PDF] @conference{EZS+09, title = {Video Event Detection and Summarization Using Audio, Visual and Text Saliency}, author = {G Evangelopoulos and A Zlatintsi and G Skoumas and K Rapantzikos and A Potamianos and P Maragos and Y Avrithis}, url = {http://robotics.ntua.gr/wp-content/publications/EvangelopoulosZlatintsiEtAl_VideoEventDetectionSummarizationUsingAVTSaliency_ICASSP09.pdf}, year = {2009}, date = {2009-04-01}, booktitle = {Proc. {IEEE} Int'l Conf. Acous., Speech, and Signal Processing}, address = {Taipei, Taiwan}, pages = {3553--3556}, abstract = {Detection of perceptually important video events is formulated here on the basis of saliency models for the audio, visual and textual information conveyed in a video stream. Audio saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking. Visual saliency is measured through a spatiotemporal attention model driven by intensity, color and motion. Text saliency is extracted from part-of-speech tagging on the subtitles information available with most movie distributions. The various modality curves are integrated in a single attention curve, where the presence of an event may be signified in one or multiple domains. This multimodal saliency curve is the basis of a bottom-up video summarization algorithm, that refines results from unimodal or audiovisual-based skimming. The algorithm performs favorably for video summarization in terms of informativeness and enjoyability.}, keywords = {}, pubstate = {published}, tppubtype = {conference} } Detection of perceptually important video events is formulated here on the basis of saliency models for the audio, visual and textual information conveyed in a video stream. Audio saliency is assessed by cues that quantify multifrequency waveform modulations, extracted through nonlinear operators and energy tracking. Visual saliency is measured through a spatiotemporal attention model driven by intensity, color and motion. Text saliency is extracted from part-of-speech tagging on the subtitles information available with most movie distributions. The various modality curves are integrated in a single attention curve, where the presence of an event may be signified in one or multiple domains. This multimodal saliency curve is the basis of a bottom-up video summarization algorithm, that refines results from unimodal or audiovisual-based skimming. The algorithm performs favorably for video summarization in terms of informativeness and enjoyability. |
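The fusion-and-skimming pipeline sketched in the abstract above can be illustrated in a few lines of Python. The min-max normalization, equal weighting and fixed-length segments below are my own illustrative assumptions, not the fusion scheme actually used in the paper.

    import numpy as np

    def fuse_saliency(audio, visual, text, weights=(1/3, 1/3, 1/3)):
        # Combine per-frame audio, visual and text saliency curves into a single
        # attention curve: normalize each modality, then take a weighted sum.
        def norm(c):
            c = np.asarray(c, dtype=float)
            return (c - c.min()) / (c.max() - c.min() + 1e-12)
        return sum(w * norm(c) for w, c in zip(weights, (audio, visual, text)))

    def skim(attention, fps=25.0, segment_sec=2.0, keep_ratio=0.2):
        # Bottom-up summarization: rank fixed-length segments by mean attention
        # and keep the top fraction, returned as (start, end) times in seconds.
        seg_len = int(segment_sec * fps)
        n_seg = len(attention) // seg_len
        scores = [attention[i * seg_len:(i + 1) * seg_len].mean() for i in range(n_seg)]
        top = sorted(np.argsort(scores)[::-1][:max(1, int(keep_ratio * n_seg))])
        return [(i * segment_sec, (i + 1) * segment_sec) for i in top]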
2008 |
G Evangelopoulos, K Rapantzikos, A Potamianos, P Maragos, A Zlatintsi, Y Avrithis Movie Summarization based on Audiovisual Saliency Detection Conference Proc. {IEEE} Int'l Conf. on Image Processing (ICIP), San Diego, CA, U.S.A., 2008. Abstract | BibTeX | Links: [PDF] @conference{ERP+08, title = {Movie Summarization based on Audiovisual Saliency Detection}, author = {G Evangelopoulos and K Rapantzikos and A Potamianos and P Maragos and A Zlatintsi and Y Avrithis}, url = {http://robotics.ntua.gr/wp-content/publications/EvangelopoulosRapantzikosEtAl_MovieSum_ICIP2008_fancyhead.pdf}, year = {2008}, date = {2008-10-01}, booktitle = {Proc. {IEEE} Int'l Conf. on Image Processing (ICIP)}, address = {San Diego, CA, U.S.A.}, pages = {2528--2531}, doi = {10.1109/ICIP.2008.4712308}, abstract = {Based on perceptual and computational attention modeling studies, we formulate measures of saliency for an audiovisual stream. Audio saliency is captured by signal modulations and related multi-frequency band features, extracted through nonlinear operators and energy tracking. Visual saliency is measured by means of a spatiotemporal attention model driven by various feature cues (intensity, color, motion). Audio and video curves are integrated in a single attention curve, where events may be enhanced, suppressed or vanished. The presence of salient events is signified on this audiovisual curve by geometrical features such as local extrema, sharp transition points and level sets. An audiovisual saliency-based movie summarization algorithm is proposed and evaluated. The algorithm is shown to perform very well in terms of summary informativeness and enjoyability for movie clips of various genres.}, keywords = {}, pubstate = {published}, tppubtype = {conference} } Based on perceptual and computational attention modeling studies, we formulate measures of saliency for an audiovisual stream. Audio saliency is captured by signal modulations and related multi-frequency band features, extracted through nonlinear operators and energy tracking. Visual saliency is measured by means of a spatiotemporal attention model driven by various feature cues (intensity, color, motion). Audio and video curves are integrated in a single attention curve, where events may be enhanced, suppressed or vanished. The presence of salient events is signified on this audiovisual curve by geometrical features such as local extrema, sharp transition points and level sets. An audiovisual saliency-based movie summarization algorithm is proposed and evaluated. The algorithm is shown to perform very well in terms of summary informativeness and enjoyability for movie clips of various genres. |
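As a concrete example of the "nonlinear operators and energy tracking" the abstract above refers to, the Teager-Kaiser energy operator is a standard choice in this line of work; the sketch below applies it to a (band-passed) signal. Whether this exact operator is the one used in the paper is an assumption on my part.

    import numpy as np

    def teager_kaiser(x):
        # Discrete Teager-Kaiser energy operator: Psi[x](n) = x(n)^2 - x(n-1)*x(n+1).
        # Applied per band-passed channel, it tracks amplitude/frequency modulation
        # energy, which can be pooled across bands into an audio saliency cue.
        x = np.asarray(x, dtype=float)
        psi = np.empty_like(x)
        psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
        psi[0], psi[-1] = psi[1], psi[-2]      # simple boundary handling
        return np.maximum(psi, 0.0)            # clip small negative estimates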
D Spachos, A Zlatintsi, V Moschou, P Antonopoulos, E Benetos, M Kotti, K Tzimouli, C Kotropoulos, N Nikolaidis, P Maragos, I Pitas MUSCLE Movie Database: A Multimodal Corpus With Rich Annotation For Dialogue And Saliency Detection Conference Proc. Workshop on Multimodal Corpora, Int'l Conf. on Language Resources and Evaluation (LREC), Marrakech, Morocco, 2008. Abstract | BibTeX | Links: [PDF] @conference{SZM+-8, title = {MUSCLE Movie Database: A Multimodal Corpus With Rich Annotation For Dialogue And Saliency Detection}, author = {D Spachos and A Zlatintsi and V Moschou and P Antonopoulos and E Benetos and M Kotti and K Tzimouli and C Kotropoulos and N Nikolaidis and P Maragos and I Pitas}, url = {http://robotics.ntua.gr/wp-content/publications/SpachosZlatintsi+_MuscleMovieDatabase_LREC08.pdf}, year = {2008}, date = {2008-05-01}, booktitle = {Proc. Workshop on Multimodal Corpora, Int'l Conf. on Language Resources and Evaluation (LREC)}, address = {Marrakech, Morocco}, abstract = {Semantic annotation of multimedia content is important for training, testing, and assessing content-based algorithms for indexing, organization, browsing, and retrieval. To this end, an annotated multimodal movie corpus has been collected to be used as a test bed for development and assessment of content-based multimedia processing, such as speaker clustering, speaker turn detection, visual speech activity detection, face detection, face clustering, scene segmentation, saliency detection, and visual dialogue detection. All metadata are saved in XML format following the MPEG-7 ISO prototype to ensure data compatibility and reusability. The entire MUSCLE movie database is available for download through the web. Visual speech activity and dialogue detection algorithms that have been developed within the software package DIVA3D and tested on this database are also briefly described. Furthermore, we review existing annotation tools with emphasis on the novel annotation tool Anthropos7 Editor.}, keywords = {}, pubstate = {published}, tppubtype = {conference} } Semantic annotation of multimedia content is important for training, testing, and assessing content-based algorithms for indexing, organization, browsing, and retrieval. To this end, an annotated multimodal movie corpus has been collected to be used as a test bed for development and assessment of content-based multimedia processing, such as speaker clustering, speaker turn detection, visual speech activity detection, face detection, face clustering, scene segmentation, saliency detection, and visual dialogue detection. All metadata are saved in XML format following the MPEG-7 ISO prototype to ensure data compatibility and reusability. The entire MUSCLE movie database is available for download through the web. Visual speech activity and dialogue detection algorithms that have been developed within the software package DIVA3D and tested on this database are also briefly described. Furthermore, we review existing annotation tools with emphasis on the novel annotation tool Anthropos7 Editor. |
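To give a flavour of the XML metadata mentioned in the abstract above, here is a deliberately simplified, hypothetical MPEG-7-style fragment and how it could be read with Python's standard library. The element names follow common MPEG-7 conventions, but the exact schema of the MUSCLE annotations is not reproduced here.

    import xml.etree.ElementTree as ET

    # Hypothetical, heavily simplified MPEG-7-style annotation of one movie segment.
    xml_snippet = """
    <Mpeg7 xmlns="urn:mpeg:mpeg7:schema:2001">
      <Description>
        <MultimediaContent>
          <Video>
            <TemporalDecomposition>
              <VideoSegment id="dialogue_001">
                <MediaTime>
                  <MediaTimePoint>T00:01:12</MediaTimePoint>
                  <MediaDuration>PT8S</MediaDuration>
                </MediaTime>
              </VideoSegment>
            </TemporalDecomposition>
          </Video>
        </MultimediaContent>
      </Description>
    </Mpeg7>
    """

    ns = {"m": "urn:mpeg:mpeg7:schema:2001"}
    root = ET.fromstring(xml_snippet)
    for seg in root.iter("{urn:mpeg:mpeg7:schema:2001}VideoSegment"):
        start = seg.find(".//m:MediaTimePoint", ns).text
        duration = seg.find(".//m:MediaDuration", ns).text
        print(seg.get("id"), start, duration)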