2016
N. Kardaris, V. Pitsikalis, E. Mavroudi, P. Maragos, "Introducing Temporal Order of Dominant Visual Word Sub-Sequences for Human Action Recognition", Proc. of IEEE Int'l Conf. on Image Processing (ICIP-2016), Phoenix, AZ, USA, 2016. [PDF]

@conference{acm_kardaris_2016b,
  title     = {Introducing Temporal Order of Dominant Visual Word Sub-Sequences for Human Action Recognition},
  author    = {N. Kardaris and V. Pitsikalis and E. Mavroudi and P. Maragos},
  url       = {http://robotics.ntua.gr/wp-content/publications/KRPAM_BuildingMultimodalInterfaces_ACM-MM2016.pdf},
  doi       = {10.1109/ICIP.2016.7532922},
  year      = {2016},
  date      = {2016-09-01},
  booktitle = {Proc. of IEEE Int'l Conf. on Image Processing (ICIP-2016)},
  address   = {Phoenix, AZ, USA},
  pubstate  = {published},
  tppubtype = {conference}
}

Abstract: We present a novel video representation for human action recognition by considering temporal sequences of visual words. Based on state-of-the-art dense trajectories, we introduce temporal bundles of dominant (that is, most frequent) visual words. These are employed to construct a complementary action representation of ordered dominant visual word sequences that additionally incorporates fine-grained temporal information. We exploit the introduced temporal information by applying local sub-sequence alignment that quantifies the similarity between sequences. This facilitates the fusion of our representation with the bag-of-visual-words (BoVW) representation. Our approach incorporates sequential temporal structure and results in a low-dimensional representation compared to the BoVW, while still yielding a decent result when combined with it. Experiments on the KTH, Hollywood2 and the challenging HMDB51 datasets show that the proposed framework is complementary to the BoVW representation, which discards temporal order.
Copyright Notice:
Some material presented is available for download to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author’s copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
The work already published by the IEEE is under its copyright. Personal use of such material is permitted. However, permission to reprint/republish the material for advertising or promotional purposes, or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of the work in other works must be obtained from the IEEE.