Multiple scale-specific representations for improved human action recognition

Title: Multiple scale-specific representations for improved human action recognition
Publication Type: Journal Article
Year of Publication: 2014
Authors: Shabani, A. H., J. S. Zelek, and D. A. Clausi
Journal: Pattern Recognition Letters
Keywords: concatenated representation, decoupled representation, human action recognition, scale-specific representation, separability test, spatio-temporal salient features

Human action recognition in video is important in many computer vision applications, such as automated surveillance. Human actions can be compactly encoded using a sparse set of local spatio-temporal salient features at different scales. Existing bottom-up methods construct a single dictionary of action primitives from the joint features of all scales and hence produce a single action representation. This representation cannot fully exploit the complementary characteristics of the motions across different scales. To address this problem, we introduce the concept of learning multiple dictionaries of action primitives at different resolutions and, consequently, multiple scale-specific representations for a given video sample. Using a decoupled fusion of the multiple representations, we improved human action classification accuracy on realistic benchmark databases by about 5% compared with state-of-the-art methods.
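
The idea of one dictionary per scale can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the dictionaries are learned or how the fusion is computed, so this example assumes plain k-means for the dictionaries, normalised bag-of-primitives histograms for the representations, and an averaged per-scale similarity as a stand-in for "decoupled fusion". All function names and the scale labels are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(feature, centres):
    """Index of the closest centre (action primitive) to one descriptor."""
    return min(range(len(centres)), key=lambda j: sq_dist(feature, centres[j]))

def learn_dictionary(features, k, iters=10, seed=0):
    """Assumed learner: plain k-means over the descriptors of ONE scale."""
    rng = random.Random(seed)
    centres = [list(f) for f in rng.sample(features, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in features:
            groups[nearest(f, centres)].append(f)
        for j, members in enumerate(groups):
            if members:  # keep the old centre if a cluster is empty
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return centres

def histogram(features, centres):
    """Normalised histogram of primitive counts for one video at one scale."""
    counts = [0.0] * len(centres)
    for f in features:
        counts[nearest(f, centres)] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]

def scale_specific_representations(video, dictionaries):
    """Both arguments are dicts keyed by scale; returns one histogram per scale."""
    return {s: histogram(video[s], dictionaries[s]) for s in dictionaries}

def concatenated(reps):
    """Concatenated alternative: per-scale histograms joined in scale order."""
    return [v for s in sorted(reps) for v in reps[s]]

def decoupled_score(reps, prototypes):
    """Assumed fusion rule: compare each scale separately, then average.
    Per-scale similarity is one minus half the L1 distance."""
    sims = [1.0 - 0.5 * sum(abs(a - b) for a, b in zip(reps[s], prototypes[s]))
            for s in reps]
    return sum(sims) / len(sims)
```

In this sketch, `video` and `dictionaries` are dictionaries keyed by a scale label such as "fine" or "coarse". A video would be assigned to the class whose per-scale prototype histograms give the highest `decoupled_score`, whereas the concatenated variant would feed the single vector from `concatenated` to one classifier.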