Project Details
Description
In recent years, action recognition has attracted increasing attention. Home care systems with action recognition capabilities can ease the burden on families in an aging society, and smart surveillance can detect abnormal actions in camera footage. Many other applications, such as human-computer interaction, autonomous driving, sports analysis, and online video search and retrieval, also call for action recognition systems.

Current action recognition technology relies mainly on supervised learning, with 3D-CNN and two-stream CNN as the mainstream model architectures. These deep models learn from large numbers of manually labeled videos and have been shown to perform well on action recognition. However, a major drawback of supervised learning is its heavy dependence on large-scale datasets: collecting and annotating videos is labor-intensive, such a training process is difficult to scale up, and the learned visual representations are limited in generality. This raises the question of whether deep models can instead learn from large amounts of unlabeled video; self-supervised learning is proposed to address this problem. Self-supervised learning exploits the relationships among training samples, automatically generates semantic annotations, and learns highly discriminative visual features through pretext tasks. Because it does not rely on manual annotation, this training scheme extends easily to other rich and diverse datasets, and the learned representations can be transferred to complex visual tasks.

In this project, we propose an unsupervised visual representation learning framework for action recognition based on self-supervised learning. It requires no large collection of manually labeled training data and learns action features directly from raw video. The framework combines contrastive learning and instance learning on top of a three-dimensional convolutional neural network and is trained end to end. Representations learned in this way scale up easily and transfer readily to a variety of tasks via transfer learning.
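The description gives no implementation details, so the following is only a minimal, hypothetical sketch of how a contrastive objective can be applied to clip embeddings from a 3D-CNN backbone. The backbone choice (torchvision's `r3d_18`), the projection head, the batch size, and the temperature are assumptions for illustration, not the project's actual design; the instance-learning component (treating each video as its own instance) is only implied here by using other clips in the batch as negatives.

```python
# Minimal sketch of contrastive (InfoNCE-style) pretraining on 3D-CNN clip features.
# Assumptions: r3d_18 backbone, 128-d projection head, temperature 0.1.
# Two augmented clips of the same video form a positive pair; the other clips
# in the batch act as negatives.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models.video import r3d_18


class ClipEncoder(nn.Module):
    """3D-CNN backbone followed by a small projection head."""

    def __init__(self, proj_dim: int = 128):
        super().__init__()
        backbone = r3d_18(weights=None)      # no supervised pretraining
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()          # keep the raw clip feature
        self.backbone = backbone
        self.proj = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(inplace=True),
            nn.Linear(feat_dim, proj_dim),
        )

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, channels, frames, height, width)
        return F.normalize(self.proj(self.backbone(clips)), dim=1)


def info_nce(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
    """Contrastive loss: the i-th row of z1 should match the i-th row of z2."""
    logits = z1 @ z2.t() / temperature       # (B, B) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)


# Usage: two differently augmented views of the same batch of videos
# (random tensors stand in for real augmented clips here).
encoder = ClipEncoder()
view1 = torch.randn(4, 3, 16, 112, 112)
view2 = torch.randn(4, 3, 16, 112, 112)
loss = info_nce(encoder(view1), encoder(view2))
loss.backward()
```

In frameworks of this kind, the projection head is typically discarded after pretraining and the 3D-CNN backbone is fine-tuned on a labeled action recognition dataset, which is where the transfer-learning step mentioned above would come in.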
| Status | Finished |
| --- | --- |
| Effective start/end date | 2020/08/01 → 2021/07/31 |
Keywords
- action recognition
- smart surveillance
- human-computer interaction
- autonomous driving
- sport analysis
- deep learning
- self-supervised learning
- convolutional neural network
- contrastive learning
- instance learning
- transfer learning