A vision-based human action recognition system for moving cameras through deep learning

Ming Jen Chang, Jih Tang Hsieh*, Chiung Yao Fang, Sei Wang Chen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

10 Citations (Scopus)

Abstract

This study presents a vision-based human action recognition system using a deep learning technique. The system can recognize human actions successfully even when the camera of a robot is moving toward the target person from various directions, which makes the proposed method useful for the vision systems of indoor mobile robots. The system uses three types of information to recognize human actions: color videos, optical flow videos, and depth videos. First, Kinect 2.0 captures color videos and depth videos simultaneously using its RGB camera and depth sensor. Second, histogram of oriented gradients (HOG) features are extracted from the color videos, and a support vector machine (SVM) is used to detect the human region. Based on the detected human region, the frames of the color video are cropped, and the corresponding frames of the optical flow video are obtained using the Farnebäck method (https://docs.opencv.org/3.4/d4/dee/tutorial-optical-flow.html). The number of frames of these videos is then unified using a frame sampling technique. Subsequently, the three types of videos are input separately into three modified 3D convolutional neural networks (3D CNNs), which extract the spatiotemporal features of human actions and recognize them. Finally, the three recognition results are integrated to produce the final recognition result. The proposed system can recognize 13 types of human actions, namely, drink (sit), drink (stand), eat (sit), eat (stand), read, sit down, stand up, use a computer, walk (horizontal), walk (straight), play with a phone/tablet, walk away from each other, and walk toward each other. The average recognition rate over 369 test human action videos was 96.4%, indicating that the proposed system is robust and efficient.
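For concreteness, the following is a minimal Python/OpenCV sketch of the preprocessing steps named in the abstract: HOG-plus-SVM person detection, Farnebäck optical flow, and frame-count unification. It uses OpenCV's stock HOG people detector as a stand-in for the paper's trained SVM, uniform temporal sampling as a stand-in for the paper's (unspecified) frame sampling technique, and a hypothetical input file name; all parameter values are illustrative, not those of the paper.

```python
import cv2
import numpy as np

# Stock OpenCV HOG people detector (stand-in for the paper's trained SVM).
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def detect_person(frame):
    """Return the highest-weight person box (x, y, w, h), or None."""
    rects, weights = hog.detectMultiScale(frame, winStride=(8, 8))
    if len(rects) == 0:
        return None
    return rects[int(np.argmax(weights))]

def farneback_flow(prev_gray, gray):
    """Dense (H, W, 2) optical flow between two grayscale frames."""
    return cv2.calcOpticalFlowFarneback(
        prev_gray, gray, None,
        pyr_scale=0.5, levels=3, winsize=15,
        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

def sample_frames(frames, n=16):
    """Unify clip length by uniform temporal sampling (an assumed scheme)."""
    idx = np.linspace(0, len(frames) - 1, n).astype(int)
    return [frames[i] for i in idx]

cap = cv2.VideoCapture("action.avi")  # hypothetical input video
ok, prev = cap.read()
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
crops, flows = [], []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    box = detect_person(frame)  # crop window for the color stream
    if box is not None:
        x, y, w, h = box
        crops.append(frame[y:y + h, x:x + w])
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    flows.append(farneback_flow(prev_gray, gray))  # optical-flow stream input
    prev_gray = gray
cap.release()

color_clip = sample_frames(crops)  # would feed the color-stream 3D CNN
flow_clip = sample_frames(flows)   # would feed the optical-flow 3D CNN
```

The sampled clips would then be stacked into fixed-length tensors for the three 3D CNN streams; the depth stream (from the Kinect sensor) follows the same cropping and sampling path.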

Original language: English
Title of host publication: Proceedings of 2019 2nd International Conference on Signal Processing and Machine Learning, SPML 2019
Publisher: Association for Computing Machinery
Pages: 85-91
Number of pages: 7
ISBN (Electronic): 9781450372213
Publication status: Published - 2019 Nov 27
Event: 2nd International Conference on Signal Processing and Machine Learning, SPML 2019 - Hangzhou, China
Duration: 2019 Nov 27 – 2019 Nov 29

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2nd International Conference on Signal Processing and Machine Learning, SPML 2019
Country/Territory: China
City: Hangzhou
Period: 2019/11/27 – 2019/11/29

Keywords

  • 3D convolutional neural network
  • Color information
  • Deep learning
  • Depth information
  • Human action recognition
  • Indoor mobile robots
  • Moving camera
  • Optical flow
  • VGG net

ASJC Scopus subject areas

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications
