This research proposes a vision-based online human action recognition system that uses deep learning methods to recognise human actions under moving-camera conditions. The proposed system consists of five stages: human detection, human tracking, feature extraction, action classification and fusion. The system takes three kinds of input information: colour intensity, short-term dynamic information and skeletal joints. In the human detection stage, a two-dimensional (2D) pose estimator detects humans in each frame. In the human tracking stage, the Deep SORT method tracks each detected human. In the feature extraction stage, three kinds of features (spatial, temporal and structural) are extracted to analyse human actions. In the action classification stage, the three feature types are classified by three separate long short-term memory (LSTM) classifiers. In the fusion stage, a fusion method combines the three LSTM outputs into a final decision. This study also constructs a computer vision and image understanding (CVIU) Moving Camera Human Action dataset (CVIU dataset) containing 3,646 human action sequences, covering 11 types of single-human actions and 5 types of interactive human actions; this dataset was used to train and evaluate the proposed system. Experimental results show that the recognition rates of the spatial, temporal and structural features were 96.64%, 81.87% and 68.10%, respectively. After fusion, the overall recognition rate of human action recognition for indoor smart mobile robots reached 96.84%.
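The fusion stage described above combines the per-class scores produced by the three LSTM classifiers into a single decision. The abstract does not specify the fusion rule, so the following is only a minimal late-fusion sketch under an assumed weighted-average rule; the class labels, scores and weights are illustrative placeholders, not values from the study.

```python
# Minimal late-fusion sketch for combining per-class scores from three
# classifiers (spatial, temporal, structural). The weighted-average rule
# and the weights below are illustrative assumptions; the actual fusion
# method used in the study is not specified in the abstract.

def fuse_scores(spatial, temporal, structural, weights=(0.5, 0.3, 0.2)):
    """Return fused per-class scores as a weighted average of the three inputs."""
    ws, wt, wc = weights
    return [ws * s + wt * t + wc * c
            for s, t, c in zip(spatial, temporal, structural)]

def predict(fused_scores, labels):
    """Pick the action label with the highest fused score."""
    best = max(range(len(fused_scores)), key=fused_scores.__getitem__)
    return labels[best]

if __name__ == "__main__":
    labels = ["walk", "sit", "wave"]      # hypothetical action classes
    spatial = [0.7, 0.2, 0.1]             # per-class scores from each LSTM
    temporal = [0.4, 0.5, 0.1]
    structural = [0.3, 0.3, 0.4]
    fused = fuse_scores(spatial, temporal, structural)
    print(predict(fused, labels))         # prints "walk"
```

With weights that sum to one and score vectors that each sum to one, the fused scores also sum to one, so the result can still be read as a class distribution.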