In this paper, a merged convolution neural network (MCNN) is proposed to improve the accuracy and robustness of real-time facial expression recognition (FER). Although there are many ways to improve the performance of facial expression recognition, a revamp of the training framework and image preprocessing renders better results in applications. When the camera is capturing images at high speed, however, changes in image characteristics may occur at certain moments due to the influence of light and other factors. Such changes can result in incorrect recognition of human facial expression. To solve this problem, we propose a statistical method for recognition results obtained from previous images, instead of using the current recognition output. Experimental results show that the proposed method can satisfactorily recognize seven basic facial expressions in real time.