This paper presented a virtual-real image composition method, called match-moving, which is suitable for both graphic image composition and stereoscopic image composition. The proposed method consists of two subsystems: one for camera tracking and one for virtual-real preview. The camera tracking subsystem, based on temporal depth fusion, acquires the camera pose and trajectory. To compensate for noise in dynamic scenes, an a priori model is employed in developing a human skeleton detection method and a spatial-temporal attention analysis method. Real and virtual objects are composited by a real-time virtual-real synthesis preview system. In the exemplary results, the camera pose measurements show that the proposed system achieves high accuracy. Our goal is to design an intuitive camera match-moving system that performs virtual-real synthesis with real-time 3D preview visualization.
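
To illustrate how a tracked camera pose can drive the virtual-real composition, the following minimal sketch projects a virtual 3D point into the real image with a standard pinhole camera model. It is not the implementation used in the proposed system: the function name `project_point`, the world-to-camera convention, and the intrinsic values (focal length and principal point) are assumptions chosen only for this example.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def project_point(point_world: Vec3, rotation: Mat3, translation: Vec3,
                  focal_px: float, cx: float, cy: float) -> Tuple[float, float]:
    """Project a world-space point to pixel coordinates (pinhole model).

    Assumed convention: rotation and translation map world coordinates
    to camera coordinates, p_cam = R * p_world + t.
    """
    # Transform the virtual point into the camera frame using the tracked pose.
    p_cam = [
        sum(rotation[i][j] * point_world[j] for j in range(3)) + translation[i]
        for i in range(3)
    ]
    if p_cam[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division followed by the shift to the principal point.
    u = focal_px * p_cam[0] / p_cam[2] + cx
    v = focal_px * p_cam[1] / p_cam[2] + cy
    return (u, v)


# Hypothetical pose: camera at the world origin with no rotation.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixel = project_point((0.5, -0.25, 2.0), identity, (0.0, 0.0, 0.0),
                      focal_px=800.0, cx=320.0, cy=240.0)
print(pixel)
```

In a preview loop, the same projection would be applied per frame with that frame's estimated pose, so that the virtual object stays registered to the real scene as the camera moves.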