This study proposes a vision-based infant monitoring system. The input video is captured by a single pan-tilt (PT) Internet Protocol (IP) camera mounted at a high point in a room. The proposed system consists of three major stages: initialization of the tracking object (the infant), infant tracking, and PT IP camera control. First, a codebook background subtraction algorithm is applied to extract the infant from the input frames and to construct an initial tracking feature model. Once the infant has been extracted, the system tracks the infant using this feature model. Because infants are non-rigid objects, the tracking feature model must be updated automatically whenever the infant changes posture or position. The system also predicts the infant's actions and controls the movement of the PT IP camera so that the infant does not crawl or walk out of the monitored field of view. Finally, a retracking mechanism reacquires the infant in order to correct mistracking. Together, these strategies help the monitoring system perform well. We hope that the proposed method can be embedded into healthcare systems in the future.
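As an illustration of the prediction and camera-control stage, the following minimal sketch shows one way such logic could be arranged. It is not the method used in this study. The constant-velocity predictor, the 20% border margin, and the names `Frame`, `predict_position`, and `camera_command` are all assumptions introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Frame size in pixels (hypothetical helper type)."""
    width: int
    height: int


def predict_position(prev, curr, lookahead=1.0):
    """Extrapolate the tracked centroid with a constant-velocity model.

    Assumption: motion between consecutive frames is roughly linear.
    prev and curr are (x, y) centroids from two consecutive frames.
    """
    vx = curr[0] - prev[0]
    vy = curr[1] - prev[1]
    return (curr[0] + lookahead * vx, curr[1] + lookahead * vy)


def camera_command(predicted, frame, margin=0.2):
    """Return (pan, tilt) steps, each in {-1, 0, 1}.

    pan:  -1 = left, +1 = right
    tilt: -1 = up,   +1 = down (image coordinates, y grows downward)
    A step is issued only when the predicted centroid falls inside the
    border band whose width is `margin` times the frame size (assumed).
    """
    x, y = predicted
    left, right = margin * frame.width, (1 - margin) * frame.width
    top, bottom = margin * frame.height, (1 - margin) * frame.height
    pan = -1 if x < left else (1 if x > right else 0)
    tilt = -1 if y < top else (1 if y > bottom else 0)
    return pan, tilt


if __name__ == "__main__":
    frame = Frame(640, 480)
    predicted = predict_position((500, 240), (540, 240))
    print(predicted, camera_command(predicted, frame))
```

In this sketch the camera moves before the infant reaches the frame border, which mirrors the stated goal of keeping the infant inside the monitored field of view. A real controller would translate the step values into commands for the specific PT IP camera in use.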