This study presents an infant emotion recognition system that uses visual and audio information for infants aged 1 to 7 months. The system comprises two parts: image processing and speech processing. The image processing part detects the infant's face and extracts facial-expression features. In the face detection stage, the system selects the largest skin-color region as the facial area; in the feature extraction stage, it applies the local ternary pattern (LTP) operator to label facial contours and computes their corresponding Zernike moments. In the speech processing part, the system uses the common mel-frequency cepstral coefficients (MFCCs) and their delta cepstral coefficients as vocalization features. Finally, the system uses support vector machines (SVMs) to classify the facial-expression features and the vocalization features separately. By combining the two classification results, the system reaches a decision about the infant's emotion. In the experiments, the average recognition rate of infant emotions is 85.3%, which, in our view, makes the proposed system robust and efficient.
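The decision-fusion step described above, in which two per-modality SVMs are trained and their outputs are combined, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the feature vectors, the label set (`cry`, `laugh`, `neutral`), and the fusion rule (averaging the two SVMs' class posteriors) are all assumptions; the paper's real features would be Zernike moments for the face and MFCCs plus delta coefficients for the voice.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical label set; the paper does not enumerate its emotion classes.
emotions = ["cry", "laugh", "neutral"]

# Synthetic, well-separated training data standing in for real features:
# 8-dim "face" vectors (e.g. Zernike moments) and 13-dim "voice" vectors
# (e.g. MFCCs), 20 samples per class.
rng = np.random.default_rng(0)
y = np.repeat(np.arange(3), 20)
face_X = rng.normal(loc=y[:, None] * 3.0, scale=0.5, size=(60, 8))
voice_X = rng.normal(loc=y[:, None] * 3.0, scale=0.5, size=(60, 13))

# One SVM per modality, each with probability estimates enabled.
face_clf = SVC(probability=True).fit(face_X, y)
voice_clf = SVC(probability=True).fit(voice_X, y)

def fuse_decision(face_feat, voice_feat):
    """Average the two SVMs' class posteriors and pick the arg-max class."""
    p_face = face_clf.predict_proba([face_feat])[0]
    p_voice = voice_clf.predict_proba([voice_feat])[0]
    return emotions[int(np.argmax((p_face + p_voice) / 2.0))]

print(fuse_decision(face_X[0], voice_X[0]))
```

Averaging posteriors is only one plausible fusion rule; weighted sums or a second-stage classifier over the two score vectors would fit the same two-branch architecture.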