This paper explores how to evaluate automatic text graders for open-ended questions by considering the relationships among raters, grade levels, and prediction errors. The open-ended question in this study concerned auroras and required knowledge of earth science and physics. Each student's response was graded on a scale of 0 to 10 points by three human raters. The automatic grading systems were designed as support-vector-machine (SVM) regression models with linear, quadratic, and RBF kernels, respectively. Models of each kernel type were trained separately on the grades from each of the three human raters and on the average grades. A preliminary evaluation with data from 391 students shows the following results: (1) the higher the grade level, the larger the prediction error; (2) the rankings of the prediction errors of the human-rater-trained models differ across the three grade levels; and (3) the model trained on the average grades performs best at all three grade levels regardless of kernel type. These results suggest that examining models' prediction errors in detail at different grade levels is worthwhile for finding the best match between raters' grades and models.
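The training setup described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature vectors, grade values, and train/test split are synthetic placeholders, since the paper's feature extraction from student responses is not specified here. It shows how one SVR model per kernel (linear, degree-2 polynomial, RBF) can be fit on each rater's grades and on the average grades, with mean absolute error as an example error measure.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_students, n_features = 391, 20  # 391 students as in the study; feature size is hypothetical

# Placeholder feature matrix standing in for features extracted from responses.
X = rng.normal(size=(n_students, n_features))

# Hypothetical grades (0-10) from three raters; real data would come from humans.
latent = np.clip(X[:, :3].sum(axis=1) * 2 + 5, 0, 10)
targets = {f"rater{i}": np.clip(latent + rng.normal(scale=1.0, size=n_students), 0, 10)
           for i in (1, 2, 3)}
targets["average"] = np.mean([targets[f"rater{i}"] for i in (1, 2, 3)], axis=0)

# The three kernel configurations used in the paper.
kernel_params = {"linear": dict(kernel="linear"),
                 "quadratic": dict(kernel="poly", degree=2),
                 "rbf": dict(kernel="rbf")}

errors = {}
for label, y in targets.items():
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
    for name, params in kernel_params.items():
        model = SVR(**params)          # fresh model per (target, kernel) pair
        model.fit(X_tr, y_tr)
        errors[(label, name)] = mean_absolute_error(y_te, model.predict(X_te))

for key in sorted(errors):
    print(key, round(errors[key], 3))
```

In practice the errors would also be broken down by grade level (e.g., low, middle, high score ranges) rather than computed over the whole test set, which is the comparison the paper's findings rest on.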