A method for evaluating automatic short answer markers (ASAM) against multiple raters is proposed. Three indexes, mean prediction bias (MPB), prediction-bias change with scores (PBCS), and PBCD, are suggested for analyzing a system's performance in detail. The method has four aims. The first is to examine the direction of the bias rather than only the size of the error. The second is to examine system performance at each score rather than only overall performance. The third is to examine the relationship between system performance and rating deviation, instead of wasting information by using only the average scores. The fourth is to examine performance sensitivity through regression analysis rather than relying on qualitative analysis alone. Moreover, the evaluation indicates that the first priority for improving our single-word-based system is to reduce the prediction error at low and high scores: the analysis reveals that many low- and high-score responses are misclassified as middle scores.
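The difference between the size of an error and the direction of a bias, and between overall and per-score performance, can be illustrated with a minimal sketch. The abstract does not give formulas for the indexes, so everything below is an assumption: the signed bias is taken as predicted score minus reference score, the reference is taken as the (rounded) average human rating, and the function names and data are hypothetical and not taken from the paper.

```python
from collections import defaultdict
from statistics import mean


def mean_prediction_bias(predicted, reference):
    """Signed mean of (predicted - reference); positive means over-scoring.

    Assumed definition, for illustration only.
    """
    return mean(p - r for p, r in zip(predicted, reference))


def mean_absolute_error(predicted, reference):
    """Unsigned mean error: reports the size of the error, not its direction."""
    return mean(abs(p - r) for p, r in zip(predicted, reference))


def bias_by_score(predicted, reference):
    """Signed mean bias grouped by reference score (rounding is an assumption)."""
    groups = defaultdict(list)
    for p, r in zip(predicted, reference):
        groups[round(r)].append(p - r)
    return {score: mean(diffs) for score, diffs in sorted(groups.items())}


# Made-up scores on a 0-4 scale: low and high responses drift toward the middle.
reference = [0, 0, 1, 2, 2, 3, 4, 4]
predicted = [1, 2, 1, 2, 2, 3, 3, 2]

overall_bias = mean_prediction_bias(predicted, reference)   # 0
overall_mae = mean_absolute_error(predicted, reference)     # 0.75
per_score = bias_by_score(predicted, reference)             # {0: 1.5, 1: 0, 2: 0, 3: 0, 4: -1.5}
```

In this toy data the overall signed bias is zero, yet the per-score breakdown shows over-scoring at the lowest score and under-scoring at the highest, which is the pattern of low- and high-score responses being pulled toward middle scores.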