Proposing ways of evaluating automatic short-answer markers with multiraters

Che Di Lee, Tsung Hau Jen, Hsieh Hai Fu, Chun Yen Chang

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

A method of evaluating automatic short-answer markers (ASAM) with multiple raters (multiraters) is proposed. Three indexes, mean prediction bias (MPB), prediction-bias change with scores (PBCS) and PBCD, are suggested for analyzing system performance in detail. The method differs from conventional evaluation in four ways. First, it examines the direction of the bias rather than only the size of the error. Second, it examines system performance at each score rather than only overall performance. Third, it examines the relationship between system performance and the rating deviation, rather than relying only on the average scores and discarding the rest of the information the raters provide. Fourth, it examines performance sensitivity through regression analysis rather than relying only on qualitative analysis. Applied to our single-word-based system, the evaluation shows that the first priority for improvement is to reduce the prediction error at low and high scores. The analysis reveals that many low- and high-score responses are misclassified as middle scores.
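The abstract does not define the indexes formally. The Python sketch below illustrates one plausible reading and is not the authors' implementation. It assumes that each response has one machine score and a list of scores from several human raters; that the prediction bias of a response is the machine score minus the mean rater score; that MPB is the mean of that bias; and that PBCS and PBCD can be summarized as ordinary least-squares slopes of the bias on the mean rater score and on the raters' population standard deviation, respectively. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev

# Hypothetical helpers: a sketch of one possible reading of the abstract,
# not the definitions used in the paper.


def prediction_biases(machine, raters):
    """Signed bias per response: machine score minus mean human score (assumed)."""
    return [m - mean(r) for m, r in zip(machine, raters)]


def mean_prediction_bias(machine, raters):
    """Assumed MPB: the mean signed bias. A negative value means the marker
    under-scores on average; the error size alone would hide this direction."""
    return mean(prediction_biases(machine, raters))


def bias_by_score(machine, raters):
    """Mean signed bias at each (rounded) mean human score."""
    groups = {}
    for m, r in zip(machine, raters):
        human = mean(r)
        # Round half up, avoiding Python's round-half-to-even behaviour.
        level = math.floor(human + 0.5)
        groups.setdefault(level, []).append(m - human)
    return {level: mean(b) for level, b in sorted(groups.items())}


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor has no variance")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def pbcs(machine, raters):
    """Assumed PBCS: slope of the bias on the mean human score. A negative
    slope means low scores are over-predicted and high scores under-predicted,
    i.e. predictions are pulled toward the middle."""
    return ols_slope([mean(r) for r in raters], prediction_biases(machine, raters))


def pbcd(machine, raters):
    """Assumed PBCD: slope of the bias on the raters' standard deviation."""
    return ols_slope([pstdev(r) for r in raters], prediction_biases(machine, raters))


if __name__ == "__main__":
    # Toy data (invented for illustration): five responses, three raters each.
    machine = [1, 1, 2, 2, 3]
    raters = [[0, 0, 0], [0, 1, 2], [2, 2, 2], [2, 3, 4], [4, 4, 4]]
    print("MPB:", mean_prediction_bias(machine, raters))
    print("bias by score:", bias_by_score(machine, raters))
    print("PBCS slope:", pbcs(machine, raters))
    print("PBCD slope:", pbcd(machine, raters))
```

On this invented toy data the marker over-scores the lowest response and under-scores the two highest, so the assumed PBCS slope is negative, which mirrors the pull toward middle scores that the abstract reports.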

Original language: English
Journal: British Journal of Educational Technology
Volume: 43
Issue number: 3
DOI: 10.1111/j.1467-8535.2011.01273.x
Publication status: Published 1 May 2012

Fingerprint

performance
trend
regression analysis
rating
evaluation

ASJC Scopus subject areas

  • Education

Cite this

Proposing ways of evaluating automatic short-answer markers with multiraters. / Lee, Che Di; Jen, Tsung Hau; Fu, Hsieh Hai; Chang, Chun Yen.

In: British Journal of Educational Technology, Vol. 43, No. 3, 01.05.2012.

Research output: Contribution to journal › Article

@article{187d1f9205da4ab29006c8209f6984fb,
title = "Proposing ways of evaluating automatic short-answer markers with multiraters",
abstract = "A method of evaluating automatic short-answer markers (ASAM) with multiple raters (multiraters) is proposed. Three indexes, mean prediction bias (MPB), prediction-bias change with scores (PBCS) and PBCD, are suggested for analyzing system performance in detail. The method differs from conventional evaluation in four ways. First, it examines the direction of the bias rather than only the size of the error. Second, it examines system performance at each score rather than only overall performance. Third, it examines the relationship between system performance and the rating deviation, rather than relying only on the average scores and discarding the rest of the information the raters provide. Fourth, it examines performance sensitivity through regression analysis rather than relying only on qualitative analysis. Applied to our single-word-based system, the evaluation shows that the first priority for improvement is to reduce the prediction error at low and high scores. The analysis reveals that many low- and high-score responses are misclassified as middle scores.",
author = "Lee, {Che Di} and Jen, {Tsung Hau} and Fu, {Hsieh Hai} and Chang, {Chun Yen}",
year = "2012",
month = "5",
day = "1",
doi = "10.1111/j.1467-8535.2011.01273.x",
language = "English",
volume = "43",
journal = "British Journal of Educational Technology",
issn = "0007-1013",
publisher = "Wiley-Blackwell",
number = "3",

}

TY  - JOUR
T1  - Proposing ways of evaluating automatic short-answer markers with multiraters
AU  - Lee, Che Di
AU  - Jen, Tsung Hau
AU  - Fu, Hsieh Hai
AU  - Chang, Chun Yen
PY  - 2012/5/1
Y1  - 2012/5/1
N2  - A method of evaluating automatic short-answer markers (ASAM) with multiple raters (multiraters) is proposed. Three indexes, mean prediction bias (MPB), prediction-bias change with scores (PBCS) and PBCD, are suggested for analyzing system performance in detail. The method differs from conventional evaluation in four ways. First, it examines the direction of the bias rather than only the size of the error. Second, it examines system performance at each score rather than only overall performance. Third, it examines the relationship between system performance and the rating deviation, rather than relying only on the average scores and discarding the rest of the information the raters provide. Fourth, it examines performance sensitivity through regression analysis rather than relying only on qualitative analysis. Applied to our single-word-based system, the evaluation shows that the first priority for improvement is to reduce the prediction error at low and high scores. The analysis reveals that many low- and high-score responses are misclassified as middle scores.
AB  - A method of evaluating automatic short-answer markers (ASAM) with multiple raters (multiraters) is proposed. Three indexes, mean prediction bias (MPB), prediction-bias change with scores (PBCS) and PBCD, are suggested for analyzing system performance in detail. The method differs from conventional evaluation in four ways. First, it examines the direction of the bias rather than only the size of the error. Second, it examines system performance at each score rather than only overall performance. Third, it examines the relationship between system performance and the rating deviation, rather than relying only on the average scores and discarding the rest of the information the raters provide. Fourth, it examines performance sensitivity through regression analysis rather than relying only on qualitative analysis. Applied to our single-word-based system, the evaluation shows that the first priority for improvement is to reduce the prediction error at low and high scores. The analysis reveals that many low- and high-score responses are misclassified as middle scores.
UR  - http://www.scopus.com/inward/record.url?scp=84859889303&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=84859889303&partnerID=8YFLogxK
U2  - 10.1111/j.1467-8535.2011.01273.x
DO  - 10.1111/j.1467-8535.2011.01273.x
M3  - Article
VL  - 43
JO  - British Journal of Educational Technology
JF  - British Journal of Educational Technology
SN  - 0007-1013
IS  - 3
ER  - 