TY - JOUR
T1 - Assessing creative problem-solving with automated text grading
AU - Wang, Hao Chuan
AU - Chang, Chun Yen
AU - Li, Tsai Yen
PY - 2008/12
Y1 - 2008/12
AB - This work aims to improve the assessment of creative problem-solving in science education by using language technologies and statistical machine learning methods to grade students' natural language responses automatically. Open-ended questions that elicit students' constructed responses are beneficial for validly evaluating constructs like creative problem-solving. However, the high cost of manually grading constructed responses can be an obstacle to using open-ended questions. In this study, automated grading schemes were developed and evaluated in the context of secondary Earth science education. Empirical evaluations revealed that the automated grading schemes can reliably identify domain concepts embedded in students' natural language responses, with satisfactory inter-coder agreement against human coding in two sub-tasks of the test (Cohen's Kappa = .65-.72). When a single holistic score was computed for each student, machine-generated scores achieved high inter-rater reliability against human grading (Pearson's r = .92). The reliable performance in automatic concept identification and numeric grading demonstrates the potential of automated grading to support the use of open-ended questions in science assessments and to enable new technologies for science learning.
KW - Automated grading
KW - Computer-aided assessment
KW - Creative problem-solving
KW - Machine learning application
KW - Science learning assessment
UR - http://www.scopus.com/inward/record.url?scp=49449115241&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=49449115241&partnerID=8YFLogxK
U2 - 10.1016/j.compedu.2008.01.006
DO - 10.1016/j.compedu.2008.01.006
M3 - Article
AN - SCOPUS:49449115241
SN - 0360-1315
VL - 51
SP - 1450
EP - 1466
JO - Computers and Education
JF - Computers and Education
IS - 4
ER -