Automated estimation of item difficulty for multiple-choice tests: An application of word embedding techniques

Fu Yuan Hsu, Hahn Ming Lee, Tao Hsing Chang, Yao-Ting Sung

Research output: Contribution to journal (article)

Abstract

Pretesting is the most commonly used method for estimating test item difficulty because it provides highly accurate results that can be applied to assessment development activities. However, pretesting is inefficient, and it can lead to item exposure. Hence, an increasing number of studies have invested considerable effort in researching the automated estimation of item difficulty. Language proficiency tests constitute the majority of researched test topics, while comparatively less research has focused on content subjects. This paper introduces a novel method for the automated estimation of item difficulty for social studies tests. In this study, we explore the difficulty of multiple-choice items, which consist of the following item elements: a question and alternative options. We use learning materials to construct a semantic space using word embedding techniques and project an item's texts into the semantic space to obtain corresponding vectors. Semantic features are obtained by calculating the cosine similarity between the vectors of item elements. Subsequently, these semantic features are sent to a classifier for training and testing. Based on the output of the classifier, an estimation model is created and item difficulty is estimated. Our findings suggest that the semantic similarity between a stem and the options has the strongest impact on item difficulty. Furthermore, the results indicate that the proposed estimation method outperforms pretesting, and therefore, we expect that the proposed approach will complement and partially replace pretesting in future.
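
The feature-extraction step described in the abstract (project item elements into a semantic space, then take cosine similarities between their vectors) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how a multi-word text is turned into a single vector, so averaging word vectors is assumed here, and the names `mean_vector`, `semantic_features`, and `TOY_EMBEDDINGS` are hypothetical. In the paper the embeddings are learned from learning materials; the two-dimensional toy values below are made up purely so the example runs.

```python
import math

def mean_vector(tokens, embeddings):
    """Average the embedding vectors of the tokens that have an embedding.

    Averaging is an assumption; the abstract does not state the projection method.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        raise ValueError("no token has an embedding")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def semantic_features(stem, options, embeddings):
    """Return one stem-to-option cosine similarity per option."""
    stem_vec = mean_vector(stem, embeddings)
    return [cosine_similarity(stem_vec, mean_vector(opt, embeddings))
            for opt in options]

# Hypothetical toy embeddings (not taken from the paper).
TOY_EMBEDDINGS = {
    "river": [1.0, 0.0],
    "water": [0.8, 0.6],
    "law": [0.0, 1.0],
}

# A tokenized stem and two tokenized options.
features = semantic_features(["river"], [["water"], ["law"]], TOY_EMBEDDINGS)
print(features)
```

In the workflow the abstract outlines, feature vectors of this kind would then be passed to a classifier for training and testing; the choice of classifier is not given in the abstract and is left out of the sketch.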

Language: English
Pages: 969-984
Number of pages: 16
Journal: Information Processing and Management
Volume: 54
Issue number: 6
DOI: 10.1016/j.ipm.2018.06.007
Publication status: Published - 2018 Nov 1

Keywords

  • Cognitive processing model
  • Item difficulty estimation
  • Machine learning
  • Multiple-choice item
  • Semantic similarity
  • Word embedding

ASJC Scopus subject areas

  • Information Systems
  • Media Technology
  • Computer Science Applications
  • Management Science and Operations Research
  • Library and Information Sciences

Cite this

Automated estimation of item difficulty for multiple-choice tests: An application of word embedding techniques. / Hsu, Fu Yuan; Lee, Hahn Ming; Chang, Tao Hsing; Sung, Yao-Ting.

In: Information Processing and Management, Vol. 54, No. 6, 01.11.2018, p. 969-984.

@article{b1f7d43374494b65b79e51e2ae69cbfa,
title = "Automated estimation of item difficulty for multiple-choice tests: An application of word embedding techniques",
abstract = "Pretesting is the most commonly used method for estimating test item difficulty because it provides highly accurate results that can be applied to assessment development activities. However, pretesting is inefficient, and it can lead to item exposure. Hence, an increasing number of studies have invested considerable effort in researching the automated estimation of item difficulty. Language proficiency tests constitute the majority of researched test topics, while comparatively less research has focused on content subjects. This paper introduces a novel method for the automated estimation of item difficulty for social studies tests. In this study, we explore the difficulty of multiple-choice items, which consist of the following item elements: a question and alternative options. We use learning materials to construct a semantic space using word embedding techniques and project an item's texts into the semantic space to obtain corresponding vectors. Semantic features are obtained by calculating the cosine similarity between the vectors of item elements. Subsequently, these semantic features are sent to a classifier for training and testing. Based on the output of the classifier, an estimation model is created and item difficulty is estimated. Our findings suggest that the semantic similarity between a stem and the options has the strongest impact on item difficulty. Furthermore, the results indicate that the proposed estimation method outperforms pretesting, and therefore, we expect that the proposed approach will complement and partially replace pretesting in future.",
keywords = "Cognitive processing model, Item difficulty estimation, Machine learning, Multiple-choice item, Semantic similarity, Word embedding",
author = "Hsu, {Fu Yuan} and Lee, {Hahn Ming} and Chang, {Tao Hsing} and Sung, Yao-Ting",
year = "2018",
month = "11",
day = "1",
doi = "10.1016/j.ipm.2018.06.007",
language = "English",
volume = "54",
pages = "969--984",
journal = "Information Processing and Management",
issn = "0306-4573",
publisher = "Elsevier Limited",
number = "6",

}

TY - JOUR

T1 - Automated estimation of item difficulty for multiple-choice tests: An application of word embedding techniques

T2 - Information Processing and Management

AU - Hsu, Fu Yuan

AU - Lee, Hahn Ming

AU - Chang, Tao Hsing

AU - Sung, Yao-Ting

PY - 2018/11/1

Y1 - 2018/11/1

AB - Pretesting is the most commonly used method for estimating test item difficulty because it provides highly accurate results that can be applied to assessment development activities. However, pretesting is inefficient, and it can lead to item exposure. Hence, an increasing number of studies have invested considerable effort in researching the automated estimation of item difficulty. Language proficiency tests constitute the majority of researched test topics, while comparatively less research has focused on content subjects. This paper introduces a novel method for the automated estimation of item difficulty for social studies tests. In this study, we explore the difficulty of multiple-choice items, which consist of the following item elements: a question and alternative options. We use learning materials to construct a semantic space using word embedding techniques and project an item's texts into the semantic space to obtain corresponding vectors. Semantic features are obtained by calculating the cosine similarity between the vectors of item elements. Subsequently, these semantic features are sent to a classifier for training and testing. Based on the output of the classifier, an estimation model is created and item difficulty is estimated. Our findings suggest that the semantic similarity between a stem and the options has the strongest impact on item difficulty. Furthermore, the results indicate that the proposed estimation method outperforms pretesting, and therefore, we expect that the proposed approach will complement and partially replace pretesting in future.

KW - Cognitive processing model

KW - Item difficulty estimation

KW - Machine learning

KW - Multiple-choice item

KW - Semantic similarity

KW - Word embedding

UR - http://www.scopus.com/inward/record.url?scp=85049610755&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85049610755&partnerID=8YFLogxK

U2 - 10.1016/j.ipm.2018.06.007

DO - 10.1016/j.ipm.2018.06.007

M3 - Article

VL - 54

SP - 969

EP - 984

JO - Information Processing and Management

JF - Information Processing and Management

SN - 0306-4573

IS - 6

ER -