TY - GEN
T1 - An Effective Automated Speaking Assessment Approach to Mitigating Data Scarcity and Imbalanced Distribution
AU - Lo, Tien Hong
AU - Chao, Fu An
AU - Wu, Tzu I.
AU - Sung, Yao Ting
AU - Chen, Berlin
N1 - Publisher Copyright:
© 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
N2 - Automated speaking assessment (ASA) typically involves automatic speech recognition (ASR) and hand-crafted feature extraction from the ASR transcript of a learner's speech. Recently, self-supervised learning (SSL) has shown stellar performance compared to traditional methods. However, SSL-based ASA systems are faced with at least three data-related challenges: limited annotated data, uneven distribution of learner proficiency levels and non-uniform score intervals between different CEFR proficiency levels. To address these challenges, we explore the use of two novel modeling strategies: metric-based classification and loss re-weighting, leveraging distinct SSL-based embedding features. Extensive experimental results on the ICNALE benchmark dataset suggest that our approach can outperform existing strong baselines by a sizable margin, achieving a significant improvement of more than 10% in CEFR prediction accuracy.
AB - Automated speaking assessment (ASA) typically involves automatic speech recognition (ASR) and hand-crafted feature extraction from the ASR transcript of a learner's speech. Recently, self-supervised learning (SSL) has shown stellar performance compared to traditional methods. However, SSL-based ASA systems are faced with at least three data-related challenges: limited annotated data, uneven distribution of learner proficiency levels and non-uniform score intervals between different CEFR proficiency levels. To address these challenges, we explore the use of two novel modeling strategies: metric-based classification and loss re-weighting, leveraging distinct SSL-based embedding features. Extensive experimental results on the ICNALE benchmark dataset suggest that our approach can outperform existing strong baselines by a sizable margin, achieving a significant improvement of more than 10% in CEFR prediction accuracy.
UR - http://www.scopus.com/inward/record.url?scp=85197880113&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85197880113&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85197880113
T3 - Findings of the Association for Computational Linguistics: NAACL 2024 - Findings
SP - 1352
EP - 1362
BT - Findings of the Association for Computational Linguistics
A2 - Duh, Kevin
A2 - Gomez, Helena
A2 - Bethard, Steven
PB - Association for Computational Linguistics (ACL)
T2 - 2024 Findings of the Association for Computational Linguistics: NAACL 2024
Y2 - 16 June 2024 through 21 June 2024
ER -