TY - GEN
T1 - Enhancing Automatic Speech Assessment Leveraging Heterogeneous Features and Soft Labels For Ordinal Classification
AU - Peng, Wen Hsuan
AU - Chen, Sally
AU - Chen, Berlin
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - The general goal of automated speech assessment (ASA) is to provide a consistent and objective evaluation of the spoken language proficiency of an L2 learner or test-taker. In contrast to most previous work that treats ASA as a nominal multi-classification task and thus neglects the sequential nature of proficiency grades, this paper explores the notion of soft labels for use in ASA. In particular, we strive to enhance ASA performance by examining two critical issues: (1) the impact of applying soft labels instead of hard labels in the optimization of ordinal classification for ASA, and (2) the effects of combining self-supervised learning (SSL) with handcrafted indicator features via a novel modeling paradigm. Our results demonstrate that the proposed model can considerably enhance performance compared to existing strong baselines. The improvement is evident not only in the test dataset of seen prompts but also in those of unseen prompts, suggesting the robust generalization and adaptability of our method.
KW - Automated speech assessment
KW - End-to-end neural network
KW - Multi-modal model
UR - https://www.scopus.com/pages/publications/85215664430
UR - https://www.scopus.com/pages/publications/85215664430#tab=citedBy
U2 - 10.1109/SLT61566.2024.10832269
DO - 10.1109/SLT61566.2024.10832269
M3 - Conference contribution
AN - SCOPUS:85215664430
T3 - Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024
SP - 945
EP - 952
BT - Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 IEEE Spoken Language Technology Workshop, SLT 2024
Y2 - 2 December 2024 through 5 December 2024
ER -