Enhancing Automatic Speech Assessment Leveraging Heterogeneous Features and Soft Labels For Ordinal Classification

  • Wen Hsuan Peng*
  • Sally Chen
  • Berlin Chen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

The general goal of automated speech assessment (ASA) is to provide a consistent and objective evaluation of the spoken language proficiency of an L2 learner or test-taker. In contrast to most previous work, which treats ASA as a nominal multi-class classification task and thus neglects the ordered nature of proficiency grades, this paper explores the use of soft labels in ASA. In particular, we strive to enhance ASA performance by examining two critical issues: (1) the impact of applying soft labels instead of hard labels in the optimization of ordinal classification for ASA, and (2) the effects of combining self-supervised learning (SSL) features with handcrafted indicator features via a novel modeling paradigm. Our results demonstrate that the proposed model considerably improves performance over existing strong baselines. The improvement is evident not only on the test set of seen prompts but also on the test sets of unseen prompts, suggesting that our method generalizes and adapts robustly.
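
To make the soft-label idea concrete, here is a minimal Python sketch. It is not the authors' formulation, which the abstract does not specify. It assumes one common construction, in which the target distribution is a softmax over the negative absolute distance between each grade and the true grade. The names `soft_labels`, `cross_entropy`, and `temperature`, the five-grade scale, and the example predictions are all hypothetical.

```python
import math


def soft_labels(true_grade, num_grades, temperature=1.0):
    """Assumed construction: softmax over negative absolute grade distance.

    Grades closer to true_grade receive more probability mass, so the
    target encodes the ordering of the proficiency scale.
    """
    scores = [-abs(k - true_grade) / temperature for k in range(num_grades)]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


# Hypothetical 5-grade scale (0..4) with true grade 2.
hard_target = [0.0, 0.0, 1.0, 0.0, 0.0]
soft_target = soft_labels(true_grade=2, num_grades=5)

# Two hypothetical predictions that give the true grade the same probability
# but put most of their mass on a different grade.
near_miss = [0.05, 0.10, 0.15, 0.60, 0.10]  # peaks at grade 3, one step away
far_miss = [0.60, 0.10, 0.15, 0.05, 0.10]   # peaks at grade 0, two steps away

print("hard-label loss:", cross_entropy(hard_target, near_miss),
      cross_entropy(hard_target, far_miss))
print("soft-label loss:", cross_entropy(soft_target, near_miss),
      cross_entropy(soft_target, far_miss))
```

With the hard target, the two predictions incur the same loss because only the probability of the true grade counts. With the soft target, the prediction that errs toward an adjacent grade is penalized less than the one that errs toward a distant grade, which is the ordinal information a nominal formulation discards.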

Original language: English
Title of host publication: Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 945-952
Number of pages: 8
ISBN (Electronic): 9798350392258
Publication status: Published - 2024
Event: 2024 IEEE Spoken Language Technology Workshop, SLT 2024 - Macao, China
Duration: 2024 Dec 2 to 2024 Dec 5

Publication series

Name: Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024

Conference

Conference: 2024 IEEE Spoken Language Technology Workshop, SLT 2024
Country/Territory: China
City: Macao
Period: 2024/12/02 to 2024/12/05

Keywords

  • Automated speech assessment
  • End-to-end neural network
  • Multi-modal model

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
  • Media Technology
  • Instrumentation
  • Linguistics and Language
