Layer-Wise Feature Distillation with Unsupervised Multi-Aspect Optimization for Improved Automatic Speech Assessment

  • Chung Wen Wu*
  • Berlin Chen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Self-supervised learning (SSL) features have shown promising progress across several domains and have been widely used in recent research on Automatic Speech Assessment (ASA). However, few studies have explored the layer-wise features of pre-trained SSL models. Another key challenge in ASA is the high cost of labeling the various aspects of speech proficiency, such as content relevance, delivery, and language use. In this paper, we propose three unsupervised subtasks to assist model training in ASA and examine how important the embeddings from each layer of the acoustic model are for each aspect, providing a preliminary study of this area. Extensive experiments demonstrate that model training with our tailored subtasks achieves superior performance in speech proficiency assessment tasks.

Original language: English
Title of host publication: APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350367331
Publication status: Published - 2024
Event: 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024 - Macau, China
Duration: 2024 Dec 3 → 2024 Dec 6

Publication series

Name: APSIPA ASC 2024 - Asia Pacific Signal and Information Processing Association Annual Summit and Conference 2024

Conference

Conference: 2024 Asia Pacific Signal and Information Processing Association Annual Summit and Conference, APSIPA ASC 2024
Country/Territory: China
City: Macau
Period: 2024/12/03 → 2024/12/06

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Hardware and Architecture
  • Signal Processing
