Enhancing Code-Switching ASR Leveraging Non-Peaky CTC Loss and Deep Language Posterior Injection

  • Tzu Ting Yang*
  • Hsin Wei Wang
  • Yi Cheng Wang
  • Berlin Chen

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Code-switching, in which multilingual speakers alternate between languages within a conversation, still poses significant challenges to end-to-end (E2E) automatic speech recognition (ASR) systems because of both acoustic and semantic confusion. ASR systems struggle to handle rapid alternation between languages, which often leads to significant performance degradation. Our main contributions are threefold. First, we incorporate language identification (LID) information into several intermediate layers of the encoder, aiming to enrich the output embeddings with more detailed language information. Second, through a novel application of a language boundary alignment loss, we enable the subsequent ASR modules to make more effective use of the internal language posteriors. Third, we explore the feasibility of using language posteriors to facilitate deep interaction between the shared encoder and the language-specific encoders. Comprehensive experiments on the SEAME corpus show that the proposed method outperforms the prior-art method, disentangle-based mixture-of-experts (D-MoE), further enhancing the encoder's sensitivity to language.
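As a rough illustration of the first contribution, the sketch below shows one generic way per-frame language posteriors could be fed back into an encoder layer's hidden states: project each frame to language logits, apply a softmax, project the posterior back to the hidden size, and add it residually. This is not the authors' implementation. The function names, the matrices `w_lid` and `w_back`, the random weights, and the sizes `T`, `D`, `K` are all hypothetical and chosen only so the example runs.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def inject_language_posteriors(frames, w_lid, w_back):
    """Add a projection of per-frame language posteriors to the hidden states.

    frames: list of T hidden vectors of size D (one encoder layer's output)
    w_lid:  K x D matrix mapping a hidden vector to K language logits (hypothetical)
    w_back: D x K matrix mapping posteriors back to size D (hypothetical)
    Returns (updated_frames, posteriors).
    """
    updated, posteriors = [], []
    for h in frames:
        post = softmax(matvec(w_lid, h))   # per-frame language posterior
        delta = matvec(w_back, post)       # posterior projected back to size D
        updated.append([a + b for a, b in zip(h, delta)])  # residual injection
        posteriors.append(post)
    return updated, posteriors


def random_matrix(rows, cols, rng):
    """Build a rows x cols matrix of small random weights (illustration only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
T, D, K = 4, 6, 3  # frames, hidden size, language classes (illustrative values)
frames = random_matrix(T, D, rng)
w_lid = random_matrix(K, D, rng)
w_back = random_matrix(D, K, rng)

new_frames, posts = inject_language_posteriors(frames, w_lid, w_back)
```

In a trained model the two projections would be learned and the posteriors would typically be supervised by an auxiliary loss at each chosen layer. The abstract does not specify these details, so they are left out here.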

Original language: English
Title of host publication: Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 476-481
Number of pages: 6
ISBN (Electronic): 9798350392258
Publication status: Published - 2024
Event: 2024 IEEE Spoken Language Technology Workshop, SLT 2024 - Macao, China
Duration: 2024 Dec 2 to 2024 Dec 5

Publication series

Name: Proceedings of 2024 IEEE Spoken Language Technology Workshop, SLT 2024

Conference

Conference: 2024 IEEE Spoken Language Technology Workshop, SLT 2024
Country/Territory: China
City: Macao
Period: 2024/12/02 to 2024/12/05

Keywords

  • automatic speech recognition
  • Code-switching
  • intermediate CTC loss
  • non-peaky CTC loss

ASJC Scopus subject areas

  • Computer Vision and Pattern Recognition
  • Hardware and Architecture
  • Media Technology
  • Instrumentation
  • Linguistics and Language
