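Exploring Disparate Language Model Combination Strategies for Mandarin-English Code-Switching ASR (探究語言模型合併策略應用於中英文語碼轉換語音辨識)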

Wei Ting Lin, Berlin Chen

Research output: Chapter in book/report/conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

Code-switching (CS) speech is a common phenomenon in multilingual societies. For example, the official language in Taiwan is Mandarin Chinese, but everyday conversations are often interspersed with English words, phrases, or sentences. Transcribing CS speech is widely regarded as an important open challenge for automatic speech recognition (ASR). One straightforward and feasible way to improve CS ASR is to improve the language model (LM) it uses. Given these observations, we put forward disparate strategies that combine various language models at different stages of the ASR process. Our experimental configuration consists of two CS (i.e., mixed Mandarin Chinese and English) language models and one monolingual (i.e., Mandarin Chinese) language model; the two CS language models are domain-specific, while the monolingual language model is trained on a general text collection. By combining language models at different stages of the ASR process, we aim to determine whether the ASR system can integrate the strengths of the individual language models to achieve improved performance across different tasks. More specifically, three strategies for combining language models are investigated: simple N-gram language model combination, decoding graph combination, and word lattice combination. A series of ASR experiments conducted on CS speech corpora compiled from different industrial application scenarios confirms the utility of these LM combination strategies.
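The abstract does not say how the "simple N-gram language model combination" is carried out. One common way to combine N-gram models is linear interpolation, and the sketch below illustrates that idea only; it is an assumption, not the authors' method. For brevity it uses add-alpha smoothed unigram models, and the toy sentences, function names, and grid of weights are all hypothetical.

```python
import math
from collections import Counter


def train_unigram(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model over a fixed, shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(models, weights):
    """Linear interpolation: P(w) = sum_i weight_i * P_i(w)."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    vocab = models[0].keys()
    return {w: sum(lam * m[w] for lam, m in zip(weights, models)) for w in vocab}


def perplexity(model, tokens):
    """Perplexity of a unigram model on a token sequence."""
    log_prob = sum(math.log(model[w]) for w in tokens)
    return math.exp(-log_prob / len(tokens))


# Hypothetical toy data: a domain-specific code-switching text,
# a general Mandarin text, and a small held-out CS utterance.
cs_text = "我 想 check 一下 schedule 我 想 book 會議室".split()
general_text = "我 想 知道 今天 的 天氣 今天 的 會議 取消".split()
held_out = "我 想 check 今天 的 schedule".split()
vocab = sorted(set(cs_text) | set(general_text) | set(held_out))

cs_lm = train_unigram(cs_text, vocab)
general_lm = train_unigram(general_text, vocab)

# Pick the interpolation weight on a coarse grid by held-out perplexity.
best_lam, best_ppl = min(
    (
        (k / 10, perplexity(interpolate([cs_lm, general_lm], [k / 10, 1 - k / 10]), held_out))
        for k in range(11)
    ),
    key=lambda pair: pair[1],
)
mixed = interpolate([cs_lm, general_lm], [best_lam, 1 - best_lam])
print(f"best weight for CS model: {best_lam:.1f}, held-out perplexity: {best_ppl:.2f}")
```

Because the grid includes the weights 0 and 1, the selected mixture is never worse on the held-out text than either model alone. Real systems would typically interpolate higher-order N-gram models with an LM toolkit and tune the weights on a development set.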

Translated title of the contribution: Exploring Disparate Language Model Combination Strategies for Mandarin-English Code-Switching ASR
Original language: Traditional Chinese
Title of host publication: ROCLING 2020 - 32nd Conference on Computational Linguistics and Speech Processing
Editors: Jenq-Haur Wang, Ying-Hui Lai, Lung-Hao Lee, Kuan-Yu Chen, Hung-Yi Lee, Chi-Chun Lee, Syu-Siang Wang, Hen-Hsen Huang, Chuan-Ming Liu
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 346-358
Number of pages: 13
ISBN (Electronic): 9789869576932
Publication status: Published - 2020
Event: 32nd Conference on Computational Linguistics and Speech Processing, ROCLING 2020 - Taipei, Taiwan
Duration: 24 Sep 2020 → 26 Sep 2020

Publication series

Name: ROCLING 2020 - 32nd Conference on Computational Linguistics and Speech Processing

Conference

Conference: 32nd Conference on Computational Linguistics and Speech Processing, ROCLING 2020
Country/Territory: Taiwan
City: Taipei
Period: 2020/09/24 → 2020/09/26

Keywords

  • automatic speech recognition
  • code-switching
  • decoding graph
  • language model
  • word lattice

ASJC Scopus subject areas

  • Language and Linguistics
  • Speech and Hearing
