On the use of speaker-aware language model adaptation techniques for meeting speech recognition

Ying Wen Chen, Tien Hong Lo, Hsiu Jui Chang, Wei Cheng Chao, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

This paper aims to alleviate the problems that multiple-speaker situations, which occur frequently in meetings, pose for automatic speech recognition (ASR). Speakers in such situations vary widely in how they talk: people do not strictly follow grammar when speaking, tend to stutter, and often use personal idioms and idiosyncratic expressions. Nevertheless, the language models currently employed in automatic transcription of meeting recordings rarely account for these facts; instead, they assume that all speakers participating in a meeting share the same speaking style and word-usage behavior. As a result, a single language model is built from the manual transcripts of utterances from multiple speakers, pooled together as the training set. To relax this assumption, we incorporate additional information cues into the training and prediction phases of language modeling to accommodate the variety of speaker-related characteristics, through speaker adaptation for language modeling. To this end, two distinct scenarios for the prediction phase, namely "known speakers" and "unknown speakers," are considered in developing methods to extract speaker-related information cues that aid the training of language models. Extensive experiments on automatic transcription of Mandarin and English meeting recordings show that the proposed language models, combined with different speaker adaptation mechanisms, achieve good performance gains over the baseline neural network based language model compared in this study.

Original language: English
Title of host publication: Proceedings of the 30th Conference on Computational Linguistics and Speech Processing, ROCLING 2018
Editors: Chi-Chun Lee, Cheng-Zen Yang, Jen-Tzung Chien, Chen-Yu Chiang, Min-Yuh Day, Richard T.-H. Tsai, Hung-Yi Lee, Wen-Hsiang Lu, Shih-Hung Wu
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 46-60
Number of pages: 15
ISBN (electronic): 9789869576918
Publication status: Published - 1 Oct 2018
Externally published: Yes
Event: 30th Conference on Computational Linguistics and Speech Processing, ROCLING 2018 - Hsinchu, Taiwan
Duration: 4 Oct 2018 to 5 Oct 2018

Publication series

Name: Proceedings of the 30th Conference on Computational Linguistics and Speech Processing, ROCLING 2018

Conference

Conference: 30th Conference on Computational Linguistics and Speech Processing, ROCLING 2018
Country: Taiwan
City: Hsinchu
Period: 2018/10/04 to 2018/10/05

ASJC Scopus subject areas

  • Speech and Hearing
  • Language and Linguistics
