On the use of speaker-aware language model adaptation techniques for meeting speech recognition

Ying Wen Chen, Tien Hong Lo, Hsiu Jui Chang, Wei Cheng Chao, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper addresses the problems caused by the multiple-speaker situations that frequently occur in meetings, with the aim of improving automatic speech recognition (ASR). Speakers in such situations utter in a wide variety of ways: people do not strictly follow grammar when speaking, tend to stutter, and often use personal idioms and idiosyncratic turns of phrase. Nevertheless, the existing language models employed in automatic transcription of meeting recordings rarely account for these facts; instead, they assume that all speakers participating in a meeting share the same speaking style and word-usage behavior, so that a single language model is built from the manual transcripts of utterances compiled from multiple speakers, taken holistically as the training set. To relax this assumption, we augment both the training and prediction phases of language modeling with additional information cues that accommodate speaker-related characteristics, through a process of speaker adaptation for language modeling. To this end, two disparate prediction-phase scenarios, "known speakers" and "unknown speakers," are considered when developing methods that extract speaker-related information cues to aid the training of language models. Extensive experiments on automatic transcription of Mandarin and English meeting recordings show that the proposed language models, combined with different mechanisms for speaker adaptation, achieve good performance gains over the baseline neural network based language model compared in this study.
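The paper itself adapts neural language models; as a minimal illustration of the general idea of speaker adaptation for language modeling (not the authors' method), the sketch below interpolates a speaker-specific unigram model with a background model trained on all speakers. For the "known speakers" scenario the speaker's own model is mixed in; for "unknown speakers" it backs off to the background model. All names and the interpolation weight are hypothetical.

```python
from collections import Counter


def unigram_probs(tokens):
    """Maximum-likelihood unigram probabilities from a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


class SpeakerAdaptedLM:
    """Toy speaker-adapted unigram LM (illustrative sketch, not the paper's model).

    Known speakers: linearly interpolate a speaker-specific model with the
    background model. Unknown speakers: fall back to the background model.
    """

    def __init__(self, background_tokens, lam=0.5):
        self.background = unigram_probs(background_tokens)
        self.speaker_models = {}
        self.lam = lam  # interpolation weight given to the speaker model

    def add_speaker(self, speaker_id, tokens):
        # Build a per-speaker model from that speaker's transcripts.
        self.speaker_models[speaker_id] = unigram_probs(tokens)

    def prob(self, word, speaker_id=None):
        p_bg = self.background.get(word, 1e-8)  # tiny floor for unseen words
        if speaker_id in self.speaker_models:   # "known speakers" scenario
            p_spk = self.speaker_models[speaker_id].get(word, 0.0)
            return self.lam * p_spk + (1 - self.lam) * p_bg
        return p_bg                             # "unknown speakers" scenario


# Usage: a speaker who favors the word "a" pulls its probability upward.
lm = SpeakerAdaptedLM("a b a c".split(), lam=0.5)
lm.add_speaker("s1", "a a a b".split())
print(lm.prob("a", "s1"))  # interpolated estimate
print(lm.prob("a"))        # background-only estimate
```

A neural counterpart of the same idea would concatenate a learned speaker embedding to each input word embedding of a recurrent LM, which is closer in spirit to what the paper explores.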

Original language: English
Title of host publication: Proceedings of the 30th Conference on Computational Linguistics and Speech Processing, ROCLING 2018
Editors: Chi-Chun Lee, Cheng-Zen Yang, Jen-Tzung Chien, Chen-Yu Chiang, Min-Yuh Day, Richard T.-H. Tsai, Hung-Yi Lee, Wen-Hsiang Lu, Shih-Hung Wu
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 46-60
Number of pages: 15
ISBN (Electronic): 9789869576918
Publication status: Published - 2018 Oct 1
Externally published: Yes
Event: 30th Conference on Computational Linguistics and Speech Processing, ROCLING 2018 - Hsinchu, Taiwan
Duration: 2018 Oct 4 - 2018 Oct 5

Publication series

Name: Proceedings of the 30th Conference on Computational Linguistics and Speech Processing, ROCLING 2018

Conference

Conference: 30th Conference on Computational Linguistics and Speech Processing, ROCLING 2018
Country: Taiwan
City: Hsinchu
Period: 18/10/4 - 18/10/5

Keywords

  • Language modeling
  • Recurrent neural networks
  • Speaker adaptation
  • Speech recognition

ASJC Scopus subject areas

  • Speech and Hearing
  • Language and Linguistics


Cite this

    Chen, Y. W., Lo, T. H., Chang, H. J., Chao, W. C., & Chen, B. (2018). On the use of speaker-aware language model adaptation techniques for meeting speech recognition. In C-C. Lee, C-Z. Yang, J-T. Chien, C-Y. Chiang, M-Y. Day, R. T-H. Tsai, H-Y. Lee, W-H. Lu, & S-H. Wu (Eds.), Proceedings of the 30th Conference on Computational Linguistics and Speech Processing, ROCLING 2018 (pp. 46-60). (Proceedings of the 30th Conference on Computational Linguistics and Speech Processing, ROCLING 2018). The Association for Computational Linguistics and Chinese Language Processing (ACLCLP).