使用詞向量表示與概念資訊於中文大詞彙連續語音辨識之語言模型調適

Ssu Cheng Chen, Hsiao Tsung Hung, Berlin Chen, Kuan Yu Chen

Research output: Chapter in book/report/conference proceeding; Conference contribution

Abstract

Research on deep learning has experienced a surge of interest in recent years. Alongside the rapid development of deep learning technologies, various distributed representation methods have been proposed to embed the words of a vocabulary as vectors in a lower-dimensional space. With such distributed representations, the semantic relationship between any pair of words can be estimated by computing a similarity measure over the associated word vectors. Against this background, this article explores a novel use of distributed representations of words for language modeling (LM) in speech recognition. First, word vectors are employed to represent the words in the search history and the upcoming words during the speech recognition process, so that the language model can be dynamically adapted on top of these vector representations. Second, we extend the recently proposed concept language model (CLM) by conducting relevant training data selection at the sentence level instead of the document level. By doing so, the concept classes of the CLM can be estimated more accurately while redundant or irrelevant information is eliminated. In addition, since the resulting concept classes need to be dynamically selected and linearly combined to form the CLM during the speech recognition process, we determine the relatedness of each concept class to the test utterance based on the word representations derived with either the continuous bag-of-words model (CBOW) or the skip-gram model (Skip-gram). Finally, we also combine the above LM methods for better speech recognition performance. Extensive experiments carried out on the MATBN (Mandarin Across Taiwan Broadcast News) corpus demonstrate the utility of the proposed LM methods in relation to several widely used baselines.
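
The selection and linear combination of concept classes described in the abstract can be illustrated with a small sketch. The abstract does not specify how relatedness is computed, so everything below is an assumption made for illustration: each word sequence is represented by the average of its word vectors, relatedness is the cosine similarity between the history vector and a concept-class vector, negative similarities are clipped to zero, and the mixture weights are the normalized similarities. The names `mean_vector`, `cosine`, `concept_weights`, and `mix_probability`, as well as the toy embeddings and probabilities, are hypothetical and are not taken from the paper.

```python
import math


def mean_vector(words, embeddings):
    """Average the vectors of the words that have an embedding."""
    vecs = [embeddings[w] for w in words if w in embeddings]
    if not vecs:
        raise ValueError("none of the words has an embedding")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def concept_weights(history, concept_words, embeddings):
    """Assumed weighting scheme: clipped cosine similarity, normalized to sum to 1."""
    h = mean_vector(history, embeddings)
    scores = {
        c: max(cosine(h, mean_vector(ws, embeddings)), 0.0)
        for c, ws in concept_words.items()
    }
    total = sum(scores.values())
    if total == 0.0:
        # Fall back to uniform weights when no concept class is related.
        return {c: 1.0 / len(scores) for c in scores}
    return {c: s / total for c, s in scores.items()}


def mix_probability(word, weights, concept_lms):
    """Linear combination of per-concept unigram probabilities for one word."""
    return sum(weights[c] * concept_lms[c].get(word, 0.0) for c in weights)


# Toy 2-dimensional embeddings (hypothetical values, not trained vectors).
embeddings = {
    "stock": [1.0, 0.0],
    "market": [0.9, 0.1],
    "rain": [0.0, 1.0],
    "typhoon": [0.1, 0.9],
}
concept_words = {"finance": ["stock", "market"], "weather": ["rain", "typhoon"]}
# Toy per-concept unigram models (hypothetical probabilities).
concept_lms = {
    "finance": {"market": 0.5, "rain": 0.01},
    "weather": {"market": 0.01, "rain": 0.5},
}

weights = concept_weights(["stock"], concept_words, embeddings)
p_market = mix_probability("market", weights, concept_lms)
```

With a finance-related history, the finance class receives the larger weight, so the mixture assigns more probability to "market" than the weather class alone would. A real system would use trained CBOW or Skip-gram vectors and would interpolate the resulting model with a background n-gram model.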

Translated title of the contribution: Exploring word embedding and concept information for language model adaptation in Mandarin large vocabulary continuous speech recognition
Original language: Traditional Chinese
Title of host publication: Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015
Editors: Sin-Horng Chen, Hsin-Min Wang, Jen-Tzung Chien, Hung-Yu Kao, Wen-Whei Chang, Yih-Ru Wang, Shih-Hung Wu
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 4-17
Number of pages: 14
ISBN (Electronic): 9789573079286
Publication status: Published - 1 Oct 2015
Externally published: Yes
Event: 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015 - Hsinchu, Taiwan
Duration: 1 Oct 2015 to 2 Oct 2015

Publication series

Name: Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015

Conference

Conference: 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015
Country: Taiwan
City: Hsinchu
Period: 2015/10/01 to 2015/10/02

Keywords

  • Concept language model
  • Deep learning
  • Language modeling
  • Speech recognition
  • Word representation

ASJC Scopus subject areas

  • Speech and Hearing
  • Language and Linguistics
