使用詞向量表示與概念資訊於中文大詞彙連續語音辨識之語言模型調適

Translated title of the contribution: Exploring word embedding and concept information for language model adaptation in Mandarin large vocabulary continuous speech recognition

Ssu Cheng Chen, Hsiao Tsung Hung, Berlin Chen, Kuan Yu Chen

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Research on deep learning has experienced a surge of interest in recent years. Alongside the rapid development of deep learning related technologies, various distributed representation methods have been proposed to embed the words of a vocabulary as vectors in a lower-dimensional space. Based on these distributed representations, the semantic relationship between any pair of words can be discovered through a similarity computation on the associated word vectors. Against this background, this article explores a novel use of distributed word representations for language modeling (LM) in speech recognition. First, word vectors are employed to represent the words in the search history and the upcoming words during the speech recognition process, so as to dynamically adapt the language model on top of such vector representations. Second, we extend the recently proposed concept language model (CLM) by conducting relevant training data selection at the sentence level instead of the document level. By doing so, the concept classes of CLM can be estimated more accurately while redundant or irrelevant information is simultaneously eliminated. Furthermore, since the resulting concept classes need to be dynamically selected and linearly combined to form the CLM during the speech recognition process, we determine the relatedness of each concept class to the test utterance based on word representations derived with either the continuous bag-of-words model (CBOW) or the skip-gram model (Skip-gram). Finally, we combine the above LM methods for better speech recognition performance. Extensive experiments carried out on the MATBN (Mandarin Across Taiwan Broadcast News) corpus demonstrate the utility of our proposed LM methods in relation to several well-practiced baselines.
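To make the adaptation step concrete, the sketch below shows one plausible way to score the relatedness between a decoded history and each concept class with word vectors, and to turn those scores into interpolation weights for the class-specific language models. This is a minimal illustration under stated assumptions, not the paper's exact formulation: the mean-vector history representation, the cosine-plus-softmax weighting, and the function names (`concept_interpolation_weights`, `adapted_lm_prob`) are all hypothetical, and the embeddings are presumed to come from a pre-trained CBOW or Skip-gram model.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors (guarded against zero norms)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def concept_interpolation_weights(history_words, embeddings, concept_centroids):
    """Score each concept class against the decoded history and normalize.

    history_words     -- words recognized so far in the search history
    embeddings        -- dict: word -> vector, e.g. trained with CBOW or Skip-gram
    concept_centroids -- one vector per concept class, e.g. the mean embedding
                         of the words belonging to that class (an assumption here)
    Returns interpolation weights that sum to one.
    """
    vecs = [embeddings[w] for w in history_words if w in embeddings]
    if not vecs:
        # No in-vocabulary history yet: fall back to uniform weights.
        return np.full(len(concept_centroids), 1.0 / len(concept_centroids))
    h = np.mean(vecs, axis=0)                       # history as a mean word vector
    sims = np.array([cosine(h, c) for c in concept_centroids])
    w = np.exp(sims)                                # softmax: positive, sums to one
    return w / w.sum()

def adapted_lm_prob(word, history_words, class_lms, weights):
    """Linear combination of class-conditional LM probabilities:
    P_adapt(w | h) = sum_k weight_k * P_k(w | h),
    where each element of class_lms is a callable lm(word, history) -> prob."""
    return sum(wk * lm(word, history_words) for wk, lm in zip(weights, class_lms))
```

In this reading, the weights would be refreshed as the search history grows during decoding, so the combined probability P_adapt(w|h) tracks the topical drift of the test utterance rather than staying fixed for the whole recording.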

Original language: Chinese (Traditional)
Title of host publication: Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015
Editors: Sin-Horng Chen, Hsin-Min Wang, Jen-Tzung Chien, Hung-Yu Kao, Wen-Whei Chang, Yih-Ru Wang, Shih-Hung Wu
Publisher: The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)
Pages: 4-17
Number of pages: 14
ISBN (Electronic): 9789573079286
Publication status: Published - 2015 Oct 1
Event: 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015 - Hsinchu, Taiwan
Duration: 2015 Oct 1 - 2015 Oct 2

Publication series

Name: Proceedings of the 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015

Conference

Conference: 27th Conference on Computational Linguistics and Speech Processing, ROCLING 2015
Country/Territory: Taiwan
City: Hsinchu
Period: 2015/10/01 - 2015/10/02

ASJC Scopus subject areas

  • Speech and Hearing
  • Language and Linguistics
