TY - JOUR
T1 - Incorporating proximity information for relevance language modeling in speech recognition
AU - Chen, Yi Wen
AU - Hao, Bo Han
AU - Chen, Kuan Yu
AU - Chen, Berlin
PY - 2013
Y1 - 2013
N2 - Language modeling (LM), aiming to provide a statistical mechanism to associate quantitative scores to sequences of words, has long been an interesting yet challenging problem in the field of speech and language processing. Although the n-gram model remains the predominant one, a number of disparate LM methods have been developed to complement the n-gram model. Among them, relevance modeling (RM), which explores the relevance information inherent between the search history and an upcoming word, has shown preliminary promise for dynamic language model adaptation. This paper continues this general line of research in two significant aspects. First, the so-called "bag-of-words" assumption of RM is relaxed by incorporating word proximity evidence into the RM formulation. Second, latent topic information is additionally explored in the hope of further enhancing the proximity-based RM framework. A series of experiments conducted on a large vocabulary continuous speech recognition (LVCSR) task seem to demonstrate that the various language models deduced from our framework are very comparable to existing language models.
AB - Language modeling (LM), aiming to provide a statistical mechanism to associate quantitative scores to sequences of words, has long been an interesting yet challenging problem in the field of speech and language processing. Although the n-gram model remains the predominant one, a number of disparate LM methods have been developed to complement the n-gram model. Among them, relevance modeling (RM), which explores the relevance information inherent between the search history and an upcoming word, has shown preliminary promise for dynamic language model adaptation. This paper continues this general line of research in two significant aspects. First, the so-called "bag-of-words" assumption of RM is relaxed by incorporating word proximity evidence into the RM formulation. Second, latent topic information is additionally explored in the hope of further enhancing the proximity-based RM framework. A series of experiments conducted on a large vocabulary continuous speech recognition (LVCSR) task seem to demonstrate that the various language models deduced from our framework are very comparable to existing language models.
KW - Language model
KW - Latent topic information
KW - Proximity evidence
KW - Relevance model
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=84906253884&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84906253884&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:84906253884
SN - 2308-457X
SP - 2683
EP - 2687
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2 - 14th Annual Conference of the International Speech Communication Association, INTERSPEECH 2013
Y2 - 25 August 2013 through 29 August 2013
ER -