TY - JOUR
T1 - Leveraging topical and positional cues for language modeling in speech recognition
AU - Chiu, Hsuan Sheng
AU - Chen, Kuan Yu
AU - Chen, Berlin
N1 - Funding Information:
Acknowledgments This work was sponsored in part by “Aim for the Top University Plan” of National Taiwan Normal University and Ministry of Education, Taiwan, and the National Science Council, Taiwan, under Grants NSC 101-2221-E-003-024-MY3, NSC 101-2511-S-003-057-MY3, NSC 101-2511-S-003-047-MY3, NSC 99-2221-E-003-017-MY3, and NSC 98-2221-E-003-011-MY3.
PY - 2014/9
Y1 - 2014/9
N2 - This paper investigates language modeling with topical and positional information for large vocabulary continuous speech recognition. We first compare several topic models, including document topic models and word topic models, both theoretically and empirically. Furthermore, since the composition and word usage of spoken documents of the same style, such as broadcast news stories, are usually similar, such documents can be partitioned along their literary structure into segments of consistent rhetorical or topical style, such as introductory remarks, elucidations of methodology or events, conclusions, and reporters' references or footnotes. We therefore present two position-dependent language models for speech recognition that integrate word positional information into the existing n-gram and topic models. Experiments conducted on broadcast news transcription indicate that these position-dependent models obtain results comparable to the existing n-gram and topic models.
AB - This paper investigates language modeling with topical and positional information for large vocabulary continuous speech recognition. We first compare several topic models, including document topic models and word topic models, both theoretically and empirically. Furthermore, since the composition and word usage of spoken documents of the same style, such as broadcast news stories, are usually similar, such documents can be partitioned along their literary structure into segments of consistent rhetorical or topical style, such as introductory remarks, elucidations of methodology or events, conclusions, and reporters' references or footnotes. We therefore present two position-dependent language models for speech recognition that integrate word positional information into the existing n-gram and topic models. Experiments conducted on broadcast news transcription indicate that these position-dependent models obtain results comparable to the existing n-gram and topic models.
KW - Language model
KW - Language model adaptation
KW - Positional information
KW - Speech recognition
KW - Topical information
UR - http://www.scopus.com/inward/record.url?scp=84904857190&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84904857190&partnerID=8YFLogxK
U2 - 10.1007/s11042-013-1456-2
DO - 10.1007/s11042-013-1456-2
M3 - Article
AN - SCOPUS:84904857190
SN - 1380-7501
VL - 72
SP - 1465
EP - 1481
JO - Multimedia Tools and Applications
JF - Multimedia Tools and Applications
IS - 2
ER -