Improved Chinese spoken document retrieval with hybrid modeling and data-driven indexing features

Chun Jen Wang, Berlin Chen, Lin Shan Lee

Research output: Conference contribution

5 Citations (Scopus)

Abstract

Different models retrieve documents using different approaches to extracting the underlying content. Different levels of indexing features also offer different functionality and discriminative power when retrieving documents. In this paper, we present results for Chinese spoken document retrieval with hybrid models that integrate the knowledge obtainable from three basic retrieval models, namely, the standard vector space model (VSM), the hidden Markov model (HMM), and the latent semantic indexing (LSI) model. The characteristics of retrieval performance using both word-level and syllable-level indexing features were extensively explored. In addition, a data-driven approach to deriving variable-length indexing features is presented. Very satisfactory performance can be achieved with these data-driven features while retaining a very compact feature set. Experiments showed that this approach has the potential to identify domain-specific terminology and newly generated phrases. It is therefore very useful not only in Chinese document retrieval, but also in detecting out-of-vocabulary (OOV) words in Chinese. Very encouraging results were also obtained when the hybrid models were used with the data-driven indexing features.

Original language: English
Pages: 1985-1988
Number of pages: 4
Publication status: Published - 1 Jan 2002
Event: 7th International Conference on Spoken Language Processing, ICSLP 2002 - Denver, United States
Duration: 16 Sep 2002 to 20 Sep 2002

Other

Other: 7th International Conference on Spoken Language Processing, ICSLP 2002
Country: United States
City: Denver
Period: 16 Sep 2002 to 20 Sep 2002


ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Cite this

Wang, C. J., Chen, B., & Lee, L. S. (2002). Improved Chinese spoken document retrieval with hybrid modeling and data-driven indexing features. 1985-1988. Paper presented at 7th International Conference on Spoken Language Processing, ICSLP 2002, Denver, United States.