A discriminative HMM/N-gram-based retrieval approach for Mandarin spoken documents

Berlin Chen*, Hsin Min Wang, Lin Shan Lee

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

30 Citations (Scopus)

Abstract

In recent years, statistical modeling approaches have steadily gained popularity in the field of information retrieval. This article presents an HMM/N-gram-based retrieval approach for Mandarin spoken documents. The underlying characteristics and various structures of this approach were extensively investigated and analyzed. Its retrieval capabilities were verified through tests with word- and syllable-level indexing features and through comparisons with the conventional vector-space model approach. To further improve the discrimination capabilities of the HMMs, both the expectation-maximization (EM) and minimum classification error (MCE) training algorithms were applied in training. Information fusion using both word- and syllable-level indexing features was also investigated. The spoken document retrieval experiments were performed on the Topic Detection and Tracking corpora (TDT-2 and TDT-3). Very encouraging retrieval performance was obtained.
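The abstract does not state the scoring formula. As a rough illustration only, the sketch below assumes a common form of HMM/N-gram retrieval: each query token is generated by a mixture of document and corpus unigram and bigram models. The mixture weights are shared across documents and estimated by EM from (query, relevant document) pairs. The four-component mixture, the function names (`ngram_models`, `component_probs`, `log_score`, `em_weights`), and the toy pinyin data are hypothetical. MCE training and word/syllable fusion are not shown.

```python
import math
from collections import Counter

# Hypothetical component order used throughout this sketch:
# [document unigram, corpus unigram, document bigram, corpus bigram]
NUM_COMPONENTS = 4


def ngram_models(sequences):
    """Maximum-likelihood unigram and bigram estimates over token sequences.

    Bigrams are counted within each sequence only, so no spurious bigram
    spans the boundary between two documents.
    """
    uni, bi, hist = Counter(), Counter(), Counter()
    for toks in sequences:
        uni.update(toks)
        bi.update(zip(toks, toks[1:]))
        hist.update(toks[:-1])
    total = sum(uni.values())
    p_uni = {w: c / total for w, c in uni.items()}
    # P(b | a) = count(a, b) / count(a appearing as a bigram history)
    p_bi = {(a, b): c / hist[a] for (a, b), c in bi.items()}
    return p_uni, p_bi


def component_probs(query, doc_model, corpus_model):
    """For each query position, the probability under each mixture component."""
    d_uni, d_bi = doc_model
    c_uni, c_bi = corpus_model
    rows = []
    for i, q in enumerate(query):
        prev = query[i - 1] if i > 0 else None
        rows.append([
            d_uni.get(q, 0.0),
            c_uni.get(q, 0.0),
            # The first query token has no history, so bigram components give 0.
            d_bi.get((prev, q), 0.0),
            c_bi.get((prev, q), 0.0),
        ])
    return rows


def log_score(query, doc_model, corpus_model, weights):
    """log P(Q | D is relevant) under the assumed four-component mixture."""
    score = 0.0
    for row in component_probs(query, doc_model, corpus_model):
        mix = sum(w * p for w, p in zip(weights, row))
        # Small floor so a token unseen even in the corpus does not hit log(0).
        score += math.log(mix + 1e-12)
    return score


def em_weights(training_pairs, corpus_model, iters=20):
    """Estimate mixture weights shared by all documents with EM.

    training_pairs: list of (query tokens, model of a relevant document).
    """
    weights = [1.0 / NUM_COMPONENTS] * NUM_COMPONENTS
    for _ in range(iters):
        expected = [0.0] * NUM_COMPONENTS
        for query, doc_model in training_pairs:
            for row in component_probs(query, doc_model, corpus_model):
                mix = sum(w * p for w, p in zip(weights, row))
                if mix == 0.0:
                    continue  # no component explains this token; skip it
                # E-step: posterior responsibility of each component.
                for j in range(NUM_COMPONENTS):
                    expected[j] += weights[j] * row[j] / mix
        norm = sum(expected)
        if norm == 0.0:
            break
        # M-step: renormalize expected counts into new weights.
        weights = [e / norm for e in expected]
    return weights


# Hypothetical toy data: romanized syllable sequences standing in for
# recognized Mandarin broadcast news transcripts.
docs = {
    "d1": "zong tong xuan ju tou piao jie guo".split(),
    "d2": "gu shi zhang die jin rong shi chang".split(),
}
corpus_model = ngram_models(docs.values())
doc_models = {name: ngram_models([toks]) for name, toks in docs.items()}
query = "xuan ju jie guo".split()

# Toy only: a real system would train on separate training queries.
weights = em_weights([(query, doc_models["d1"])], corpus_model)
ranking = sorted(
    docs,
    key=lambda d: log_score(query, doc_models[d], corpus_model, weights),
    reverse=True,
)
print(weights, ranking)
```

In this sketch, replacing the whitespace-split syllables with segmented words gives a word-level variant of the same model. Sharing one weight vector across all documents keeps the number of trained parameters small. The corpus components act as smoothing for query tokens that a short document does not contain.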

Original language: English
Pages (from-to): 128-145
Number of pages: 18
Journal: ACM Transactions on Asian Language Information Processing
Volume: 3
Issue number: 2
Publication status: Published, June 2004

Keywords

  • Hidden Markov models
  • Mandarin spoken documents
  • Syllable-level indexing features

ASJC Scopus subject areas

  • General Computer Science
