A discriminative HMM/N-gram-based retrieval approach for Mandarin spoken documents

Berlin Chen, Hsin Min Wang, Lin Shan Lee

Research output: Contribution to journal (Article)

29 Citations (Scopus)

Abstract

In recent years, statistical modeling approaches have steadily gained in popularity in the field of information retrieval. This article presents an HMM/N-gram-based retrieval approach for Mandarin spoken documents. The underlying characteristics and the various structures of this approach were extensively investigated and analyzed. The retrieval capabilities were verified by tests with word- and syllable-level indexing features and comparisons to the conventional vector-space model approach. To further improve the discrimination capabilities of the HMMs, both the expectation-maximization (EM) and minimum classification error (MCE) training algorithms were introduced in training. Fusion of information via indexing word- and syllable-level features was also investigated. The spoken document retrieval experiments were performed on the Topic Detection and Tracking Corpora (TDT-2 and TDT-3). Very encouraging retrieval performance was obtained.
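
As a rough illustration of the kind of scoring an HMM/N-gram-based retriever performs, the sketch below ranks documents by the likelihood that each one generates the query, using a unigram document model interpolated with a corpus-wide model. This is only a minimal sketch under stated assumptions: the function names, the toy syllable-like tokens, and the fixed mixture weight are hypothetical, and the article's actual model structures, indexing features, and EM/MCE training are not reproduced here.

```python
import math
from collections import Counter


def unigram(tokens):
    """Maximum-likelihood unigram distribution over a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}


def log_query_likelihood(query, doc_model, corpus_model, weight=0.5):
    """Log P(query | document) under a two-component unigram mixture.

    `weight` is an assumed, hand-set mixture weight; in a trained system
    such weights would be estimated (for example with EM) rather than fixed.
    """
    score = 0.0
    for term in query:
        p = weight * doc_model.get(term, 0.0) + (1 - weight) * corpus_model.get(term, 0.0)
        if p == 0.0:
            # Term unseen in the whole collection: the query cannot be generated.
            return float("-inf")
        score += math.log(p)
    return score


def rank(query, docs, weight=0.5):
    """Return document names sorted by descending query likelihood."""
    corpus_model = unigram([tok for toks in docs.values() for tok in toks])
    scores = {
        name: log_query_likelihood(query, unigram(toks), corpus_model, weight)
        for name, toks in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy collection; tokens stand in for word- or syllable-level features.
docs = {
    "d1": ["tai", "bei", "news"],
    "d2": ["weather", "news", "report"],
}
ranking = rank(["tai", "bei"], docs)
```

Here `ranking` places `d1` first, because both query terms occur in that document, while `d2` can only explain them through the corpus-wide component.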

Original language: English
Pages (from-to): 128-145
Number of pages: 18
Journal: ACM Transactions on Asian Language Information Processing
Volume: 3
Issue number: 2
DOI: 10.1145/1034780.1034784
Publication status: Published - 2004 Jun 1

Fingerprint

Vector spaces
Information retrieval
Fusion reactions
Experiments

Keywords

  • Hidden Markov models
  • Mandarin spoken documents
  • Syllable-level Indexing features

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

A discriminative HMM/N-gram-based retrieval approach for Mandarin spoken documents. / Chen, Berlin; Wang, Hsin Min; Lee, Lin Shan.

In: ACM Transactions on Asian Language Information Processing, Vol. 3, No. 2, 01.06.2004, p. 128-145.

@article{8952ad8106384c36836073942f985a19,
title = "A discriminative HMM/N-gram-based retrieval approach for Mandarin spoken documents",
abstract = "In recent years, statistical modeling approaches have steadily gained in popularity in the field of information retrieval. This article presents an HMM/N-gram-based retrieval approach for Mandarin spoken documents. The underlying characteristics and the various structures of this approach were extensively investigated and analyzed. The retrieval capabilities were verified by tests with word- and syllable-level indexing features and comparisons to the conventional vector-space model approach. To further improve the discrimination capabilities of the HMMs, both the expectation-maximization (EM) and minimum classification error (MCE) training algorithms were introduced in training. Fusion of information via indexing word- and syllable-level features was also investigated. The spoken document retrieval experiments were performed on the Topic Detection and Tracking Corpora (TDT-2 and TDT-3). Very encouraging retrieval performance was obtained.",
keywords = "Hidden Markov models, Mandarin spoken documents, Syllable-level Indexing features",
author = "Chen, Berlin and Wang, {Hsin Min} and Lee, {Lin Shan}",
year = "2004",
month = "6",
day = "1",
doi = "10.1145/1034780.1034784",
language = "English",
volume = "3",
pages = "128--145",
journal = "ACM Transactions on Asian Language Information Processing",
issn = "1530-0226",
publisher = "Association for Computing Machinery (ACM)",
number = "2"
}

TY - JOUR

T1 - A discriminative HMM/N-gram-based retrieval approach for Mandarin spoken documents

AU - Chen, Berlin

AU - Wang, Hsin Min

AU - Lee, Lin Shan

PY - 2004/6/1

Y1 - 2004/6/1

AB - In recent years, statistical modeling approaches have steadily gained in popularity in the field of information retrieval. This article presents an HMM/N-gram-based retrieval approach for Mandarin spoken documents. The underlying characteristics and the various structures of this approach were extensively investigated and analyzed. The retrieval capabilities were verified by tests with word- and syllable-level indexing features and comparisons to the conventional vector-space model approach. To further improve the discrimination capabilities of the HMMs, both the expectation-maximization (EM) and minimum classification error (MCE) training algorithms were introduced in training. Fusion of information via indexing word- and syllable-level features was also investigated. The spoken document retrieval experiments were performed on the Topic Detection and Tracking Corpora (TDT-2 and TDT-3). Very encouraging retrieval performance was obtained.

KW - Hidden Markov models

KW - Mandarin spoken documents

KW - Syllable-level Indexing features

UR - http://www.scopus.com/inward/record.url?scp=10044234199&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=10044234199&partnerID=8YFLogxK

U2 - 10.1145/1034780.1034784

DO - 10.1145/1034780.1034784

M3 - Article

VL - 3

SP - 128

EP - 145

JO - ACM Transactions on Asian Language Information Processing

JF - ACM Transactions on Asian Language Information Processing

SN - 1530-0226

IS - 2

ER -