An HMM/N-gram-based linguistic processing approach for Mandarin spoken document retrieval

Berlin Chen, Hsin Min Wang, Lin Shan Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

In this paper, an HMM/N-gram-based linguistic processing approach for Mandarin spoken document retrieval is presented. The underlying characteristics and different structures of this approach were extensively investigated. The retrieval capabilities were verified by tests with word- and syllable (subword)-level indexing features and by comparison with the conventional vector space model approach. To further improve the discrimination capabilities of the HMMs, both the expectation-maximization (EM) and minimum classification error (MCE) training algorithms were introduced in training. The information fusion of word- and syllable-level indexing features was also investigated. The spoken document retrieval experiments were performed on the Topic Detection and Tracking Corpora (TDT-2 and TDT-3). Very encouraging retrieval performance was obtained.
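
The abstract does not spell out the model itself. As a rough illustration only, one common way to set up HMM-based retrieval treats each document as a two-state mixture of a document unigram model and a collection-wide unigram model, and ranks documents by the likelihood of the query. The sketch below assumes that formulation; the function name rank_documents, the fixed weight doc_weight, and the toy syllable tokens are hypothetical and are not taken from the paper, which trains its models with EM and MCE rather than using a hand-set weight.

```python
import math
from collections import Counter


def rank_documents(query, documents, doc_weight=0.7):
    """Rank documents by log P(query | document) under a two-state unigram mixture.

    doc_weight is a hypothetical fixed mixture weight (an assumption for this
    sketch); a trained system would estimate it instead of hard-coding it.
    """
    # Collection-wide unigram counts act as the background (smoothing) state.
    corpus_counts = Counter(term for doc in documents for term in doc)
    corpus_total = sum(corpus_counts.values())

    scores = []
    for doc_id, doc in enumerate(documents):
        doc_counts = Counter(doc)
        log_likelihood = 0.0
        for term in query:
            p_doc = doc_counts[term] / len(doc) if doc else 0.0
            p_corpus = corpus_counts[term] / corpus_total if corpus_total else 0.0
            p = doc_weight * p_doc + (1.0 - doc_weight) * p_corpus
            # A term unseen in the whole collection has zero probability.
            log_likelihood += math.log(p) if p > 0.0 else float("-inf")
        scores.append((doc_id, log_likelihood))

    # Highest query likelihood first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy example with syllable-like tokens (made up for illustration).
documents = [["tai", "bei", "shi"], ["gao", "xiong", "shi"]]
ranking = rank_documents(["tai", "bei"], documents)
```

The same function works for word-level or syllable-level tokens, since it only sees lists of index terms; how the two levels are fused in the paper is not described in the abstract and is not modelled here.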

Original language: English
Title of host publication: EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology
Editors: Borge Lindberg, Henrik Benner, Paul Dalsgaard, Zheng-Hua Tan
Publisher: International Speech Communication Association
Pages: 1045-1048
Number of pages: 4
ISBN (Electronic): 8790834100, 9788790834104
Publication status: Published - 2001 Jan 1
Event: 7th European Conference on Speech Communication and Technology - Scandinavia, EUROSPEECH 2001 - Aalborg, Denmark
Duration: 2001 Sep 3 to 2001 Sep 7

Publication series

Name: EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology

Other

Other: 7th European Conference on Speech Communication and Technology - Scandinavia, EUROSPEECH 2001
Country: Denmark
City: Aalborg
Period: 2001 Sep 3 to 2001 Sep 7

Fingerprint

Information fusion
Vector spaces
Indexing
Linguistics
Processing
Discrimination
Experiments
Performance

ASJC Scopus subject areas

  • Communication
  • Linguistics and Language
  • Computer Science Applications
  • Software

Cite this

Chen, B., Wang, H. M., & Lee, L. S. (2001). An HMM/N-gram-based linguistic processing approach for Mandarin spoken document retrieval. In B. Lindberg, H. Benner, P. Dalsgaard, & Z-H. Tan (Eds.), EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology (pp. 1045-1048). (EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology). International Speech Communication Association.

@inproceedings{e774f041329d4668bd1bec86671ad723,
title = "An HMM/N-gram-based linguistic processing approach for Mandarin spoken document retrieval",
abstract = "In this paper, an HMM/N-gram-based linguistic processing approach for Mandarin spoken document retrieval is presented. The underlying characteristics and different structures of this approach were extensively investigated. The retrieval capabilities were verified by tests with word- and syllable (subword)-level indexing features and by comparison with the conventional vector space model approach. To further improve the discrimination capabilities of the HMMs, both the expectation-maximization (EM) and minimum classification error (MCE) training algorithms were introduced in training. The information fusion of word- and syllable-level indexing features was also investigated. The spoken document retrieval experiments were performed on the Topic Detection and Tracking Corpora (TDT-2 and TDT-3). Very encouraging retrieval performance was obtained.",
author = "Berlin Chen and Wang, {Hsin Min} and Lee, {Lin Shan}",
year = "2001",
month = "1",
day = "1",
language = "English",
series = "EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology",
publisher = "International Speech Communication Association",
pages = "1045--1048",
editor = "Borge Lindberg and Henrik Benner and Paul Dalsgaard and Zheng-Hua Tan",
booktitle = "EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology",

}

TY - GEN

T1 - An HMM/N-gram-based linguistic processing approach for Mandarin spoken document retrieval

AU - Chen, Berlin

AU - Wang, Hsin Min

AU - Lee, Lin Shan

PY - 2001/1/1

Y1 - 2001/1/1

N2 - In this paper, an HMM/N-gram-based linguistic processing approach for Mandarin spoken document retrieval is presented. The underlying characteristics and different structures of this approach were extensively investigated. The retrieval capabilities were verified by tests with word- and syllable (subword)-level indexing features and by comparison with the conventional vector space model approach. To further improve the discrimination capabilities of the HMMs, both the expectation-maximization (EM) and minimum classification error (MCE) training algorithms were introduced in training. The information fusion of word- and syllable-level indexing features was also investigated. The spoken document retrieval experiments were performed on the Topic Detection and Tracking Corpora (TDT-2 and TDT-3). Very encouraging retrieval performance was obtained.

UR - http://www.scopus.com/inward/record.url?scp=85009101169&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85009101169&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85009101169

T3 - EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology

SP - 1045

EP - 1048

BT - EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology

A2 - Lindberg, Borge

A2 - Benner, Henrik

A2 - Dalsgaard, Paul

A2 - Tan, Zheng-Hua

PB - International Speech Communication Association

ER -