Comparison of word and subword indexing techniques for Mandarin Chinese spoken document retrieval

Hsin Min Wang, Berlin Chen

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

In this paper, we investigate the use of words and subwords (including both characters and syllables) in audio indexing for Mandarin Chinese spoken document retrieval. Two retrieval approaches, including the well-known vector space model approach and the newly proposed HMM/N-gram-based approach, are used in the present work. We focus on the use of an entire Chinese textual story (from a newspaper) as a query to retrieve Mandarin Chinese spoken documents (from news broadcasts). Experiments are based on the Topic Detection and Tracking Corpora.
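To make the subword indexing idea concrete, here is a minimal sketch of vector-space retrieval over overlapping character bigrams. It is an illustration under stated assumptions, not the authors' implementation: the function names (`char_ngrams`, `tfidf_vectors`, `cosine`, `rank`), the smoothed IDF formula, and the toy documents are all hypothetical, and plain strings stand in for speech-recognizer transcripts. Syllable-level indexing would follow the same pattern but needs a pronunciation lexicon to map characters to syllables, which is omitted here.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Split text into overlapping character n-grams, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    return ["".join(chars[i:i + n]) for i in range(len(chars) - n + 1)]


def tfidf_vectors(docs, n=2):
    """Build one sparse TF-IDF vector (a dict) per document.

    The IDF uses add-one smoothing; this is an assumed weighting,
    not necessarily the one used in the paper.
    """
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    df = Counter(term for c in counts for term in c)  # document frequency
    num_docs = len(docs)
    idf = {t: math.log((1 + num_docs) / (1 + f)) + 1.0 for t, f in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in c.items()} for c in counts]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query, docs, n=2):
    """Return (score, doc_index) pairs sorted from best to worst match."""
    vectors, idf = tfidf_vectors(docs, n)
    q_counts = Counter(char_ngrams(query, n))
    # Query n-grams that never occur in the collection carry no weight.
    q_vec = {t: tf * idf[t] for t, tf in q_counts.items() if t in idf}
    scores = [(cosine(q_vec, v), i) for i, v in enumerate(vectors)]
    return sorted(scores, reverse=True)


# Toy collection standing in for transcribed broadcast news stories.
docs = ["今天台北天氣晴朗", "股市今天大漲", "台北市長選舉"]
print([i for _, i in rank("台北天氣", docs)])  # → [0, 2, 1]
```

Overlapping n-grams are a common choice for this kind of index because a partial match on a misrecognized or wrongly segmented word can still contribute to the score, whereas a word-level index would miss it entirely.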

Original language: English
Title of host publication: Advances in Multimedia Information Processing - PCM 2001 - 2nd IEEE Pacific Rim Conference on Multimedia, Proceedings
Editors: Shih-Fu Chang, Heung-Yeung Shum, Mark Liao
Publisher: Springer Verlag
Pages: 606-613
Number of pages: 8
ISBN (Print): 3540426809, 9783540426806
Publication status: Published - 2001 Jan 1
Event: 2nd IEEE Pacific-Rim Conference on Multimedia, IEEE-PCM 2001 - Beijing, China
Duration: 2001 Oct 24 → 2001 Oct 26

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2195
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 2nd IEEE Pacific-Rim Conference on Multimedia, IEEE-PCM 2001
Country: China
City: Beijing
Period: 01/10/24 → 01/10/26

Fingerprint

Vector Space Model
Subword
Document Retrieval
N-gram
Vector spaces
Indexing
Broadcast
Retrieval
Entire
Query
Experiment
Experiments
Narrative
Character
Corpus

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

Wang, H. M., & Chen, B. (2001). Comparison of word and subword indexing techniques for Mandarin Chinese spoken document retrieval. In S-F. Chang, H-Y. Shum, & M. Liao (Eds.), Advances in Multimedia Information Processing - PCM 2001 - 2nd IEEE Pacific Rim Conference on Multimedia, Proceedings (pp. 606-613). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 2195). Springer Verlag.

@inproceedings{f6bfb07d1f8647f6b4eb1b7827b51ef6,
title = "Comparison of word and subword indexing techniques for {M}andarin {C}hinese spoken document retrieval",
abstract = "In this paper, we investigate the use of words and subwords (including both characters and syllables) in audio indexing for Mandarin Chinese spoken document retrieval. Two retrieval approaches, including the well-known vector space model approach and the newly proposed HMM/N-gram-based approach, are used in the present work. We focus on the use of an entire Chinese textual story (from a newspaper) as a query to retrieve Mandarin Chinese spoken documents (from news broadcasts). Experiments are based on the Topic Detection and Tracking Corpora.",
author = "Wang, {Hsin Min} and Berlin Chen",
year = "2001",
month = "1",
day = "1",
language = "English",
isbn = "3540426809",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "606--613",
editor = "Shih-Fu Chang and Heung-Yeung Shum and Mark Liao",
booktitle = "Advances in Multimedia Information Processing - PCM 2001 - 2nd IEEE Pacific Rim Conference on Multimedia, Proceedings",
volume = "2195",
}

TY - GEN

T1 - Comparison of word and subword indexing techniques for Mandarin Chinese spoken document retrieval

AU - Wang, Hsin Min

AU - Chen, Berlin

PY - 2001/1/1

Y1 - 2001/1/1

N2 - In this paper, we investigate the use of words and subwords (including both characters and syllables) in audio indexing for Mandarin Chinese spoken document retrieval. Two retrieval approaches, including the well-known vector space model approach and the newly proposed HMM/N-gram-based approach, are used in the present work. We focus on the use of an entire Chinese textual story (from a newspaper) as a query to retrieve Mandarin Chinese spoken documents (from news broadcasts). Experiments are based on the Topic Detection and Tracking Corpora.

UR - http://www.scopus.com/inward/record.url?scp=84946737365&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84946737365&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84946737365

SN - 3540426809

SN - 9783540426806

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 606

EP - 613

BT - Advances in Multimedia Information Processing - PCM 2001 - 2nd IEEE Pacific Rim Conference on Multimedia, Proceedings

A2 - Chang, Shih-Fu

A2 - Shum, Heung-Yeung

A2 - Liao, Mark

PB - Springer Verlag

ER -