Hierarchical topic organization and visual presentation of spoken documents using Probabilistic Latent Semantic Analysis (PLSA) for efficient retrieval/browsing applications

Te Hsuan Li, Ming Han Lee, Berlin Chen, Lin Shan Lee

Research output: Contribution to conference › Paper

5 Citations (Scopus)

Abstract

The most attractive form of future network content will be multimedia that includes speech, and the speech usually carries the core concepts of the content. As a result, the spoken documents associated with multimedia content can very likely serve as the key for retrieval and browsing. This paper presents a new approach to hierarchical topic organization and visual presentation of spoken documents for this purpose, based on Probabilistic Latent Semantic Analysis (PLSA). With this approach, the spoken documents can be organized into a two-dimensional tree (or multi-layered map) of topic clusters, so that the user can efficiently retrieve or browse the network content or the associated spoken documents. Unlike conventional document clustering approaches, PLSA allows both the relationships among the topic clusters and appropriate terms to serve as topic labels to be derived. An initial prototype system is also presented; it uses Chinese broadcast news as the example spoken documents and includes automatic generation of titles and summaries as well as retrieval and browsing functions. Because of the special structure of the Chinese language, the system also considers units other than words as the terms used in processing.
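
The abstract names PLSA as the model behind the topic clusters and their labels. As a rough illustration only, the following is a minimal sketch of flat PLSA fitted by EM on a term-count matrix, assuming the standard formulation P(t|d) = Σ_z P(t|z) P(z|d). It is not the authors' system: the hierarchical tree or map layout is not reproduced, the function names `plsa` and `top_terms` and the toy vocabulary are hypothetical, and picking the most probable terms as labels is a naive stand-in for whatever labeling criterion the paper uses.

```python
import random


def plsa(counts, num_topics, iterations=50, seed=0):
    """Fit flat PLSA by EM. counts[d][t] is the count of term t in document d.

    Returns (p_t_z, p_z_d): P(term | topic) rows and P(topic | document) rows.
    """
    rng = random.Random(seed)
    num_docs, num_terms = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [value / total for value in row]

    # Random positive initialization breaks the symmetry between topics.
    p_t_z = [normalize([rng.random() + 1e-3 for _ in range(num_terms)])
             for _ in range(num_topics)]
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(num_topics)])
             for _ in range(num_docs)]

    for _ in range(iterations):
        # Tiny floor (an assumption of this sketch) keeps every row normalizable.
        new_t_z = [[1e-12] * num_terms for _ in range(num_topics)]
        new_z_d = [[1e-12] * num_topics for _ in range(num_docs)]
        for d in range(num_docs):
            for t in range(num_terms):
                n = counts[d][t]
                if n == 0:
                    continue
                # E-step: posterior P(z | d, t) proportional to P(t|z) * P(z|d).
                post = normalize([p_t_z[z][t] * p_z_d[d][z]
                                  for z in range(num_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(num_topics):
                    new_t_z[z][t] += n * post[z]
                    new_z_d[d][z] += n * post[z]
        # M-step: renormalize the expected counts into probabilities.
        p_t_z = [normalize(row) for row in new_t_z]
        p_z_d = [normalize(row) for row in new_z_d]
    return p_t_z, p_z_d


def top_terms(p_t_z, vocabulary, k=2):
    """Naive topic labels: the k most probable terms of each topic."""
    labels = []
    for row in p_t_z:
        ranked = sorted(range(len(row)), key=lambda i: -row[i])
        labels.append([vocabulary[i] for i in ranked[:k]])
    return labels


if __name__ == "__main__":
    # Hypothetical toy corpus: two documents per vocabulary block.
    vocabulary = ["election", "vote", "typhoon", "rain"]
    counts = [[4, 3, 0, 0], [5, 2, 0, 0], [0, 0, 3, 4], [0, 0, 2, 5]]
    p_t_z, p_z_d = plsa(counts, num_topics=2)
    print(top_terms(p_t_z, vocabulary))
    print(p_z_d)
```

On a toy matrix like this one, with two disjoint vocabulary blocks, the documents would be expected to separate into the two topics, although EM can settle in a local optimum depending on the initialization. For Chinese text, the rows of `counts` could just as well index characters or syllables instead of words, which is the kind of alternative term unit the abstract mentions.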

Original language: English
Pages: 625-628
Number of pages: 4
Publication status: Published - 2005 Dec 1
Event: 9th European Conference on Speech Communication and Technology - Lisbon, Portugal
Duration: 2005 Sep 4 to 2005 Sep 8

Other

Other: 9th European Conference on Speech Communication and Technology
Country: Portugal
City: Lisbon
Period: 05/9/4 to 05/9/8

Fingerprint

Semantics
Labels
Processing

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Li, T. H., Lee, M. H., Chen, B., & Lee, L. S. (2005). Hierarchical topic organization and visual presentation of spoken documents using Probabilistic Latent Semantic Analysis(PLSA) for efficient retrieval/browsing applications. 625-628. Paper presented at 9th European Conference on Speech Communication and Technology, Lisbon, Portugal.

@conference{92006db1bdc14f1da6746bf78300d23b,
title = "Hierarchical topic organization and visual presentation of spoken documents using Probabilistic Latent Semantic Analysis(PLSA) for efficient retrieval/browsing applications",
abstract = "The most attractive form of future network content will be multi-media including speech information, and such speech information usually carries the core concepts for the content. As a result, the spoken documents associated with the multi-media content very possibly can serve as the key for retrieval and browsing. This paper presents a new approach of hierarchical topic organization and visual presentation of spoken documents for such a purpose based on the Probabilistic Latent Semantic Analysis (PLSA). With this approach the spoken documents can be organized into a two-dimensional tree (or multi-layered map) of topic clusters, and the user can very efficiently retrieve or browse the network content or associated spoken documents. Different from the conventional document clustering approaches, with PLSA the relationships among the topic clusters and the appropriate terms as the topic labels can be very well derived. An initial prototype system with Chinese broadcast news as the example spoken documents including automatic generation of titles and summaries and retrieval/browsing functionalities is also presented. Choice of different units other than words to be used as the terms in the processing is also considered in the system based on the special structure of the Chinese language.",
author = "Li, {Te Hsuan} and Lee, {Ming Han} and Berlin Chen and Lee, {Lin Shan}",
year = "2005",
month = "12",
day = "1",
language = "English",
pages = "625--628",
note = "9th European Conference on Speech Communication and Technology ; Conference date: 04-09-2005 Through 08-09-2005",

}

TY - CONF

T1 - Hierarchical topic organization and visual presentation of spoken documents using Probabilistic Latent Semantic Analysis(PLSA) for efficient retrieval/browsing applications

AU - Li, Te Hsuan

AU - Lee, Ming Han

AU - Chen, Berlin

AU - Lee, Lin Shan

PY - 2005/12/1

Y1 - 2005/12/1

N2 - The most attractive form of future network content will be multi-media including speech information, and such speech information usually carries the core concepts for the content. As a result, the spoken documents associated with the multi-media content very possibly can serve as the key for retrieval and browsing. This paper presents a new approach of hierarchical topic organization and visual presentation of spoken documents for such a purpose based on the Probabilistic Latent Semantic Analysis (PLSA). With this approach the spoken documents can be organized into a two-dimensional tree (or multi-layered map) of topic clusters, and the user can very efficiently retrieve or browse the network content or associated spoken documents. Different from the conventional document clustering approaches, with PLSA the relationships among the topic clusters and the appropriate terms as the topic labels can be very well derived. An initial prototype system with Chinese broadcast news as the example spoken documents including automatic generation of titles and summaries and retrieval/browsing functionalities is also presented. Choice of different units other than words to be used as the terms in the processing is also considered in the system based on the special structure of the Chinese language.

UR - http://www.scopus.com/inward/record.url?scp=33745206738&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33745206738&partnerID=8YFLogxK

M3 - Paper

AN - SCOPUS:33745206738

SP - 625

EP - 628

ER -