Hierarchical topic organization and visual presentation of spoken documents using Probabilistic Latent Semantic Analysis (PLSA) for efficient retrieval/browsing applications

Te Hsuan Li*, Ming Han Lee, Berlin Chen, Lin Shan Lee

*Corresponding author for this work

Research output: Contribution to conference › Paper › peer-review

5 Citations (Scopus)

Abstract

The most attractive form of future network content will be multimedia that includes speech information, and such speech information usually carries the core concepts of the content. As a result, the spoken documents associated with the multimedia content can very likely serve as the key for retrieval and browsing. This paper presents a new approach to hierarchical topic organization and visual presentation of spoken documents for this purpose, based on Probabilistic Latent Semantic Analysis (PLSA). With this approach, the spoken documents can be organized into a two-dimensional tree (or multi-layered map) of topic clusters, and the user can efficiently retrieve or browse the network content or the associated spoken documents. Unlike conventional document clustering approaches, PLSA allows both the relationships among the topic clusters and appropriate terms for the topic labels to be derived. An initial prototype system is also presented; it uses Chinese broadcast news as the example spoken documents and includes automatic generation of titles and summaries as well as retrieval/browsing functionalities. Because of the special structure of the Chinese language, the system also considers units other than words as the terms used in the processing.
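For readers unfamiliar with PLSA, the following is a minimal sketch of plain (flat) PLSA fitted by EM, followed by picking the highest-probability terms of each topic as candidate labels. It is not the paper's hierarchical two-dimensional organization, which the abstract does not specify. The function names `plsa` and `top_terms`, the toy vocabulary, and the count matrix are all hypothetical and chosen only for illustration.

```python
import random


def plsa(counts, n_topics, n_iters=50, seed=0):
    """Fit flat PLSA by EM on a document-term count matrix (list of lists).

    Returns (p_z_d, p_w_z): P(topic | doc) per document and
    P(term | topic) per topic. Illustrative sketch only.
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(counts), len(counts[0])

    def normalize(row):
        total = sum(row)
        return [x / total for x in row] if total > 0 else row

    # Random positive initialization of both distributions.
    p_z_d = [normalize([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 1e-3 for _ in range(n_terms)])
             for _ in range(n_topics)]

    for _ in range(n_iters):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_terms for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_terms):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior P(topic | doc, term), up to normalization.
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint) or 1.0
                for z in range(n_topics):
                    # Expected count of (doc, term) assigned to topic z.
                    resp = counts[d][w] * joint[z] / total
                    new_z_d[d][z] += resp
                    new_w_z[z][w] += resp
        # M-step: renormalize the expected counts into probabilities.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z


def top_terms(p_w_z, vocab, k=2):
    """Return the k most probable terms of each topic as candidate labels."""
    labels = []
    for row in p_w_z:
        order = sorted(range(len(vocab)), key=lambda i, r=row: -r[i])
        labels.append([vocab[i] for i in order[:k]])
    return labels


# Hypothetical toy data: 4 documents over a 4-term vocabulary.
vocab = ["election", "vote", "typhoon", "rain"]
counts = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 2, 3],
]
p_z_d, p_w_z = plsa(counts, n_topics=2)
labels = top_terms(p_w_z, vocab, k=2)
```

Each row of `p_z_d` is a distribution over topics for one document, and each row of `p_w_z` is a distribution over terms for one topic. A hierarchical variant would need additional structure relating the topics to one another, which this sketch does not attempt.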

Original language: English
Pages: 625-628
Number of pages: 4
Publication status: Published - 2005
Event: 9th European Conference on Speech Communication and Technology - Lisbon, Portugal
Duration: 2005 Sept 4 to 2005 Sept 8

Other

Other: 9th European Conference on Speech Communication and Technology
Country/Territory: Portugal
City: Lisbon
Period: 2005/09/04 to 2005/09/08

ASJC Scopus subject areas

  • General Engineering
