Integrating log-based and text-based methods towards automatic Web thesaurus construction

Hsiao-Tieh Pu, Lee Feng Chien

Research output: Contribution to journal (Article)

1 Citation (Scopus)

Abstract

This paper presents an approach for investigating the feasibility of automatically constructing a scalable thesaurus based on Web users' vocabularies with search interests. The proposed approach comprises two main techniques: relevant term extraction and concept clustering. The former combines query-session-based and text-based methods to extract relevant terms for a given search term; the latter organizes these relevant terms into concept classes based on the results returned by search engines. Initial experiments were conducted to test the feasibility of the proposed approach for organizing Web users' vocabularies. The results show that relevant terms can be extracted efficiently and that concept classes can be better organized. The approach has great potential to benefit the automatic construction of a large-scale thesaurus for future Web IR applications.

Original language: English
Pages (from-to): 463-471
Number of pages: 9
Journal: Proceedings of the ASIST Annual Meeting
Volume: 41
DOIs: 10.1002/meet.1450410154
Publication status: Published, 2004 Nov 1

Fingerprint

class concept
thesaurus
vocabulary
search engine
experiment

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences

Cite this

Integrating log-based and text-based methods towards automatic Web thesaurus construction. / Pu, Hsiao-Tieh; Chien, Lee Feng.

In: Proceedings of the ASIST Annual Meeting, Vol. 41, 01.11.2004, p. 463-471.

@article{90570ac53d8b4ab68057f6ca6b51895d,
title = "Integrating log-based and text-based methods towards automatic Web thesaurus construction",
abstract = "This paper presents an approach to investigating the possibility for constructing an automatic and scalable thesaurus based on Web users' vocabularies with search interests. The proposed approach mainly includes two techniques, namely, relevant term extraction and concept clustering. The former combines query-session-based and text-based methods to extract relevant terms for a given search term; and the latter organizes these relevant terms into concept classes based on the search results from search engines. Some initial experiments have been conducted to test feasibility of the proposed approach to organizing Web users' vocabularies. The obtained results show that relevant terms could be extracted efficiently and concept classes be more well organized. The approach has a great potential to benefit the automatic construction of a large scale thesaurus for future Web IR applications.",
author = "Pu, Hsiao-Tieh and Chien, {Lee Feng}",
year = "2004",
month = "11",
day = "1",
doi = "10.1002/meet.1450410154",
language = "English",
volume = "41",
pages = "463--471",
journal = "Proceedings of the ASIST Annual Meeting",
issn = "1550-8390",
publisher = "Learned Information",
}

TY  - JOUR
T1  - Integrating log-based and text-based methods towards automatic Web thesaurus construction
AU  - Pu, Hsiao-Tieh
AU  - Chien, Lee Feng
PY  - 2004/11/1
Y1  - 2004/11/1
N2  - This paper presents an approach to investigating the possibility for constructing an automatic and scalable thesaurus based on Web users' vocabularies with search interests. The proposed approach mainly includes two techniques, namely, relevant term extraction and concept clustering. The former combines query-session-based and text-based methods to extract relevant terms for a given search term; and the latter organizes these relevant terms into concept classes based on the search results from search engines. Some initial experiments have been conducted to test feasibility of the proposed approach to organizing Web users' vocabularies. The obtained results show that relevant terms could be extracted efficiently and concept classes be more well organized. The approach has a great potential to benefit the automatic construction of a large scale thesaurus for future Web IR applications.
AB  - This paper presents an approach to investigating the possibility for constructing an automatic and scalable thesaurus based on Web users' vocabularies with search interests. The proposed approach mainly includes two techniques, namely, relevant term extraction and concept clustering. The former combines query-session-based and text-based methods to extract relevant terms for a given search term; and the latter organizes these relevant terms into concept classes based on the search results from search engines. Some initial experiments have been conducted to test feasibility of the proposed approach to organizing Web users' vocabularies. The obtained results show that relevant terms could be extracted efficiently and concept classes be more well organized. The approach has a great potential to benefit the automatic construction of a large scale thesaurus for future Web IR applications.
UR  - http://www.scopus.com/inward/record.url?scp=34247400218&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=34247400218&partnerID=8YFLogxK
U2  - 10.1002/meet.1450410154
DO  - 10.1002/meet.1450410154
M3  - Article
AN  - SCOPUS:34247400218
VL  - 41
SP  - 463
EP  - 471
JO  - Proceedings of the ASIST Annual Meeting
JF  - Proceedings of the ASIST Annual Meeting
SN  - 1550-8390
ER  -