A relevance detection approach to gene annotation

Wen Juan Hou, Chih Lee, Kevin Hsin Yih Lin, Hsin Hsi Chen

Research output: Contribution to journal › Conference article

2 Citations (Scopus)

Abstract

Gene Ontology (GO) enables scientists to describe and annotate gene products with three controlled vocabularies. However, variation in terminology makes automatic annotation of gene products from biomedical literature challenging. In this paper, gene annotation was modeled as relevance detection, and an information retrieval approach with a reference corpus was proposed to annotate a gene product with a GO term given a piece of evidence text. Gene References into Function (GeneRIFs) in the NCBI LocusLink database served as the source of evidence in this study. Evidence text and GO terms, along with their definitions, were used as queries to a reference corpus consisting of 525,936 MEDLINE abstracts. The similarity between the retrieved results measured the degree of relatedness between the evidence text and the GO terms, and thus guided the annotation. Different numbers of predicted GO terms, and different distances between predicted and correct terms in the GO hierarchy, were considered in this study. The results showed that the best recall rate was 78.2% at distance 12 with 5 predicted GO terms, and the best precision rate was 66.2% at distance 12 with one predicted term, when 200 relevant documents were returned by the Okapi information retrieval system.
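The retrieval-based comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the similarity measure or the query construction, so the Jaccard overlap of retrieved document sets, the simple TF-IDF scorer standing in for Okapi, the toy corpus, the placeholder GO identifiers, and all function names (retrieve, jaccard, rank_go_terms) are assumptions made for illustration only.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercased whitespace tokenization; a real system would do far more.
    return text.lower().split()


def retrieve(query, corpus, k):
    """Return the ids of the top-k documents for a query.

    Uses a simple TF-IDF overlap score as a stand-in for Okapi (assumption).
    """
    n = len(corpus)
    doc_counts = [Counter(tokenize(doc)) for doc in corpus]
    df = Counter()
    for counts in doc_counts:
        df.update(counts.keys())
    query_terms = set(tokenize(query))
    scored = []
    for doc_id, counts in enumerate(doc_counts):
        score = sum(
            counts[t] * math.log((n + 1) / (df[t] + 0.5))
            for t in query_terms
            if t in counts
        )
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return {doc_id for _, doc_id in scored[:k]}


def jaccard(a, b):
    # Overlap of two retrieved-document sets (assumed similarity measure).
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_go_terms(evidence, go_terms, corpus, k=200, n_pred=5):
    """Rank GO terms by how similar their retrieved documents are to those
    retrieved for the evidence text, and return the top n_pred term ids."""
    evidence_docs = retrieve(evidence, corpus, k)
    scored = []
    for go_id, (name, definition) in go_terms.items():
        term_docs = retrieve(name + " " + definition, corpus, k)
        scored.append((jaccard(evidence_docs, term_docs), go_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [go_id for _, go_id in scored[:n_pred]]


# Toy data, entirely hypothetical (not MEDLINE abstracts or real GO entries).
corpus = [
    "kinase phosphorylation of protein substrates",
    "protein kinase activity in signal transduction",
    "dna binding transcription factor regulates genes",
    "transcription factor binding to dna promoter",
]
go_terms = {
    "GO:A": ("kinase activity", "catalysis of phosphorylation"),
    "GO:B": ("dna binding", "interacting with dna"),
}
evidence = "this gene encodes a kinase involved in phosphorylation"

top_terms = rank_go_terms(evidence, go_terms, corpus, k=2, n_pred=1)
print(top_terms)
```

In this sketch, k plays the role of the number of documents returned per query (200 in the reported best results) and n_pred the number of predicted GO terms (one to five in the abstract).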

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 148
Publication status: Published - 2005 Dec 1
Event: 1st International Symposium on Semantic Mining in Biomedicine, SMBM 2005 - Hinxton, United Kingdom
Duration: 2005 Apr 10 to 2005 Apr 13

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Hou, W. J., Lee, C., Lin, K. H. Y., & Chen, H. H. (2005). A relevance detection approach to gene annotation. CEUR Workshop Proceedings, 148.