A relevance detection approach to gene annotation

Wen Juan Hou, Chih Lee, Kevin Hsin Yih Lin, Hsin Hsi Chen

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

Gene Ontology (GO) enables scientists to describe and annotate gene products with three controlled vocabularies. However, variation in terminology makes automatic annotation of gene products from biomedical literature challenging. In this paper, gene annotation was modeled as relevance detection, and an information retrieval approach with a reference corpus was proposed to annotate a gene product with a GO term given a piece of evidence text. Gene References into Function (GeneRIFs) in the NCBI LocusLink database served as the source of evidence in this study. The evidence text and the GO terms, along with their definitions, were treated as queries to a reference corpus of 525,936 MEDLINE abstracts. The similarity between the retrieved results measured the degree of relationship between the evidence text and each GO term, and thus guided the annotation. Different numbers of predicted GO terms, and different distances between predicted and correct terms in the GO hierarchy, were considered. The best recall was 78.2% at distance 12 with five predicted GO terms, and the best precision was 66.2% at distance 12 with one predicted term, when the Okapi information retrieval system returned 200 relevant documents.
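
The relevance detection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the simplified BM25 scoring standing in for the Okapi system, the Jaccard overlap used as the similarity between retrieved results, and all function names (`bm25_top_k`, `jaccard`, `rank_go_terms`) are assumptions made for illustration. The abstract does not say which similarity measure the paper used.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace after stripping basic punctuation."""
    return text.lower().replace(",", " ").replace(".", " ").split()


def bm25_top_k(query, docs, k, k1=1.2, b=0.75):
    """Return the IDs of the k best documents for a query (simplified BM25)."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized.values()) / n_docs
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    scores = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scores[doc_id] = score
    # Sort by descending score; break ties by document ID for determinism.
    ranked = sorted(scores, key=lambda d: (-scores[d], d))
    return set(ranked[:k])


def jaccard(a, b):
    """Overlap of two retrieved document sets (assumed similarity measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_go_terms(evidence, go_terms, docs, k=2, n_predictions=1):
    """Rank GO terms by how much their retrieved set overlaps the evidence's."""
    evidence_hits = bm25_top_k(evidence, docs, k)
    scored = []
    for go_id, definition in go_terms.items():
        term_hits = bm25_top_k(definition, docs, k)
        scored.append((jaccard(evidence_hits, term_hits), go_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [go_id for _, go_id in scored[:n_predictions]]


# Toy reference corpus standing in for the MEDLINE abstracts (made-up text).
docs = {
    "d1": "kinase phosphorylation of proteins regulates signaling",
    "d2": "protein kinase activity and phosphorylation in cell signaling",
    "d3": "dna binding transcription factor regulates gene expression",
    "d4": "transcription of dna by transcription factor binding at promoters",
}
# GO terms with simplified, made-up definition text used as queries.
go_terms = {
    "GO:0004672": "protein kinase activity phosphorylation",
    "GO:0003677": "DNA binding transcription",
}
evidence = "the product shows kinase phosphorylation activity"

print(rank_go_terms(evidence, go_terms, docs, k=2, n_predictions=1))
```

Here `k` plays the role of the number of documents returned per query (200 in the reported experiments) and `n_predictions` the number of predicted GO terms (one to five in the reported experiments). Comparing retrieved sets rather than the query strings themselves is what lets differently worded evidence and GO definitions still match.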

Original language: English
Journal: CEUR Workshop Proceedings
Volume: 148
Publication status: Published - 2005
Externally published: Yes

Fingerprint

Genes
Ontology
Thesauri
Information retrieval systems
Terminology
Information retrieval

ASJC Scopus subject areas

  • Computer Science(all)

Cite this

Hou, W. J., Lee, C., Lin, K. H. Y., & Chen, H. H. (2005). A relevance detection approach to gene annotation. CEUR Workshop Proceedings, 148.

@article{c13c681a811c4560b116712cb23cdca4,
title = "A relevance detection approach to gene annotation",
abstract = "Gene Ontology (GO) enables scientists to describe and annotate gene products with three controlled vocabularies. However, the nature of variation in terminology makes automatic annotation of gene products based on biomedical literature challenging. In this paper, gene annotation was modeled as relevance detection, and an information retrieval with reference corpus was proposed to annotate a gene product with a GO term given a piece of the evidence text. Gene Reference into Functions (GeneRIFs) in NCBI LocusLink database served as the source of evidence in this study. Evidence text, and GO terms along with their definitions were regarded as queries to a reference corpus, which consists of 525,936 MEDLINE abstracts. The similarity between retrieved results measured the degrees of relationship between evidence text and GO terms, and thus guided the annotation. Different number of predicted GO terms, and different distances between predicted and correct terms in GO hierarchy were considered in this study. The results showed that the best recall rate was 78.2{\%} at distance 12 with 5 predicted GO terms, and the best precision rate was 66.2{\%} at distance 12 with one predicted term, when 200 relevant documents were returned by Okapi information retrieval system.",
author = "Hou, {Wen Juan} and Chih Lee and Lin, {Kevin Hsin Yih} and Chen, {Hsin Hsi}",
year = "2005",
language = "English",
volume = "148",
journal = "CEUR Workshop Proceedings",
issn = "1613-0073",
publisher = "CEUR-WS",
}

TY - JOUR

T1 - A relevance detection approach to gene annotation

AU - Hou, Wen Juan

AU - Lee, Chih

AU - Lin, Kevin Hsin Yih

AU - Chen, Hsin Hsi

PY - 2005

Y1 - 2005

AB - Gene Ontology (GO) enables scientists to describe and annotate gene products with three controlled vocabularies. However, the nature of variation in terminology makes automatic annotation of gene products based on biomedical literature challenging. In this paper, gene annotation was modeled as relevance detection, and an information retrieval with reference corpus was proposed to annotate a gene product with a GO term given a piece of the evidence text. Gene Reference into Functions (GeneRIFs) in NCBI LocusLink database served as the source of evidence in this study. Evidence text, and GO terms along with their definitions were regarded as queries to a reference corpus, which consists of 525,936 MEDLINE abstracts. The similarity between retrieved results measured the degrees of relationship between evidence text and GO terms, and thus guided the annotation. Different number of predicted GO terms, and different distances between predicted and correct terms in GO hierarchy were considered in this study. The results showed that the best recall rate was 78.2% at distance 12 with 5 predicted GO terms, and the best precision rate was 66.2% at distance 12 with one predicted term, when 200 relevant documents were returned by Okapi information retrieval system.

UR - http://www.scopus.com/inward/record.url?scp=84874313174&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84874313174&partnerID=8YFLogxK

M3 - Article

VL - 148

JO - CEUR Workshop Proceedings

JF - CEUR Workshop Proceedings

SN - 1613-0073

ER -