A relevance detection approach to gene annotation

Wen Juan Hou*, Chih Lee, Kevin Hsin Yih Lin, Hsin Hsi Chen


Research output: Contribution to journal › Conference article › peer-review

2 citations (Scopus)


Gene Ontology (GO) enables scientists to describe and annotate gene products with three controlled vocabularies. However, variation in terminology makes automatic annotation of gene products from biomedical literature challenging. In this paper, gene annotation was modeled as relevance detection, and an information retrieval approach with a reference corpus was proposed to annotate a gene product with a GO term given a piece of evidence text. Gene References into Function (GeneRIFs) in the NCBI LocusLink database served as the source of evidence in this study. Evidence text and GO terms, along with their definitions, were used as queries to a reference corpus consisting of 525,936 MEDLINE abstracts. The similarity between the retrieved results measured the degree of relatedness between evidence text and GO terms, and thus guided the annotation. Different numbers of predicted GO terms, and different distances between predicted and correct terms in the GO hierarchy, were considered in this study. The results showed that the best recall rate was 78.2% at distance 12 with five predicted GO terms, and the best precision rate was 66.2% at distance 12 with one predicted term, when 200 relevant documents were returned by the Okapi information retrieval system.
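
The retrieval-based relevance detection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tiny corpus, the term labels `TERM_KINASE`, `TERM_DNA_BINDING` and `TERM_ION_TRANSPORT`, the toy definitions, the simplified BM25 scorer (a common Okapi-style ranking function), and the Jaccard overlap used to compare retrieved document sets are all assumptions made for illustration. The abstract does not state which similarity measure was used.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Simplified Okapi BM25 index over a small in-memory corpus (assumed scorer)."""

    def __init__(self, docs, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        self.tfs = [Counter(tokenize(d)) for d in docs]
        self.lengths = [sum(tf.values()) for tf in self.tfs]
        self.avg_len = sum(self.lengths) / len(docs)
        df = Counter(term for tf in self.tfs for term in tf)
        n = len(docs)
        self.idf = {t: math.log(1 + (n - c + 0.5) / (c + 0.5)) for t, c in df.items()}

    def search(self, query, top_k):
        """Return the set of IDs of the top_k documents with a positive score."""
        q_terms = set(tokenize(query))
        scores = {}
        for doc_id, tf in enumerate(self.tfs):
            norm = self.k1 * (1 - self.b + self.b * self.lengths[doc_id] / self.avg_len)
            score = sum(
                self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
                for t in q_terms if t in tf
            )
            if score > 0:
                scores[doc_id] = score
        ranked = sorted(scores, key=lambda d: (-scores[d], d))
        return set(ranked[:top_k])


def jaccard(a, b):
    """Overlap of two retrieved document sets (assumed similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_go_terms(index, evidence, go_terms, top_k=200, n_predictions=5):
    """Rank GO terms by how similar their retrieved documents are to the evidence's."""
    evidence_docs = index.search(evidence, top_k)
    scored = []
    for term_id, (name, definition) in go_terms.items():
        # The term name and its definition together form the query.
        term_docs = index.search(f"{name} {definition}", top_k)
        scored.append((jaccard(evidence_docs, term_docs), term_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term_id for _, term_id in scored[:n_predictions]]


# Toy reference corpus standing in for the MEDLINE abstracts (hypothetical text).
corpus = [
    "the kinase phosphorylates target proteins in the cell signaling pathway",
    "protein kinase activity regulates phosphorylation during signaling",
    "dna binding transcription factor controls gene expression",
    "transcription of the gene requires dna binding in the nucleus",
    "membrane transport of ions across the cell",
]

# Hypothetical term labels with paraphrased toy definitions.
go_terms = {
    "TERM_KINASE": ("protein kinase activity",
                    "adds phosphate groups to proteins using a kinase"),
    "TERM_DNA_BINDING": ("dna binding", "interacting with dna"),
    "TERM_ION_TRANSPORT": ("ion transport", "movement of ions across a membrane"),
}

index = BM25Index(corpus)
evidence_text = "kinase phosphorylation signaling"  # stands in for a GeneRIF
predicted = rank_go_terms(index, evidence_text, go_terms, top_k=2, n_predictions=2)
print(predicted)
```

In this sketch `top_k` plays the role of the number of documents returned per query (200 in the reported best runs) and `n_predictions` the number of predicted GO terms (one to five in the abstract). Comparing sets of document IDs is only one option; rank-weighted or score-weighted overlaps would fit the same scheme.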

Journal: CEUR Workshop Proceedings
Publication status: Published - 2005
Event: 1st International Symposium on Semantic Mining in Biomedicine, SMBM 2005 - Hinxton, United Kingdom
Duration: 10 Apr 2005 to 13 Apr 2005

ASJC Scopus subject areas

  • General Computer Science

