DNA Sequence Similarity Search through Content-Based Retrieval Technique

Chia H. Yeh*, Po Y. Sung, Hsuan T. Chang, Chung J. Kuo

*Corresponding author for this work

Research output: Contribution to journal › Conference article › peer-review

Abstract

Deoxyribonucleic acid (DNA) sequences are difficult to analyze for similarity because of their length and complexity. The challenge lies in applying digital signal processing (DSP) to such problems in DNA sequence analysis. Here, we transform a one-dimensional (1D) DNA sequence into a two-dimensional (2D) pattern by using the Peano scan algorithm, with four complex values assigned to the characters "A", "C", "T", and "G", respectively. The Fourier transform is then employed to obtain the far-field amplitude distribution of the 2D pattern. In this way, a 1D DNA sequence becomes a 2D image pattern. Features are extracted from the 2D image pattern with the Principal Component Analysis (PCA) method, and the DNA sequence database is established from these features. However, comparing features can take a long time when the database is large, because the features are typically multi-dimensional. We address this problem by building an indexing structure that acts as a filter: it filters out non-relevant items and selects a subset of candidate DNA sequences. Clustering algorithms organize the multi-dimensional feature data into this indexing structure for effective retrieval. Accordingly, the query sequence needs to be compared only against the candidate sequences rather than against all sequences in the database. Our algorithm thus provides a pre-processing method that accelerates the DNA sequence search process. Finally, experimental results demonstrate the efficiency of the proposed algorithm for DNA sequence similarity retrieval.
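The retrieval pipeline described in the abstract (scan to 2D, complex encoding, Fourier amplitude, PCA features, clustered index, candidate-only comparison) can be sketched end to end in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. The specific complex codes for the four bases are hypothetical. A Hilbert (Peano-Hilbert) curve stands in for the Peano scan, since "Peano scan" is commonly used in image processing for that curve, though the paper's exact scan may differ. PCA is computed with a plain SVD, and simple k-means stands in for the unspecified clustering algorithm. The names `build_index` and `search` are illustrative only.

```python
import numpy as np

# ASSUMPTION: the paper assigns four complex values to A, C, G, T, but these
# particular values are hypothetical.
BASE_CODES = {"A": 1 + 1j, "C": -1 + 1j, "G": -1 - 1j, "T": 1 - 1j}


def hilbert_d2xy(order, d):
    """Map position d along a Hilbert curve to (x, y) on a 2**order square grid.

    ASSUMPTION: the Hilbert (Peano-Hilbert) curve stands in for the Peano scan."""
    n = 1 << order
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:  # rotate/reflect the current quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def sequence_to_pattern(seq, order):
    """Lay a 1D sequence onto a 2D complex grid along the scan curve.

    Cells beyond the sequence length (and unknown symbols) stay 0; bases past
    the grid capacity are truncated."""
    n = 1 << order
    grid = np.zeros((n, n), dtype=complex)
    for d, base in enumerate(seq[: n * n]):
        x, y = hilbert_d2xy(order, d)
        grid[y, x] = BASE_CODES.get(base, 0)
    return grid


def far_field_amplitude(grid):
    """Centered amplitude of the 2D discrete Fourier transform of the pattern."""
    return np.abs(np.fft.fftshift(np.fft.fft2(grid)))


def pca_fit(X, k):
    """Mean and top-k principal axes (as rows) of the data matrix X, via SVD."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]


def kmeans(F, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns centers and labels consistent with them."""
    rng = np.random.default_rng(seed)
    centers = F[rng.choice(len(F), size=k, replace=False)].copy()

    def assign(c):
        return np.argmin(((F[:, None, :] - c[None, :, :]) ** 2).sum(axis=2), axis=1)

    for _ in range(iters):
        labels = assign(centers)
        for c in range(k):
            if np.any(labels == c):  # leave an empty cluster's center unchanged
                centers[c] = F[labels == c].mean(axis=0)
    return centers, assign(centers)


def build_index(seqs, order=4, n_components=3, n_clusters=2):
    """Extract PCA features for every sequence and cluster them (hypothetical API)."""
    X = np.array([far_field_amplitude(sequence_to_pattern(s, order)).ravel() for s in seqs])
    mean, axes = pca_fit(X, n_components)
    F = (X - mean) @ axes.T
    centers, labels = kmeans(F, n_clusters)
    return {"order": order, "mean": mean, "axes": axes,
            "features": F, "centers": centers, "labels": labels}


def search(index, query):
    """Compare the query only against sequences in its nearest cluster.

    Returns database indices of the candidates, closest feature vector first."""
    x = far_field_amplitude(sequence_to_pattern(query, index["order"])).ravel()
    f = (x - index["mean"]) @ index["axes"].T
    cluster = int(np.argmin(((index["centers"] - f) ** 2).sum(axis=1)))
    candidates = np.flatnonzero(index["labels"] == cluster)
    dists = np.linalg.norm(index["features"][candidates] - f, axis=1)
    return candidates[np.argsort(dists)].tolist()
```

A typical call is `index = build_index(database_sequences)` followed by `search(index, query_sequence)`. Restricting the final distance computation to the query's nearest cluster is what gives the filtering effect described in the abstract. The trade-off is that a true nearest neighbor lying in an adjacent cluster can be missed.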

Original language: English
Pages (from-to): 635-645
Number of pages: 11
Journal: Proceedings of SPIE - The International Society for Optical Engineering
Volume: 5096
Publication status: Published - 2003
Externally published: Yes
Event: Proceedings of SPIE - The International Society for Optical Engineering: Signal Processing, Sensor Fusion, and Target Recognition XII - Orlando, FL, United States
Duration: 2003 Apr 21 - 2003 Apr 23

Keywords

  • Clustering algorithm
  • DNA
  • Fourier transformation
  • Indexing structure
  • Peano scan
  • Principal Component Analysis
  • Similarity retrieval

ASJC Scopus subject areas

  • Electronic, Optical and Magnetic Materials
  • Condensed Matter Physics
  • Computer Science Applications
  • Applied Mathematics
  • Electrical and Electronic Engineering
