High-performance parallel location-aware algorithms for approximate string matching on GPUs

Cheng-Hung Lin, Chun Cheng Huang

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Abstract

Approximate string matching has been widely used in many applications, including deoxyribonucleic acid sequence searching, spell checking, text mining, and spam filters. The method is designed to find all locations of strings that approximately match a pattern in accordance with the number of insertion, deletion, and substitution operations. Among the proposed algorithms, the bit-parallel algorithms are considered to be the best and highly efficient algorithms. However, the traditional bit-parallel algorithms lacks the ability of identifying the start and end positions of a matched pattern. Furthermore, acceleration of the bit-parallel algorithms has become a crucial issue for processing big data nowadays. In this paper, we propose two kinds of parallel location-aware algorithms called data-segmented parallelism and high-degree parallelism as means to accelerate approximate string matching using graphic processing units. Experimental results show that the high-degree parallelism on GPUs achieves significant improvement in system and kernel throughputs compared to CPU counterparts. Compared to state-of-the-art approaches, the proposed high-degree parallelism achieves 11 to 105 times improvement.

Original languageEnglish
Title of host publicationProceedings - 2015 IEEE 21st International Conference on Parallel and Distributed Systems, ICPADS 2015
PublisherIEEE Computer Society
Pages570-575
Number of pages6
ISBN (Electronic)9780769557854
DOIs
Publication statusPublished - 2016 Jan 15
Event21st IEEE International Conference on Parallel and Distributed Systems, ICPADS 2015 - Melbourne, Australia
Duration: 2015 Dec 142015 Dec 17

Publication series

NameProceedings of the International Conference on Parallel and Distributed Systems - ICPADS
Volume2016-January
ISSN (Print)1521-9097

Other

Other21st IEEE International Conference on Parallel and Distributed Systems, ICPADS 2015
CountryAustralia
CityMelbourne
Period15/12/1415/12/17

Fingerprint

Parallel algorithms
Program processors
DNA
Substitution reactions
Throughput
Processing
Graphics processing unit

Keywords

  • Approximate string matching
  • Bit-parallel algorithm
  • Graphic processing units
  • Levenshtein distance
  • Nondeterministic finite automaton
  • Parallel algorithm

ASJC Scopus subject areas

  • Hardware and Architecture

Cite this

Lin, C-H., & Huang, C. C. (2016). High-performance parallel location-aware algorithms for approximate string matching on GPUs. In Proceedings - 2015 IEEE 21st International Conference on Parallel and Distributed Systems, ICPADS 2015 (pp. 570-575). [7384340] (Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS; Vol. 2016-January). IEEE Computer Society. https://doi.org/10.1109/ICPADS.2015.77

High-performance parallel location-aware algorithms for approximate string matching on GPUs. / Lin, Cheng-Hung; Huang, Chun Cheng.

Proceedings - 2015 IEEE 21st International Conference on Parallel and Distributed Systems, ICPADS 2015. IEEE Computer Society, 2016. p. 570-575 7384340 (Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS; Vol. 2016-January).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Lin, C-H & Huang, CC 2016, High-performance parallel location-aware algorithms for approximate string matching on GPUs. in Proceedings - 2015 IEEE 21st International Conference on Parallel and Distributed Systems, ICPADS 2015., 7384340, Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS, vol. 2016-January, IEEE Computer Society, pp. 570-575, 21st IEEE International Conference on Parallel and Distributed Systems, ICPADS 2015, Melbourne, Australia, 15/12/14. https://doi.org/10.1109/ICPADS.2015.77
Lin C-H, Huang CC. High-performance parallel location-aware algorithms for approximate string matching on GPUs. In Proceedings - 2015 IEEE 21st International Conference on Parallel and Distributed Systems, ICPADS 2015. IEEE Computer Society. 2016. p. 570-575. 7384340. (Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS). https://doi.org/10.1109/ICPADS.2015.77
Lin, Cheng-Hung ; Huang, Chun Cheng. / High-performance parallel location-aware algorithms for approximate string matching on GPUs. Proceedings - 2015 IEEE 21st International Conference on Parallel and Distributed Systems, ICPADS 2015. IEEE Computer Society, 2016. pp. 570-575 (Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS).
@inproceedings{61d78249a8c64d36bcf69d25a920ed46,
title = "High-performance parallel location-aware algorithms for approximate string matching on GPUs",
abstract = "Approximate string matching has been widely used in many applications, including deoxyribonucleic acid sequence searching, spell checking, text mining, and spam filters. The method is designed to find all locations of strings that approximately match a pattern in accordance with the number of insertion, deletion, and substitution operations. Among the proposed algorithms, the bit-parallel algorithms are considered to be the best and highly efficient algorithms. However, the traditional bit-parallel algorithms lacks the ability of identifying the start and end positions of a matched pattern. Furthermore, acceleration of the bit-parallel algorithms has become a crucial issue for processing big data nowadays. In this paper, we propose two kinds of parallel location-aware algorithms called data-segmented parallelism and high-degree parallelism as means to accelerate approximate string matching using graphic processing units. Experimental results show that the high-degree parallelism on GPUs achieves significant improvement in system and kernel throughputs compared to CPU counterparts. Compared to state-of-the-art approaches, the proposed high-degree parallelism achieves 11 to 105 times improvement.",
keywords = "Approximate string matching, Bit-parallel algorithm, Graphic processing units, Levenshtein distance, Nondeterministic finite automaton, Parallel algorithm",
author = "Cheng-Hung Lin and Huang, {Chun Cheng}",
year = "2016",
month = "1",
day = "15",
doi = "10.1109/ICPADS.2015.77",
language = "English",
series = "Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS",
publisher = "IEEE Computer Society",
pages = "570--575",
booktitle = "Proceedings - 2015 IEEE 21st International Conference on Parallel and Distributed Systems, ICPADS 2015",

}

TY - GEN

T1 - High-performance parallel location-aware algorithms for approximate string matching on GPUs

AU - Lin, Cheng-Hung

AU - Huang, Chun Cheng

PY - 2016/1/15

Y1 - 2016/1/15

N2 - Approximate string matching has been widely used in many applications, including deoxyribonucleic acid sequence searching, spell checking, text mining, and spam filters. The method is designed to find all locations of strings that approximately match a pattern in accordance with the number of insertion, deletion, and substitution operations. Among the proposed algorithms, the bit-parallel algorithms are considered to be the best and highly efficient algorithms. However, the traditional bit-parallel algorithms lacks the ability of identifying the start and end positions of a matched pattern. Furthermore, acceleration of the bit-parallel algorithms has become a crucial issue for processing big data nowadays. In this paper, we propose two kinds of parallel location-aware algorithms called data-segmented parallelism and high-degree parallelism as means to accelerate approximate string matching using graphic processing units. Experimental results show that the high-degree parallelism on GPUs achieves significant improvement in system and kernel throughputs compared to CPU counterparts. Compared to state-of-the-art approaches, the proposed high-degree parallelism achieves 11 to 105 times improvement.

AB - Approximate string matching has been widely used in many applications, including deoxyribonucleic acid sequence searching, spell checking, text mining, and spam filters. The method is designed to find all locations of strings that approximately match a pattern in accordance with the number of insertion, deletion, and substitution operations. Among the proposed algorithms, the bit-parallel algorithms are considered to be the best and highly efficient algorithms. However, the traditional bit-parallel algorithms lacks the ability of identifying the start and end positions of a matched pattern. Furthermore, acceleration of the bit-parallel algorithms has become a crucial issue for processing big data nowadays. In this paper, we propose two kinds of parallel location-aware algorithms called data-segmented parallelism and high-degree parallelism as means to accelerate approximate string matching using graphic processing units. Experimental results show that the high-degree parallelism on GPUs achieves significant improvement in system and kernel throughputs compared to CPU counterparts. Compared to state-of-the-art approaches, the proposed high-degree parallelism achieves 11 to 105 times improvement.

KW - Approximate string matching

KW - Bit-parallel algorithm

KW - Graphic processing units

KW - Levenshtein distance

KW - Nondeterministic finite automaton

KW - Parallel algorithm

UR - http://www.scopus.com/inward/record.url?scp=84964619916&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84964619916&partnerID=8YFLogxK

U2 - 10.1109/ICPADS.2015.77

DO - 10.1109/ICPADS.2015.77

M3 - Conference contribution

AN - SCOPUS:84964619916

T3 - Proceedings of the International Conference on Parallel and Distributed Systems - ICPADS

SP - 570

EP - 575

BT - Proceedings - 2015 IEEE 21st International Conference on Parallel and Distributed Systems, ICPADS 2015

PB - IEEE Computer Society

ER -