MapReduce skyline query processing with partitioning and distributed dominance tests

Jia Ling Koh, Chia Ching Chen, Chih Yu Chan, Arbee L.P. Chen*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

21 Citations (Scopus)

Abstract

To process skyline queries efficiently in the MapReduce framework, this paper proposes two algorithms that avoid the bottleneck of computing the global skyline centrally from the local skylines. The proposed algorithms aim to reduce the number of dominance tests, which check whether one data point is dominated by another, and to perform the necessary dominance tests in parallel. The first algorithm uses grid-based and angle-based partitioning schemes to divide the data space into segments for finding the local skyline data points. Two sets of rules, one for each partitioning method, are designed to reduce the number of dominance tests among the local skyline data points that are needed to find the global skyline data points. The second algorithm uses the skyline data points discovered from sample data points to filter out most non-skyline data points in the mappers. For the remaining data points, the dominance relationship between the grid-partitioning segments is used to further reduce the number of dominance tests performed in both the mapper and the reducer. The experimental results show that the two proposed algorithms significantly improve response time compared with related work.
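
As a rough illustration of the ideas named in the abstract, the following Python sketch shows a dominance test, a local skyline computation, and a simple grid-cell pruning step. It is not the authors' algorithm: the abstract does not state the paper's pruning rules, so the cell-dominance rule, the single-process simulation of the map and reduce phases, and all function names (`dominates`, `skyline`, `grid_cell`, `cell_dominates`, `mapreduce_skyline`) are assumptions made for this example. It also assumes that smaller values are preferred in every dimension and that coordinates lie in [0, 1].

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Cell = Tuple[int, ...]


def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """Dominance test: p is no worse than q everywhere and strictly better somewhere.

    Assumption: smaller values are better in every dimension.
    """
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Block-nested-loop skyline: keep only points that no other point dominates."""
    result: List[Point] = []
    for p in points:
        if any(dominates(s, p) for s in result):
            continue  # p is dominated, so it cannot be a skyline point
        # p survives; drop any earlier candidates that p dominates
        result = [s for s in result if not dominates(p, s)]
        result.append(p)
    return result


def grid_cell(p: Point, cells_per_dim: int) -> Cell:
    """Map a point in [0, 1]^d to the index of its grid cell (hypothetical helper)."""
    return tuple(min(int(x * cells_per_dim), cells_per_dim - 1) for x in p)


def cell_dominates(c: Cell, d: Cell) -> bool:
    """Illustrative pruning rule (an assumption, not the paper's rule set).

    If cell c has a strictly smaller index than cell d in every dimension,
    every point in c dominates every point in d.
    """
    return all(a < b for a, b in zip(c, d))


def mapreduce_skyline(points: Sequence[Point], cells_per_dim: int = 4) -> List[Point]:
    """Simulate the map and reduce phases in a single process."""
    # "Map" phase: assign each point to its grid cell.
    cells: Dict[Cell, List[Point]] = {}
    for p in points:
        cells.setdefault(grid_cell(p, cells_per_dim), []).append(p)

    # Prune whole cells dominated by another non-empty cell,
    # which avoids point-level dominance tests for their contents.
    surviving = {
        c: pts for c, pts in cells.items()
        if not any(cell_dominates(other, c) for other in cells)
    }

    # Local skylines, one per surviving cell (these could run in parallel).
    local = [s for pts in surviving.values() for s in skyline(pts)]

    # "Reduce" phase: merge the local skylines into the global skyline.
    return skyline(local)


points = [(0.1, 0.9), (0.2, 0.2), (0.9, 0.1), (0.8, 0.8), (0.5, 0.5), (0.3, 0.25)]
print(sorted(mapreduce_skyline(points)))
```

In this sketch the cell containing (0.2, 0.2) dominates the cells containing (0.3, 0.25), (0.5, 0.5), and (0.8, 0.8), so those three points are discarded without any point-to-point dominance test.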

Original language: English
Pages (from-to): 114-137
Number of pages: 24
Journal: Information Sciences
Volume: 375
Publication status: Published - 2017 Jan 1

Keywords

  • Cloud computing
  • MapReduce
  • Parallel processing
  • Skyline query computation

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence
