MapReduce skyline query processing with partitioning and distributed dominance tests

Jia-Ling Koh, Chia Ching Chen, Chih Yu Chan, Arbee L.P. Chen

Research output: Contribution to journal > Article

11 Citations (Scopus)

Abstract

To process skyline queries efficiently in the MapReduce framework, this paper proposes two algorithms that avoid the bottleneck of centrally computing the global skyline from the local skylines. The proposed algorithms aim to reduce the number of dominance tests, each of which checks whether one data point is dominated by another, and to perform the necessary dominance tests in parallel. The first algorithm uses a grid-based and an angle-based partitioning scheme to divide the data space into segments in which the local skyline data points are found. A separate set of rules is designed for each of the two partitioning methods to reduce the number of dominance tests needed among the local skyline data points to find the global skyline data points. The second algorithm uses the skyline data points discovered from sample data points to filter out most non-skyline data points in the mappers. For the remaining data points, the dominance relationship between the grid-partitioning segments is used to further reduce the number of dominance tests performed in both the mapper and the reducer. The experimental results show that the two proposed algorithms significantly improve response time compared with related work.
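The ideas named in the abstract (dominance tests, grid partitioning, pruning by segment-level dominance, and sample-based filtering) can be illustrated with a small single-process sketch. This is not the authors' algorithm: the function names, the "smaller is better" convention, the unit data range, and the specific pruning rule are all assumptions made for illustration, and the map and reduce phases are simulated with plain Python loops rather than a MapReduce runtime.

```python
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]
Cell = Tuple[int, ...]

def dominates(p: Sequence[float], q: Sequence[float]) -> bool:
    """p dominates q: no worse in every dimension, strictly better in one.
    Assumption: smaller values are better in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points: Iterable[Point]) -> List[Point]:
    """Block-nested-loop skyline: keep the points no other point dominates."""
    result: List[Point] = []
    for p in points:
        if any(dominates(s, p) for s in result):
            continue
        result = [s for s in result if not dominates(p, s)]
        result.append(p)
    return result

def grid_cell(p: Point, cells_per_dim: int, lo: float = 0.0, hi: float = 1.0) -> Cell:
    """Index of the grid cell containing p (hypothetical uniform grid on [lo, hi])."""
    width = (hi - lo) / cells_per_dim
    return tuple(min(int((x - lo) / width), cells_per_dim - 1) for x in p)

def cell_dominates(a: Cell, b: Cell) -> bool:
    """If a's index is strictly smaller in every dimension, every point in
    cell a dominates every point in cell b."""
    return all(i < j for i, j in zip(a, b))

def grid_skyline(points: Iterable[Point], cells_per_dim: int) -> List[Point]:
    # Simulated map step: group points by grid cell.
    cells: Dict[Cell, List[Point]] = {}
    for p in points:
        cells.setdefault(grid_cell(p, cells_per_dim), []).append(p)
    # Drop cells wholly dominated by another non-empty cell (no point tests needed).
    live = {c: pts for c, pts in cells.items()
            if not any(cell_dominates(o, c) for o in cells)}
    # Local skyline inside each surviving cell.
    local = {c: skyline(pts) for c, pts in live.items()}
    # Simulated reduce step: a point in cell c can only be dominated by points
    # in cells whose index is <= c in every dimension, so test only those.
    result: List[Point] = []
    for c, pts in local.items():
        rivals = [q for o, qs in local.items()
                  if o != c and all(i <= j for i, j in zip(o, c)) for q in qs]
        result.extend(p for p in pts if not any(dominates(q, p) for q in rivals))
    return result

def sample_filter(points: Iterable[Point], sample: Iterable[Point]) -> List[Point]:
    """Discard points dominated by the skyline of a sample drawn from the data."""
    sample_sky = skyline(sample)
    return [p for p in points if not any(dominates(s, p) for s in sample_sky)]

if __name__ == "__main__":
    data = [(0.1, 0.9), (0.2, 0.3), (0.9, 0.1), (0.5, 0.5), (0.8, 0.8), (0.25, 0.35)]
    print(sorted(grid_skyline(data, cells_per_dim=4)))
```

In this sketch the savings come from two places: whole cells are discarded by comparing cell indices only, and each surviving point is tested only against points from cells that could possibly dominate it. The sample filter is safe only because the sample is assumed to be drawn from the data itself, so every point it discards is dominated by a real data point.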

Original language: English
Pages (from-to): 114-137
Number of pages: 24
Journal: Information Sciences
Volume: 375
DOIs: 10.1016/j.ins.2016.09.046
Publication status: Published - 2017 Jan 1

Fingerprint

Skyline
MapReduce
Query processing
Partitioning
Grid
Response Time
Divides
Experiments
Query
Filter

Keywords

  • Cloud computing
  • MapReduce
  • Parallel processing
  • Skyline query computation

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence

Cite this

MapReduce skyline query processing with partitioning and distributed dominance tests. / Koh, Jia-Ling; Chen, Chia Ching; Chan, Chih Yu; Chen, Arbee L.P.

In: Information Sciences, Vol. 375, 01.01.2017, p. 114-137.

Koh, Jia-Ling ; Chen, Chia Ching ; Chan, Chih Yu ; Chen, Arbee L.P. / MapReduce skyline query processing with partitioning and distributed dominance tests. In: Information Sciences. 2017 ; Vol. 375. pp. 114-137.
@article{c46af1528044440d84aa5158154ce786,
title = "MapReduce skyline query processing with partitioning and distributed dominance tests",
abstract = "To process skyline queries efficiently in the MapReduce framework, this paper proposes two algorithms that avoid the bottleneck of centrally computing the global skyline from the local skylines. The proposed algorithms aim to reduce the number of dominance tests, each of which checks whether one data point is dominated by another, and to perform the necessary dominance tests in parallel. The first algorithm uses a grid-based and an angle-based partitioning scheme to divide the data space into segments in which the local skyline data points are found. A separate set of rules is designed for each of the two partitioning methods to reduce the number of dominance tests needed among the local skyline data points to find the global skyline data points. The second algorithm uses the skyline data points discovered from sample data points to filter out most non-skyline data points in the mappers. For the remaining data points, the dominance relationship between the grid-partitioning segments is used to further reduce the number of dominance tests performed in both the mapper and the reducer. The experimental results show that the two proposed algorithms significantly improve response time compared with related work.",
keywords = "Cloud computing, MapReduce, Parallel processing, Skyline query computation",
author = "Koh, Jia-Ling and Chen, {Chia Ching} and Chan, {Chih Yu} and Chen, {Arbee L.P.}",
year = "2017",
month = "1",
day = "1",
doi = "10.1016/j.ins.2016.09.046",
language = "English",
volume = "375",
pages = "114--137",
journal = "Information Sciences",
issn = "0020-0255",
publisher = "Elsevier Inc.",

}

TY - JOUR
T1 - MapReduce skyline query processing with partitioning and distributed dominance tests
AU - Koh, Jia-Ling
AU - Chen, Chia Ching
AU - Chan, Chih Yu
AU - Chen, Arbee L.P.
PY - 2017/1/1
Y1 - 2017/1/1
N2 - To process skyline queries efficiently in the MapReduce framework, this paper proposes two algorithms that avoid the bottleneck of centrally computing the global skyline from the local skylines. The proposed algorithms aim to reduce the number of dominance tests, each of which checks whether one data point is dominated by another, and to perform the necessary dominance tests in parallel. The first algorithm uses a grid-based and an angle-based partitioning scheme to divide the data space into segments in which the local skyline data points are found. A separate set of rules is designed for each of the two partitioning methods to reduce the number of dominance tests needed among the local skyline data points to find the global skyline data points. The second algorithm uses the skyline data points discovered from sample data points to filter out most non-skyline data points in the mappers. For the remaining data points, the dominance relationship between the grid-partitioning segments is used to further reduce the number of dominance tests performed in both the mapper and the reducer. The experimental results show that the two proposed algorithms significantly improve response time compared with related work.
KW - Cloud computing
KW - MapReduce
KW - Parallel processing
KW - Skyline query computation
UR - http://www.scopus.com/inward/record.url?scp=84989809827&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84989809827&partnerID=8YFLogxK
U2 - 10.1016/j.ins.2016.09.046
DO - 10.1016/j.ins.2016.09.046
M3 - Article
AN - SCOPUS:84989809827
VL - 375
SP - 114
EP - 137
JO - Information Sciences
JF - Information Sciences
SN - 0020-0255
ER -