Abstract
This paper proposes two algorithms for efficiently processing skyline queries on the MapReduce framework; both avoid the bottleneck of computing the global skyline centrally from the local skylines. The proposed algorithms aim to reduce the number of dominance tests, which check whether one data point is dominated by another, and to perform the necessary dominance tests in parallel. The first algorithm uses grid-based and angle-based partitioning schemes to divide the data space into segments in which the local skyline data points are found. A separate set of rules is designed for each partitioning scheme to reduce the number of dominance tests among the local skyline data points that are needed to find the global skyline data points. The second algorithm uses the skyline data points discovered from a sample of the data to filter out most non-skyline data points in the mappers. For the remaining data points, the dominance relationship between the grid-partitioning segments is used to further reduce the number of dominance tests performed in both the mapper and the reducer. Experimental results show that the two proposed algorithms significantly improve response time compared with related work.
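The ideas named in the abstract (the dominance test, sample-based filtering, and grid-segment pruning before the local and global skyline steps) can be illustrated with a small single-process sketch. This is not the paper's algorithm: it assumes smaller values are better in every dimension and that coordinates lie in `[0, max_value]`, and every name in it (`dominates`, `skyline`, `filter_with_sample`, `grid_cell`, `cell_dominates`, `grid_skyline`) is hypothetical and chosen only for illustration. The angle-based partitioning and the paper's specific rule sets are not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]
Cell = Tuple[int, ...]


def dominates(p: Point, q: Point) -> bool:
    """Dominance test, assuming smaller is better: p is no worse than q
    in every dimension and strictly better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points: Iterable[Point]) -> List[Point]:
    """Simple nested-loop skyline: keep only points no other point dominates."""
    result: List[Point] = []
    for p in points:
        if any(dominates(s, p) for s in result):
            continue
        result = [s for s in result if not dominates(p, s)]
        result.append(p)
    return result


def filter_with_sample(points: Iterable[Point], sample: Sequence[Point]) -> List[Point]:
    """Mapper-side filter: drop points dominated by the skyline of a sample.
    The sample must be drawn from the data itself for this to be safe."""
    sample_sky = skyline(sample)
    return [p for p in points if not any(dominates(s, p) for s in sample_sky)]


def grid_cell(p: Point, cells_per_dim: int, max_value: float) -> Cell:
    """Map a point to the index of its grid segment (assumed uniform grid)."""
    return tuple(
        min(int(x / max_value * cells_per_dim), cells_per_dim - 1) for x in p
    )


def cell_dominates(c1: Cell, c2: Cell) -> bool:
    """If c1 has a strictly smaller index in every dimension, every point
    in c1 dominates every point in c2, so no point-level test is needed."""
    return all(a < b for a, b in zip(c1, c2))


def grid_skyline(points: Iterable[Point], cells_per_dim: int = 4,
                 max_value: float = 1.0) -> List[Point]:
    # "Map" step: group points by grid segment.
    cells: Dict[Cell, List[Point]] = defaultdict(list)
    for p in points:
        cells[grid_cell(p, cells_per_dim, max_value)].append(p)
    # Discard whole segments dominated by another non-empty segment.
    live = [c for c in cells if not any(cell_dominates(o, c) for o in cells)]
    # Local skyline per surviving segment (independent, so parallelisable).
    local = [s for c in live for s in skyline(cells[c])]
    # "Reduce" step: merge the local skylines into the global skyline.
    return skyline(local)


points = [(0.1, 0.9), (0.2, 0.2), (0.9, 0.1), (0.5, 0.5), (0.8, 0.8), (0.3, 0.25)]
global_skyline = sorted(grid_skyline(points))
survivors = sorted(filter_with_sample(points, points[:2]))
```

In this toy data set the segment containing `(0.2, 0.2)` dominates three other segments outright, so their points are discarded without any point-to-point dominance test, which is the kind of saving the abstract describes.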
Original language | English |
---|---|
Pages (from-to) | 114-137 |
Number of pages | 24 |
Journal | Information Sciences |
Volume | 375 |
Publication status | Published - 2017 Jan 1 |
Keywords
- Cloud computing
- MapReduce
- Parallel processing
- Skyline query computation
ASJC Scopus subject areas
- Software
- Control and Systems Engineering
- Theoretical Computer Science
- Computer Science Applications
- Information Systems and Management
- Artificial Intelligence