TY - JOUR

T1 - An efficient algorithm for communication set generation of data parallel programs with block-cyclic distribution

AU - Hwang, Gwan Hwan

PY - 2004/4/1

Y1 - 2004/4/1

N2 - Data parallel programming languages, such as High Performance Fortran, are widely regarded as a promising means for writing portable programs for distributed-memory machines. In this paper, we present a new algorithm for computing the communication sets in array section movements with block-cyclic (cyclic (k) in HPF) distribution. Our framework can handle multi-level alignments, multi-dimensional arrays, array intrinsic functions, affine indices and axis exchanges in the array subscript. Instead of employing the linear diophantine equation solver, a new algorithm which does not rely on the linear diophantine equation solver to calculate communication sets is proposed. We use formal proof and experimental results to show that it is more efficient than previous solutions to the same problem. Another important contribution of this paper is that we prove that the compiler is able to compute efficiently the communication sets of block-cyclic distribution as long as the block sizes of the arrays are set to be identical or the lowest common multiple (LCM) of block sizes is not a huge integer. We demonstrate it by thorough complexity analyses and extensive experimental results.

AB - Data parallel programming languages, such as High Performance Fortran, are widely regarded as a promising means for writing portable programs for distributed-memory machines. In this paper, we present a new algorithm for computing the communication sets in array section movements with block-cyclic (cyclic (k) in HPF) distribution. Our framework can handle multi-level alignments, multi-dimensional arrays, array intrinsic functions, affine indices and axis exchanges in the array subscript. Instead of employing the linear diophantine equation solver, a new algorithm which does not rely on the linear diophantine equation solver to calculate communication sets is proposed. We use formal proof and experimental results to show that it is more efficient than previous solutions to the same problem. Another important contribution of this paper is that we prove that the compiler is able to compute efficiently the communication sets of block-cyclic distribution as long as the block sizes of the arrays are set to be identical or the lowest common multiple (LCM) of block sizes is not a huge integer. We demonstrate it by thorough complexity analyses and extensive experimental results.

KW - Block-cyclic distributions

KW - Data parallel programs

KW - Distributed memory machines

KW - HPF compiler

KW - Parallelizing compiler

UR - http://www.scopus.com/inward/record.url?scp=2442694369&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=2442694369&partnerID=8YFLogxK

U2 - 10.1016/j.parco.2004.02.001

DO - 10.1016/j.parco.2004.02.001

M3 - Article

AN - SCOPUS:2442694369

VL - 30

SP - 473

EP - 501

JO - Parallel Computing

JF - Parallel Computing

SN - 0167-8191

IS - 4

ER -