Efficient and portable distribution modeling for large-scale scientific data processing with data-parallel primitives

Hao Yi Yang, Zhi Rong Lin, Ko Chih Wang*

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-reviewed

Abstract

Distribution-based data representation is a promising approach to handling large-scale scientific datasets. Distribution-based approaches often transform a scientific dataset into many distributions, each of which is calculated from a small number of samples. Most existing parallel algorithms focus on efficiently modeling a single distribution from many input samples, so they may not fit the large-scale scientific data processing scenario because they cannot utilize computing resources effectively. Histograms and the Gaussian Mixture Model (GMM) are the most popular distribution representations used to model scientific datasets. We therefore propose multi-set histogram and GMM modeling algorithms for large-scale scientific data processing. Our algorithms are built from data-parallel primitives to achieve portability across different hardware architectures. We evaluate the performance of the proposed algorithms in detail and demonstrate use cases for scientific data processing.
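To make the multi-set idea concrete, here is a minimal sketch of how many small histograms might be built in one pass using only map and reduce-by-key style operations. It is not the authors' implementation. NumPy stands in for a data-parallel primitive library, and the function name `multi_set_histograms`, its parameters, and the shared value range `[lo, hi]` are all assumptions made for illustration.

```python
import numpy as np

def multi_set_histograms(values, set_ids, num_sets, num_bins, lo, hi):
    """Build one histogram per sample set in a single pass (illustrative sketch).

    values  : flat array holding the samples of all sets
    set_ids : for each sample, the index of the set it belongs to
    lo, hi  : value range shared by every histogram (an assumption of this sketch)
    """
    values = np.asarray(values, dtype=np.float64)
    set_ids = np.asarray(set_ids, dtype=np.int64)

    # Map: turn each sample into a bin index inside its own histogram.
    scaled = (values - lo) / (hi - lo) * num_bins
    bins = np.clip(scaled.astype(np.int64), 0, num_bins - 1)

    # Map: fuse (set id, bin index) into one flat key per sample.
    keys = set_ids * num_bins + bins

    # Reduce-by-key: count the occurrences of every key at once.
    counts = np.bincount(keys, minlength=num_sets * num_bins)
    return counts.reshape(num_sets, num_bins)

# Two sets with two bins each: set 0 has 2 samples, set 1 has 3.
hist = multi_set_histograms(
    values=[0.1, 0.9, 0.4, 0.6, 0.2],
    set_ids=[0, 0, 1, 1, 1],
    num_sets=2, num_bins=2, lo=0.0, hi=1.0,
)
print(hist)
```

Because every sample is processed independently before a single keyed reduction, the work scales with the total number of samples rather than with the number of sets. This is the property that lets many small distributions keep a parallel device busy.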

Original language: English
Article number: 285
Journal: Algorithms
Volume: 14
Issue number: 10
DOIs
Publication status: Published - Oct 2021

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Numerical Analysis
  • Computational Theory and Mathematics
  • Computational Mathematics
