A tree-based approach for efficiently mining Approximate Frequent Itemsets

Lia Ling Koh*, Vi Lang Tu

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

3 Citations (Scopus)

Abstract

Strategies for mining frequent itemsets, an essential part of discovering association rules, have been widely studied over the last decade. In real-world datasets, random noise and errors in the data can cause mining to discover multiple fragmented patterns but miss the longer true patterns. Therefore, a number of methods have been proposed recently to discover approximate frequent itemsets. However, a challenge in providing an efficient algorithm for this problem is how to avoid costly candidate generation and testing. In this paper, an algorithm named FP-AFI (FP-tree based Approximate Frequent Itemsets mining algorithm) is developed to discover approximate frequent itemsets from an FP-tree-like structure. We define a recursive function for getting the set of transactions that fault-tolerantly contain an itemset P. The patterns in the fault-tolerant supporting transactions of P are represented by the conditional AFP-trees of P. Moreover, to avoid reconstructing the tree structure during mining, two pseudo-projection operations on AFP-trees are provided to obtain the conditional AFP-trees of a candidate itemset systematically. Consequently, the approximate support of a candidate itemset and the item support of each item in the candidate are obtained easily from the conditional AFP-trees. Hence, the constraint test of a candidate itemset is performed efficiently without an additional database scan. The experimental results show that the FP-AFI algorithm is much more efficient than the FP-Apriori and AFI algorithms, especially when the data set is large and the minimum threshold of approximate support is small. Moreover, the execution time of FP-AFI is scalable even when the error threshold parameters become large.
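The quantities named in the abstract (fault-tolerant containment, approximate support, item support, and the constraint test) can be illustrated with a minimal brute-force sketch. It is not the FP-AFI algorithm: it scans a plain list of transactions instead of using AFP-trees or pseudo-projection. The abstract does not state the definitions, so the ones used here are assumptions in the style of the usual approximate-frequent-itemset model: a transaction fault-tolerantly contains P if it misses at most floor(row_err * |P|) items of P, and each item of P must appear in at least (1 - col_err) of the supporting transactions. The names `row_err`, `col_err`, `min_sup`, and both function names are hypothetical.

```python
from math import floor


def fault_tolerant_support(transactions, itemset, row_err):
    """Return transactions that miss at most floor(row_err * |itemset|) items.

    Assumed definition of "fault-tolerantly contains"; not taken from the paper.
    """
    items = set(itemset)
    max_missing = floor(row_err * len(items))
    return [t for t in transactions if len(items - set(t)) <= max_missing]


def check_itemset(transactions, itemset, min_sup, row_err, col_err):
    """Compute approximate support, per-item supports, and the constraint test."""
    supporting = fault_tolerant_support(transactions, itemset, row_err)
    approx_sup = len(supporting)
    # Item support: how many supporting transactions actually contain the item.
    item_sup = {i: sum(1 for t in supporting if i in t) for i in itemset}
    ok = approx_sup >= min_sup and all(
        s >= (1 - col_err) * approx_sup for s in item_sup.values()
    )
    return approx_sup, item_sup, ok


# Toy data: only the first transaction contains {a, b, c} exactly.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a"},
    {"d"},
]

approx_sup, item_sup, ok = check_itemset(
    transactions, ("a", "b", "c"), min_sup=3, row_err=0.5, col_err=0.5
)
print(approx_sup, item_sup, ok)
```

In this toy data the exact support of {a, b, c} is 1, while allowing one missing item per transaction raises the approximate support to 4. This is the sense in which noise fragments a longer pattern. The sketch rescans all transactions for every candidate, which is the cost the abstract says the conditional AFP-trees are designed to avoid.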

Original language: English
Title of host publication: 2010 4th International Conference on Research Challenges in Information Science - Proceedings, RCIS 2010
Publisher: IEEE Computer Society
Pages: 25-36
Number of pages: 12
ISBN (Print): 9781424448401
DOIs
Publication status: Published - 2010
Event: 2010 4th International Conference on Research Challenges in Information Science, RCIS 2010 - Nice, France
Duration: 19 May 2010 to 21 May 2010

Publication series

Name: 2010 4th International Conference on Research Challenges in Information Science - Proceedings, RCIS 2010

Other

Other: 2010 4th International Conference on Research Challenges in Information Science, RCIS 2010
Country/Territory: France
City: Nice
Period: 2010/05/19 to 2010/05/21

ASJC Scopus subject areas

  • Information Systems
  • Information Systems and Management
