A tree-based approach for efficiently mining Approximate Frequent Itemsets

Jia-Ling Koh, Vi Lang Tu

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Citations (Scopus)

Abstract

The strategies for mining frequent itemsets, which is the essential part of discovering association rules, have been widely studied over the last decade. In real-world datasets, it is possible to discover multiple fragmented patterns but miss the longer true patterns due to random noise and errors in the data. Therefore, a number of methods have been proposed recently to discover approximate frequent itemsets. However, a challenge of providing an efficient algorithm for solving this problem is how to avoid costly candidate generation and test. In this paper, an algorithm, named FP-AFI (FP-tree based Approximate Frequent Itemsets mining algorithm), is developed to discover approximate frequent itemsets from a FP-tree-Iike structure. We define a recursive function for getting the set of transactions which faulttolerant contain an itemset P. The patterns in the fault-tolerant supporting transactions of P are represented by the conditional AFP-trees of P. Moreover, to avoid re-constructing the tree structure in the mining process, two pseudo-projection operations on AFP-trees are provided to obtain the conditional AFP-trees of a candidate itemset systematically. Consequently, the approximate support of a candidate itemset and the item supports of each item in the candidate are obtained easily from the conditional AFP-trees. Hence, the constrain test of a candidate itemset is performed efficiently without additional database scan. The experimental results show that the FP-AFI algorithm performs much better than the FP-Apriori and the AFI algorithms in efficiency especially when the size of data set is large and the minimum threshold of approximate support is small. Moreover, the execution time of FP-AFI is scalable even when the error threshold parameters become large.

Original languageEnglish
Title of host publication2010 4th International Conference on Research Challenges in Information Science - Proceedings, RCIS 2010
Pages25-36
Number of pages12
Publication statusPublished - 2010 Oct 7
Event2010 4th International Conference on Research Challenges in Information Science, RCIS 2010 - Nice, France
Duration: 2010 May 192010 May 21

Publication series

Name2010 4th International Conference on Research Challenges in Information Science - Proceedings, RCIS 2010

Other

Other2010 4th International Conference on Research Challenges in Information Science, RCIS 2010
CountryFrance
CityNice
Period10/5/1910/5/21

Fingerprint

Recursive functions
Trees (mathematics)
Association rules
Fault
Data base
Process mining
Problem solving

Keywords

  • Approximate frequent itemset mining
  • FP-tree structure

ASJC Scopus subject areas

  • Information Systems
  • Information Systems and Management

Cite this

Koh, J-L., & Tu, V. L. (2010). A tree-based approach for efficiently mining Approximate Frequent Itemsets. In 2010 4th International Conference on Research Challenges in Information Science - Proceedings, RCIS 2010 (pp. 25-36). [5507360] (2010 4th International Conference on Research Challenges in Information Science - Proceedings, RCIS 2010).

A tree-based approach for efficiently mining Approximate Frequent Itemsets. / Koh, Jia-Ling; Tu, Vi Lang.

2010 4th International Conference on Research Challenges in Information Science - Proceedings, RCIS 2010. 2010. p. 25-36 5507360 (2010 4th International Conference on Research Challenges in Information Science - Proceedings, RCIS 2010).

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Koh, J-L & Tu, VL 2010, A tree-based approach for efficiently mining Approximate Frequent Itemsets. in 2010 4th International Conference on Research Challenges in Information Science - Proceedings, RCIS 2010., 5507360, 2010 4th International Conference on Research Challenges in Information Science - Proceedings, RCIS 2010, pp. 25-36, 2010 4th International Conference on Research Challenges in Information Science, RCIS 2010, Nice, France, 10/5/19.
Koh J-L, Tu VL. A tree-based approach for efficiently mining Approximate Frequent Itemsets. In 2010 4th International Conference on Research Challenges in Information Science - Proceedings, RCIS 2010. 2010. p. 25-36. 5507360. (2010 4th International Conference on Research Challenges in Information Science - Proceedings, RCIS 2010).
Koh, Jia-Ling ; Tu, Vi Lang. / A tree-based approach for efficiently mining Approximate Frequent Itemsets. 2010 4th International Conference on Research Challenges in Information Science - Proceedings, RCIS 2010. 2010. pp. 25-36 (2010 4th International Conference on Research Challenges in Information Science - Proceedings, RCIS 2010).
@inproceedings{a20feba484ba49b3a3224f2eff25fbbf,
title = "A tree-based approach for efficiently mining Approximate Frequent Itemsets",
abstract = "The strategies for mining frequent itemsets, which is the essential part of discovering association rules, have been widely studied over the last decade. In real-world datasets, it is possible to discover multiple fragmented patterns but miss the longer true patterns due to random noise and errors in the data. Therefore, a number of methods have been proposed recently to discover approximate frequent itemsets. However, a challenge of providing an efficient algorithm for solving this problem is how to avoid costly candidate generation and test. In this paper, an algorithm, named FP-AFI (FP-tree based Approximate Frequent Itemsets mining algorithm), is developed to discover approximate frequent itemsets from a FP-tree-Iike structure. We define a recursive function for getting the set of transactions which faulttolerant contain an itemset P. The patterns in the fault-tolerant supporting transactions of P are represented by the conditional AFP-trees of P. Moreover, to avoid re-constructing the tree structure in the mining process, two pseudo-projection operations on AFP-trees are provided to obtain the conditional AFP-trees of a candidate itemset systematically. Consequently, the approximate support of a candidate itemset and the item supports of each item in the candidate are obtained easily from the conditional AFP-trees. Hence, the constrain test of a candidate itemset is performed efficiently without additional database scan. The experimental results show that the FP-AFI algorithm performs much better than the FP-Apriori and the AFI algorithms in efficiency especially when the size of data set is large and the minimum threshold of approximate support is small. Moreover, the execution time of FP-AFI is scalable even when the error threshold parameters become large.",
keywords = "Approximate frequent itemset mining, FP-tree structure",
author = "Jia-Ling Koh and Tu, {Vi Lang}",
year = "2010",
month = "10",
day = "7",
language = "English",
isbn = "9781424448401",
series = "2010 4th International Conference on Research Challenges in Information Science - Proceedings, RCIS 2010",
pages = "25--36",
booktitle = "2010 4th International Conference on Research Challenges in Information Science - Proceedings, RCIS 2010",

}

TY - GEN

T1 - A tree-based approach for efficiently mining Approximate Frequent Itemsets

AU - Koh, Jia-Ling

AU - Tu, Vi Lang

PY - 2010/10/7

Y1 - 2010/10/7

N2 - The strategies for mining frequent itemsets, which is the essential part of discovering association rules, have been widely studied over the last decade. In real-world datasets, it is possible to discover multiple fragmented patterns but miss the longer true patterns due to random noise and errors in the data. Therefore, a number of methods have been proposed recently to discover approximate frequent itemsets. However, a challenge of providing an efficient algorithm for solving this problem is how to avoid costly candidate generation and test. In this paper, an algorithm, named FP-AFI (FP-tree based Approximate Frequent Itemsets mining algorithm), is developed to discover approximate frequent itemsets from a FP-tree-Iike structure. We define a recursive function for getting the set of transactions which faulttolerant contain an itemset P. The patterns in the fault-tolerant supporting transactions of P are represented by the conditional AFP-trees of P. Moreover, to avoid re-constructing the tree structure in the mining process, two pseudo-projection operations on AFP-trees are provided to obtain the conditional AFP-trees of a candidate itemset systematically. Consequently, the approximate support of a candidate itemset and the item supports of each item in the candidate are obtained easily from the conditional AFP-trees. Hence, the constrain test of a candidate itemset is performed efficiently without additional database scan. The experimental results show that the FP-AFI algorithm performs much better than the FP-Apriori and the AFI algorithms in efficiency especially when the size of data set is large and the minimum threshold of approximate support is small. Moreover, the execution time of FP-AFI is scalable even when the error threshold parameters become large.

AB - The strategies for mining frequent itemsets, which is the essential part of discovering association rules, have been widely studied over the last decade. In real-world datasets, it is possible to discover multiple fragmented patterns but miss the longer true patterns due to random noise and errors in the data. Therefore, a number of methods have been proposed recently to discover approximate frequent itemsets. However, a challenge of providing an efficient algorithm for solving this problem is how to avoid costly candidate generation and test. In this paper, an algorithm, named FP-AFI (FP-tree based Approximate Frequent Itemsets mining algorithm), is developed to discover approximate frequent itemsets from a FP-tree-Iike structure. We define a recursive function for getting the set of transactions which faulttolerant contain an itemset P. The patterns in the fault-tolerant supporting transactions of P are represented by the conditional AFP-trees of P. Moreover, to avoid re-constructing the tree structure in the mining process, two pseudo-projection operations on AFP-trees are provided to obtain the conditional AFP-trees of a candidate itemset systematically. Consequently, the approximate support of a candidate itemset and the item supports of each item in the candidate are obtained easily from the conditional AFP-trees. Hence, the constrain test of a candidate itemset is performed efficiently without additional database scan. The experimental results show that the FP-AFI algorithm performs much better than the FP-Apriori and the AFI algorithms in efficiency especially when the size of data set is large and the minimum threshold of approximate support is small. Moreover, the execution time of FP-AFI is scalable even when the error threshold parameters become large.

KW - Approximate frequent itemset mining

KW - FP-tree structure

UR - http://www.scopus.com/inward/record.url?scp=77957319880&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77957319880&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:77957319880

SN - 9781424448401

T3 - 2010 4th International Conference on Research Challenges in Information Science - Proceedings, RCIS 2010

SP - 25

EP - 36

BT - 2010 4th International Conference on Research Challenges in Information Science - Proceedings, RCIS 2010

ER -