TY - GEN
T1 - An efficient approach for mining top-K fault-tolerant repeating patterns
AU - Koh, Jia Ling
AU - Kung, Yu Ting
PY - 2006
Y1 - 2006
N2 - In this paper, an efficient strategy for mining top-K non-trivial faulttolerant repeating patterns (FT-RPs in short) with lengths no less than min_len from data sequences is provided. By extending the idea of appearing bit sequences, fault-tolerant appearing bit sequences are defined to represent the locations where candidate patterns appear in a data sequence with insertion/deletion errors being allowed. Two algorithms, named TFTRP-Mine(Top-K non-trivial FT-RPs Mining) and RE-TFTRP-Mine (REfinement of TFTRP-Mine), respectively, are proposed. Both of these two algorithms use the recursive formulas to obtain the fault-tolerant appearing bit sequence of a pattern systematically and then the fault-tolerant frequency of each candidate pattern could be counted quickly. Besides, RE-TFTRP-Mine adopts two additional strategies for pruning the searching space in order to improve the mining efficiency. The experimental results show that RE-TFTRP-Mine outperforms TFTRP-Mine algorithm when K and min_len are small. In addition, more important and implicit repeating patterns could be found from real music objects by adopting fault tolerant mining.
AB - In this paper, an efficient strategy for mining top-K non-trivial faulttolerant repeating patterns (FT-RPs in short) with lengths no less than min_len from data sequences is provided. By extending the idea of appearing bit sequences, fault-tolerant appearing bit sequences are defined to represent the locations where candidate patterns appear in a data sequence with insertion/deletion errors being allowed. Two algorithms, named TFTRP-Mine(Top-K non-trivial FT-RPs Mining) and RE-TFTRP-Mine (REfinement of TFTRP-Mine), respectively, are proposed. Both of these two algorithms use the recursive formulas to obtain the fault-tolerant appearing bit sequence of a pattern systematically and then the fault-tolerant frequency of each candidate pattern could be counted quickly. Besides, RE-TFTRP-Mine adopts two additional strategies for pruning the searching space in order to improve the mining efficiency. The experimental results show that RE-TFTRP-Mine outperforms TFTRP-Mine algorithm when K and min_len are small. In addition, more important and implicit repeating patterns could be found from real music objects by adopting fault tolerant mining.
UR - http://www.scopus.com/inward/record.url?scp=33745534805&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33745534805&partnerID=8YFLogxK
U2 - 10.1007/11733836_9
DO - 10.1007/11733836_9
M3 - Conference contribution
AN - SCOPUS:33745534805
SN - 3540333371
SN - 9783540333371
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 95
EP - 110
BT - Database Systems for Advanced Applications - 11th International Conference, DASFAA 2006, Proceedings
T2 - 11th International Conference on Database Systems for Advanced Applications, DASFAA 2006
Y2 - 12 April 2006 through 15 April 2006
ER -