On Top-K Closed Sequential Patterns Mining
Sequential pattern mining is a very significant data mining project.In this area,most of the previous studies require us to provide a support threshold to accomplish the mining.However,in reality providing an appropriate threshold is very difficult if we did not acquaint with some background information relevant to the data.In addition,there exist many useless sequential patterns when the least support is too low.An alternative task is proposed to solve the above problems: mining top-k frequent closed sequences with the least length constraint,that is,mining k most frequent closed sequences whose length are equal or more than min_len.However,most of the previous algorithms are based on the framework of candidate and generation,thus leading too much space usage and running time.To this end,in this paper,we propose a very efficient algorithm named BITSP without candidate and generation for mining top-k frequent closed sequences with the least length.Specifically,we adopt BIDE for frequent closed sequential patterns enumeration.Based on BIDE,we can directly use the closure checking scheme and effectively raise the minimum support threshold without candidate maintenance.In addition,we also propose two novel pruning strategies by exploiting the properties of minimum length constraint.Our extensive performance test with synthetic and real datasets demonstrates that BI-TSP outperforms the baselines in both memory and running time.
Jing Wang Lei Zhang Guiquan Liu Qi Liu Enhong Chen
University of Science and Technology of China
国际会议
厦门
英文
303-308
2014-08-19(万方平台首次上网日期,不代表论文的发表时间)