Finding Correlated Item Pairs through Efficient Pruning with a Given Threshold
Given a minimum threshold and a massive market-basket data set, an item pair whose correlation exceeds the threshold is considered correlated. In this paper, we present SERIT (a Searching-corrElated-pair Randomized algorithm for dIfferent Thresholds), a randomized algorithm that finds all correlated pairs effectively, using Pearson's correlation coefficient [11] as the correlation measure. In their CIKM'06 paper [2], Zhang et al. address the same problem by exploiting the relation between Pearson's coefficient and the Jaccard distance. However, their approach is inefficient when the threshold is small. Building on [2], we propose a new probability function for pruning uncorrelated item pairs that overcomes this shortcoming. Experimental results on synthetic and real data sets show that, for a given threshold, even a small one, SERIT prunes unwanted item pairs efficiently and saves substantial computational resources.
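To make the problem statement concrete, the following is a minimal brute-force sketch in Python, not the SERIT algorithm and not the method of [2]. It assumes the standard fact that Pearson's correlation coefficient for two binary (present/absent) items reduces to the phi coefficient, computable from occurrence counts alone. The function names `phi` and `correlated_pairs` and the toy baskets are hypothetical, introduced only for illustration.

```python
from itertools import combinations
from math import sqrt


def phi(n, n_a, n_b, n_ab):
    """Pearson correlation of two binary items (phi coefficient).

    n    : total number of baskets
    n_a  : baskets containing item A
    n_b  : baskets containing item B
    n_ab : baskets containing both A and B
    """
    denom = sqrt(n_a * (n - n_a) * n_b * (n - n_b))
    if denom == 0:
        # An item that appears in every basket or in none has zero variance;
        # the correlation is undefined, so this sketch reports 0.0.
        return 0.0
    return (n * n_ab - n_a * n_b) / denom


def correlated_pairs(baskets, threshold):
    """Exhaustively check every item pair (no pruning; baseline only)."""
    n = len(baskets)
    item_count = {}
    pair_count = {}
    for basket in baskets:
        items = sorted(set(basket))
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(items, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    result = {}
    for a, b in combinations(sorted(item_count), 2):
        r = phi(n, item_count[a], item_count[b], pair_count.get((a, b), 0))
        if r > threshold:  # "correlated" means strictly above the threshold
            result[(a, b)] = r
    return result


if __name__ == "__main__":
    toy_baskets = [["a", "b"], ["a", "b"], ["c"], ["c"]]
    print(correlated_pairs(toy_baskets, 0.5))
```

The exhaustive loop examines every pair of items, which is exactly the quadratic cost that pruning approaches such as the one described in the abstract aim to avoid.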
Bo Wang, Liang Su, Aiping Li, Peng Zou
School of Computer, National University of Defense Technology, China
International conference
The Ninth International Conference on Web-Age Information Management (WAIM 2008)
Zhangjiajie
English
2008-07-20 (date first made available online on the Wanfang platform; not the paper's publication date)