A Distributed Frequent Itemsets Mining Algorithm Using Sparse Boolean Matrix on Spark

Abstract:

Frequent itemsets mining is one of the most important tasks in data mining for finding interesting knowledge in large volumes of data. However, traditional frequent itemsets mining algorithms are usually both data-intensive and compute-intensive. Apriori, a well-known algorithm for finding frequent itemsets, is a typical example: it needs to scan the dataset many times, and in the big data era it takes a long time to process GB-level data. To address these problems, researchers have put considerable effort into improving the Apriori algorithm on the distributed computing frameworks Hadoop and Spark. However, the existing parallel Apriori algorithms based on Hadoop or Spark are still not efficient enough on GB-level data. In this paper, we propose FISM, a distributed frequent itemsets mining algorithm that uses a sparse boolean matrix on Spark. Experiments show that FISM performs better than the other existing parallel frequent itemsets mining algorithms and can also handle GB-level data.
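The general idea of replacing repeated dataset scans with operations on a sparse boolean matrix can be illustrated with a minimal single-machine sketch. It is not the authors' FISM implementation and does not use Spark; the function name `frequent_itemsets`, the column-wise storage of the matrix as sets of row indices, and the absolute `min_support` count are all assumptions made for illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Apriori-style mining over a sparse boolean matrix (illustrative only).

    The matrix is stored column-wise: each item maps to the set of row
    (transaction) indices where its entry would be True. The support of an
    itemset is the size of the intersection of its items' row sets, so the
    raw dataset is scanned only once.
    """
    # Single pass over the data: build the sparse column representation.
    columns = {}
    for row, transaction in enumerate(transactions):
        for item in transaction:
            columns.setdefault(item, set()).add(row)

    # Frequent 1-itemsets, each kept together with its supporting rows.
    current = {
        frozenset([item]): rows
        for item, rows in columns.items()
        if len(rows) >= min_support
    }

    result = {}
    k = 1
    while current:
        result.update({itemset: len(rows) for itemset, rows in current.items()})
        k += 1
        candidates = {}
        # Join pairs of frequent (k-1)-itemsets into k-itemset candidates.
        for (a, rows_a), (b, rows_b) in combinations(current.items(), 2):
            union = a | b
            if len(union) != k or union in candidates:
                continue
            # Apriori pruning: every (k-1)-subset must itself be frequent.
            if any(frozenset(s) not in current
                   for s in combinations(union, k - 1)):
                continue
            # Support counting is a set intersection, not a dataset scan.
            rows = rows_a & rows_b
            if len(rows) >= min_support:
                candidates[union] = rows
        current = candidates
    return result


if __name__ == "__main__":
    data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in sorted(frequent_itemsets(data, 3).items(),
                                   key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

In a distributed setting the rows of such a matrix would presumably be partitioned across workers and the per-partition counts combined, but the abstract does not give those details, so they are left out here.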

Keywords: Frequent itemsets mining; Apriori algorithm; Spark; Sparse matrix; FISM

Authors: Yonghong Luo, Zhifan Yang, Huike Shi, Ying Zhang

Affiliations: College of Computer and Control Engineering, Nankai University, Tianjin, China; College of Computer and Control Engineering, Nankai University, Tianjin, China; College of Software, Nank

Conference type: International conference

Conference name: International Asia-Pacific Web Conference (18th International Asia-Pacific Web Conference)

Conference location: Suzhou

Conference language: English

Pages: 419-423

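Online publication date: 2016-09-23 (the date the record first went online on the Wanfang platform; it does not represent the paper's publication date)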

Conference topic

A Distributed Frequent Itemsets Mining Algorithm Using Sparse Boolean Matrix on Spark