A Distributed Frequent Itemsets Mining Algorithm Using Sparse Boolean Matrix on Spark
Frequent itemsets mining is one of the most important tasks in data mining for finding interesting knowledge in massive data. However, traditional frequent itemsets mining algorithms are usually data-intensive and compute-intensive. Take Apriori, a well-known algorithm for finding frequent itemsets, as an example: it needs to scan the dataset many times, and in the big data era it takes a long time to process GB-level data. To address these problems, researchers have made great efforts to improve the Apriori algorithm on the distributed computing frameworks Hadoop and Spark. However, the existing parallel Apriori algorithms based on Hadoop or Spark are still not efficient enough on GB-level data. In this paper, we propose FISM, a distributed frequent itemsets mining algorithm that uses a sparse boolean matrix on Spark. Experiments show that FISM outperforms all other existing parallel frequent itemsets mining algorithms and can also handle GB-level data.
Frequent itemsets mining; Apriori algorithm; Spark; Sparse matrix; FISM
Yonghong Luo, Zhifan Yang, Huike Shi, Ying Zhang
College of Computer and Control Engineering, Nankai University, Tianjin, China; College of Computer and Control Engineering, Nankai University, Tianjin, China; College of Software, Nank
International conference
18th International Asia-Pacific Web Conference
Suzhou
English
419-423
2016-09-23 (date first posted online on the Wanfang platform; not the paper's publication date)