Conference Topics

A Distributed Frequent Itemsets Mining Algorithm Using Sparse Boolean Matrix on Spark

  Frequent itemset mining is one of the most important tasks in data mining for finding interesting knowledge in huge volumes of data. However, traditional frequent itemset mining algorithms are usually data-intensive and compute-intensive. Take Apriori, a well-known algorithm for finding frequent itemsets, as an example: it needs to scan the dataset many times, and in the big data era it takes a long time to run over GB-level data. To solve these problems, researchers have made great efforts to improve the Apriori algorithm on the distributed computing frameworks Hadoop and Spark. However, the existing parallel Apriori algorithms based on Hadoop or Spark are not efficient enough on GB-level data. In this paper, we propose FISM, a distributed frequent itemset mining algorithm that uses a sparse boolean matrix on Spark. Experiments show that FISM outperforms the other existing parallel frequent itemset mining algorithms and can also handle GB-level data.
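The abstract does not describe how FISM works internally, so the following is only a generic, single-machine sketch of the idea it names: storing transactions as a sparse boolean matrix so that support counting does not require rescanning the dataset. It is not the authors' algorithm and does not use Spark. The function names, the set-of-row-indices representation of each column, and the absolute `min_support` threshold are all assumptions made for illustration.

```python
from itertools import combinations


def build_sparse_columns(transactions):
    """Map each item to the set of row indices where its boolean column is 1.

    Hypothetical helper: a set of row indices stands in for a sparse column.
    """
    columns = {}
    for row, transaction in enumerate(transactions):
        for item in transaction:
            columns.setdefault(item, set()).add(row)
    return columns


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) mining with supports taken from column ANDs."""
    columns = build_sparse_columns(transactions)  # the only pass over the raw data
    # Level 1: frequent single items and the rows that contain them.
    current = {
        frozenset([item]): rows
        for item, rows in columns.items()
        if len(rows) >= min_support
    }
    result = {}
    while current:
        result.update({itemset: len(rows) for itemset, rows in current.items()})
        next_level = {}
        for (a, rows_a), (b, rows_b) in combinations(current.items(), 2):
            candidate = a | b
            # Join only itemsets that differ in exactly one item.
            if len(candidate) != len(a) + 1 or candidate in next_level:
                continue
            # Apriori pruning: every subset one item smaller must be frequent.
            if any(candidate - {item} not in current for item in candidate):
                continue
            rows = rows_a & rows_b  # boolean AND of two sparse columns
            if len(rows) >= min_support:
                next_level[candidate] = rows
        current = next_level
    return result
```

In this sketch the raw transactions are read once to build the columns; every later support count is a set intersection, which plays the role of a boolean AND over sparse columns. A distributed version would have to partition these columns or rows across workers, which is outside what the abstract specifies.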

Frequent itemsets mining; Apriori algorithm; Spark; Sparse matrix; FISM

Yonghong Luo, Zhifan Yang, Huike Shi, Ying Zhang

College of Computer and Control Engineering, Nankai University, Tianjin, China; College of Computer and Control Engineering, Nankai University, Tianjin, China; College of Software, Nank

International conference

International Asia-Pacific Web Conference (18th International Asia-Pacific Web Conference)

Suzhou

English

419-423

2016-09-23 (date first made available online on the Wanfang platform; not the paper's publication date)