Improvement of TF-IDF Algorithm Based on Hadoop Framework

Abstract:

The TF-IDF algorithm is widely used in search engines, text similarity computation, web data mining, and related applications. These applications often have to process massive amounts of data, so computing TF-IDF quickly and efficiently is very important. In this paper, we give a TF-IDF algorithm based on the Hadoop framework. Experiments show that, for massive-data computation, the new method using the Hadoop framework is more efficient than traditional methods.
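This record does not describe the paper's algorithm itself, so the following is only a generic sketch of how a TF-IDF computation is commonly split into map and reduce steps, simulated in memory with plain Python rather than run on Hadoop. The function names (`map_term_counts`, `tf_idf`), the whitespace tokenizer, and the unsmoothed natural-log IDF are all assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import Counter, defaultdict


def map_term_counts(doc_id, text):
    """Map step (hypothetical): emit ((term, doc_id), count) for one document."""
    counts = Counter(text.lower().split())  # naive whitespace tokenizer (assumption)
    return [((term, doc_id), n) for term, n in counts.items()]


def tf_idf(docs):
    """docs maps doc_id -> text. Returns a dict (term, doc_id) -> TF-IDF score."""
    # Map over all documents; on a cluster each call would run in its own mapper.
    pairs = [kv for doc_id, text in docs.items()
             for kv in map_term_counts(doc_id, text)]

    # Reduce 1: total number of terms per document (denominator of TF).
    doc_len = defaultdict(int)
    for (_term, doc_id), n in pairs:
        doc_len[doc_id] += n

    # Reduce 2: number of documents containing each term (document frequency).
    doc_freq = defaultdict(int)
    for (term, _doc_id), _n in pairs:
        doc_freq[term] += 1

    # Final step: combine TF and IDF (unsmoothed, natural log; an assumption).
    total_docs = len(docs)
    return {
        (term, doc_id): (n / doc_len[doc_id]) * math.log(total_docs / doc_freq[term])
        for (term, doc_id), n in pairs
    }


docs = {"d1": "hadoop map reduce", "d2": "hadoop tf idf"}
scores = tf_idf(docs)
```

In this toy corpus, a term that occurs in every document (such as "hadoop") gets an IDF of zero, while terms unique to one document get a positive score. A real Hadoop job would typically chain several MapReduce passes over files in HDFS instead of holding the intermediate pairs in memory.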

Keywords: Hadoop; TF-IDF; distributed computing

Authors: Bin Li, Yuan Guoyong

Affiliation: Department of Computer Science, College of Information Science & Technology, Jinan University, Guangzhou, China

Conference type: International conference

Conference name: 2012 2nd International Conference on Computer Application and System Modeling (ICCASM-2012)

Conference location: Shenyang

Conference language: English

Pages: 391-393

Online publication date: 2012-07-27 (date first posted on the Wanfang platform; not the paper's publication date)
