Improvement of TF-IDF Algorithm Based on Hadoop Framework
The TF-IDF algorithm is widely used in search engines, text similarity computation, web data mining, and similar applications. These applications often have to process massive amounts of data, so computing TF-IDF quickly and efficiently is very important. In this paper, we present a TF-IDF algorithm based on the Hadoop framework. Experiments show that, for massive data computation, the new Hadoop-based method is more efficient than traditional methods.
Hadoop; TF-IDF; distributed computing
Bin Li; Yuan Guoyong
Department of Computer Science, College of Information Science & Technology, Jinan University, Guangzhou, China
International conference
Shenyang
English
391-393
2012-07-27 (date first made available online on the Wanfang platform; not the paper's publication date)