Conference Topic

Web Text Categorization for Large-scale Corpus

A corpus is a set of language materials stored in computers that can be searched, queried, and analyzed by computer for enterprise decision-makers. Automated text categorization has been extensively studied, and various techniques for document categorization have been proposed. However, Chinese corpora are currently scarce, and corpora for Chinese text categorization are especially rare. Moreover, most of these experimental prototypes, built to evaluate different techniques, have been restricted to the heterogeneous, autonomic, dynamic, and distributed Internet environment. This paper proposes and implements an incremental learning algorithm on a large-scale corpus for Chinese text categorization. In this study, an approach based on Support Vector Machines (SVMs) for web text mining of large-scale systems on GBODSS is developed to support enterprise decision making. Experimental results show that the approach achieves good classification accuracy through incremental learning and that the speedup in computation time is almost superlinear.

grid technology; GBODSS; large-scale corpus; Chinese text categorization

Zhijuan Jia, Jianbo Mu

School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China; Institute of Software Science, Zhengzhou Normal University, Zhengzhou, China

International conference

The 2010 International Conference on Computer Application and System Modeling (ICCASM 2010)

Taiyuan

English

188-191

2010-10-22 (date first posted online on the Wanfang platform; not the paper's publication date)