Conference Paper

A Data Preprocessing Algorithm Based-on SVM in Data Warehouse

Because real-world data tends to be incomplete, noisy, and inconsistent, data preprocessing is an important issue for both data warehousing and data mining. Besides well-structured data, a data warehouse integrates semi-structured data from WWW data sources and unstructured data from external files. This paper presents a preprocessing classification algorithm based on an SVM-decision tree. The multi-category classifier combines SVMs with a binary decision tree and is used to classify data in a data warehouse. It can reduce the training scale of the SVM classifier and improve training efficiency. An experiment that classifies Chinese web pages, one kind of semi-structured data, with this algorithm shows that it not only reduces the size of the training set but also achieves very high training efficiency. Its precision and recall are also very good.

SVM; data preprocessing; SVM-decision tree; data warehouse

Wangjianfen, Shichanghong

School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Chi; The Quartermaster Institute of the General Logistics Department of the P.L.A., China, 100010

International conference

8th International Symposium on Test and Measurement

Chongqing

English

648-650

2009-08-01 (date first made available online on the Wanfang platform; not the paper's publication date)