A Data Preprocessing Algorithm Based-on SVM in Data Warehouse

Abstract:

Because real-world data tends to be incomplete, noisy, and inconsistent, data preprocessing is an important issue for both data warehousing and data mining. Besides well-structured data, a data warehouse integrates semi-structured data from WWW data sources and unstructured data from external files. This paper presents a preprocessing classification algorithm based on an SVM-decision tree. The multi-category classifier is composed of SVMs and a binary decision tree and is used for data classification in the data warehouse. It can reduce the training scale of the SVM classifier and improve training efficiency. An experiment that classifies Chinese Web pages, one kind of semi-structured data, with this algorithm shows that it not only reduces the size of the training set but also achieves very high training efficiency. Its precision and recall are also very good.
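The abstract does not spell out how the SVM-decision tree is built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a binary tree in which every internal node holds one binary SVM that separates two groups of classes, and each child is trained only on the samples of its own group, which is where the smaller training sets come from. The class-splitting rule (halving the sorted label list), the linear SVM trained by subgradient descent on the hinge loss, the 2-D toy features, and all names (`train_linear_svm`, `build_tree`, `predict`, `Node`) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_linear_svm(X: List[Vector], y: List[int], lam: float = 0.01,
                     eta: float = 0.1, epochs: int = 200) -> Tuple[List[float], float]:
    """Linear SVM via subgradient descent on the L2-regularised hinge loss.

    A stand-in for whatever SVM solver the paper used; labels y are -1 or +1.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * (dot(w, xi) + b) < 1.0
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularisation shrink
            if violated:                              # hinge-loss subgradient step
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


@dataclass
class Node:
    classes: List[str]               # classes still undecided at this node
    n_train: int                     # number of samples used at this node
    w: Optional[List[float]] = None  # None marks a leaf (single class)
    b: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build_tree(X: List[Vector], labels: List[str]) -> Node:
    classes = sorted(set(labels))
    node = Node(classes=classes, n_train=len(X))
    if len(classes) == 1:
        return node
    # ASSUMPTION: split the sorted class list in half; the paper may group
    # classes differently (for example by similarity).
    left_classes = set(classes[: len(classes) // 2])
    y = [-1 if c in left_classes else 1 for c in labels]
    node.w, node.b = train_linear_svm(X, y)
    # Each child sees only the samples of its own class group.
    left = [(x, c) for x, c in zip(X, labels) if c in left_classes]
    right = [(x, c) for x, c in zip(X, labels) if c not in left_classes]
    node.left = build_tree([x for x, _ in left], [c for _, c in left])
    node.right = build_tree([x for x, _ in right], [c for _, c in right])
    return node


def predict(node: Node, x: Vector) -> str:
    # Walk down the tree; each binary SVM picks the left or right class group.
    while node.w is not None:
        node = node.left if dot(node.w, x) + node.b < 0 else node.right
    return node.classes[0]


# Hypothetical toy data: four classes, two 2-D samples each.
X = [(-1.0, -1.0), (-1.2, -0.8), (-1.0, 1.0), (-0.8, 1.2),
     (1.0, -1.0), (1.2, -0.9), (1.0, 1.0), (0.9, 1.1)]
labels = ["a", "a", "b", "b", "c", "c", "d", "d"]
tree = build_tree(X, labels)
print(predict(tree, (1.1, -1.0)))
```

With four classes this sketch trains three binary SVMs: the root uses all eight samples, while each second-level SVM uses only the four samples of its own class group. A flat one-vs-rest scheme would instead train every binary classifier on the full set.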

Keywords: SVM; data preprocessing; SVM-decision tree; data warehouse

Authors: Wangjianfen, Shichanghong

Author affiliations: School of Information and Electronic Engineering, Zhejiang University of Science and Technology, Chi The Quartermaster Institute of the General Logistics Department of the P.L.A, China, 100010

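Conference type: International conference

Conference name: 8th International Symposium on Test and Measurement

Conference location: Chongqing

Conference language: English

Pages: 648-650

Online publication date: 2009-08-01 (date first posted on the Wanfang platform; not the paper's publication date)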
