PRINCIPAL COMPONENT ANALYSIS ALGORITHM FOR INTERVAL-VALUED DATA

Abstract:

In many application fields we face problems in which large multivariate samples must be analyzed and processed, that is, problems of high-dimensional data analysis and processing. A common characteristic of these problems is that the information carried by the variables overlaps to some extent, which makes it difficult to isolate the principal part of that information. Dimension reduction or feature extraction should therefore precede quantitative analysis, so that a smaller number of uncorrelated new variables can represent most of the information provided by the original ones. Principal Component Analysis (PCA) is well suited to this requirement. However, traditional PCA algorithms are designed for numerical data and cannot be applied directly to non-numerical datasets. For feature extraction on datasets containing interval data, this paper presents an easy-to-use PCA algorithm suitable for interval data. It draws on two existing methods: the mature fuzzy clustering analysis for interval data and the simpler midpoint-and-length PCA. The algorithm takes into account the information in both the midpoint and the length of each interval, and its calculation is simple. To test the feasibility and validity of the proposed algorithm, fuzzy clustering analysis is used in a comparison experiment on a real dataset.
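The abstract does not give the steps of the algorithm, so the following is only a minimal sketch of the general midpoint-and-length idea it mentions, not the authors' method. It assumes each interval [lower, upper] is replaced by its midpoint and its length, that the resulting columns are standardized, and that ordinary PCA is then applied to the combined features. The function name `interval_pca` and the toy data are hypothetical.

```python
import numpy as np


def interval_pca(lower, upper, n_components=2):
    """Sketch: PCA on midpoint and length features of interval data.

    lower, upper: arrays of shape (n_samples, n_variables) holding the
    interval bounds. This layout is an assumption made for illustration.
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)

    # Represent each interval by its midpoint and its length.
    midpoints = (lower + upper) / 2.0
    lengths = upper - lower
    features = np.hstack([midpoints, lengths])

    # Standardize columns so midpoints and lengths are on a comparable scale.
    centered = features - features.mean(axis=0)
    std = centered.std(axis=0, ddof=1)
    std[std == 0] = 1.0  # leave constant columns unscaled
    z = centered / std

    # Eigen-decomposition of the covariance matrix of the standardized data.
    cov = np.cov(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)

    # Keep the directions with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    scores = z @ components
    explained = eigvals[order] / eigvals.sum()
    return scores, components, explained


# Hypothetical toy data: 5 observations of 2 interval-valued variables.
lower = np.array([[1.0, 10.0], [2.0, 12.0], [3.0, 11.0], [4.0, 15.0], [5.0, 14.0]])
upper = np.array([[2.0, 13.0], [4.0, 14.0], [3.5, 16.0], [6.0, 16.0], [5.5, 19.0]])
scores, components, explained = interval_pca(lower, upper, n_components=2)
```

Here `scores` holds the projection of each observation onto the retained components, and `explained` gives the share of total variance each component carries. Stacking midpoints and lengths side by side is one simple way to let both kinds of information enter the analysis; other weightings are possible.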

Keywords: Principal component analysis; Interval-valued data; Feature extraction

Authors: NAXIN CHEN, ZHUOMENG ZHANG

Affiliations: Department of Applied Mathematics, Dalian Maritime University, Dalian, Liaoning Province, China; Jinzhou Hygienic School, Jinzhou 121001, P.R. China

Conference type: International conference

Conference name: The Second International Conference on Information & Systems Sciences (ICISS2008)

Conference location: Dalian

Conference language: English

Pages: 496-505

Online publication date: 2008-12-18 (date first posted on the Wanfang platform; not the publication date of the paper)
