PRINCIPAL COMPONENT ANALYSIS ALGORITHM FOR INTERVAL-VALUED DATA
In many application fields, we often need to analyze and process large multivariate samples, that is, to solve problems of high-dimensional data analysis and processing. A common characteristic of these problems is that the information carried by the many variables overlaps to some extent, which makes it difficult to extract its principal part. Therefore, dimension reduction or feature extraction should be carried out in the quantitative analysis of such data, so that a smaller number of mutually uncorrelated new variables can represent most of the information provided by the original ones. Principal Component Analysis (PCA) is a natural tool for this purpose. However, traditional PCA algorithms are designed for numerical data, so they cannot be applied directly to non-numerical datasets. For feature extraction on datasets containing interval data, this paper presents an easy-to-use PCA algorithm suitable for interval data. It draws on two existing methods: the mature fuzzy clustering analysis for interval data and the simpler midpoint-and-length PCA. The algorithm takes into account the information from both the midpoints and the lengths of the intervals, and its computation is simple. In addition, to test the feasibility and validity of the proposed algorithm, fuzzy clustering analysis is used in a comparative experiment on a real dataset.
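To make the midpoint-and-length idea concrete, here is a minimal sketch of one generic variant, not the authors' algorithm (which additionally draws on fuzzy clustering and whose details are not given here). It assumes that the principal components are estimated from the standardized interval midpoints, and that the interval lengths enter afterwards through interval arithmetic: projecting the box [z - h, z + h] onto a component v gives the interval [v·z - |v|·h, v·z + |v|·h]. The function name `midpoint_length_pca`, the correlation-based standardization, and the use of NumPy are illustrative assumptions.

```python
import numpy as np


def midpoint_length_pca(lower, upper, n_components=2):
    """Hypothetical helper: PCA on interval midpoints, with lengths carried
    into interval-valued component scores via interval arithmetic.

    This is one simple variant, assumed for illustration; it is not the
    paper's algorithm.

    lower, upper: (n_samples, n_features) arrays of interval bounds.
    Returns (eigenvalues, eigenvectors, score_lower, score_upper).
    """
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    if np.any(lower > upper):
        raise ValueError("every interval must satisfy lower <= upper")

    mid = (lower + upper) / 2.0      # interval midpoints
    length = upper - lower           # interval lengths

    # Standardize midpoints (assumption: correlation-matrix PCA).
    mean = mid.mean(axis=0)
    std = mid.std(axis=0)            # population std (ddof=0)
    std[std == 0] = 1.0              # guard against constant columns
    z = (mid - mean) / std
    half = (length / 2.0) / std      # half-lengths on the same scale

    # Eigen-decomposition of the symmetric midpoint correlation matrix.
    corr = z.T @ z / z.shape[0]
    eigvals, eigvecs = np.linalg.eigh(corr)          # ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Project each box [z - half, z + half] onto each component v:
    # the min and max of v.x over the box are v.z -/+ |v|.half.
    centre = z @ eigvecs
    spread = half @ np.abs(eigvecs)
    return eigvals, eigvecs, centre - spread, centre + spread
```

Calling `midpoint_length_pca(lower, upper, n_components=1)` on an n×p pair of bound arrays returns the leading eigenvalue and loading vector plus one score interval per observation. Degenerate intervals (lower equal to upper) collapse to ordinary PCA scores on the midpoints, which is why this kind of variant is often described as extending classical PCA rather than replacing it.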
Keywords: Principal component analysis; Interval-valued data; Feature extraction
NAXIN CHEN, ZHUOMENG ZHANG
Department of Applied Mathematics, Dalian Maritime University, Dalian, Liaoning Province, China
Jinzhou Hygienic School, Jinzhou 121001, P.R. China
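Document type: International conference paper
Conference: The Second International Conference on Information & Systems Sciences (ICISS2008)
Location: Dalian
Language: English
Pages: 496-505
Online date: 2008-12-18 (date first posted on the Wanfang platform; not necessarily the paper's publication date)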