MAPPING NOMINAL VALUES TO NUMBERS BY DATA MINING SPECTRAL PROPERTIES OF LEAVES
Many data mining algorithms require numeric, preferably normalized (scaled) data, i.e. data within specific ranges. Agricultural data is by default non-numeric, or nominal, in nature. Different techniques are available for mapping nominal values to numeric ones, and the choice depends on the problem at hand and the mining tool to be used. However, most (if not all) of these techniques are based on the statistical properties of the data, and thus miss the intrinsic and natural relationships among the attributes of a plant. We propose a mechanism for mapping nominal values to numeric values (in effect, a ranking) based on the transmittance as well as the statistical properties of the plant. Spectral analysis (using chemical means) is a tedious and time-consuming process, and is therefore difficult to repeat every time a (numerically) unclassified cotton variety has to be classified. We therefore propose a supporting statistical method based on linear regression curve fitting using normalized nominal attributes. A rank is then assigned to each variety based on its R value and the slope of its plot. This rank becomes the numeric equivalent of the nominal alphanumeric name of the variety being considered. The most complex issue in this approach is the column ordering used when generating the regression plots. A number of orderings and column choices were tested, based on leaf characteristics, plant characteristics, etc. However, choosing and ordering the columns by botanical characteristics gave the best results. We proceeded by performing a spectral analysis of 12 cotton varieties in the visible and Near Infra Red (NIR) regions. Out of the 12 varieties, four were used as the training set, and five others were used as the test set for classification purposes. A classification accuracy of 60% was achieved, i.e. the rank order generated through regression corresponded to the spectral rank order (taken as the absolute) to that extent.
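The regression-based ranking described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several details are assumptions because the abstract does not state them: the nominal attributes are taken to be already encoded as numbers, normalization is min-max scaling per column, each variety's normalized values are regressed against their column position, and varieties are ranked by R value first and slope second. The function names, variety names, and sample values are all hypothetical.

```python
from math import sqrt


def min_max_normalize(columns):
    """Scale each column (one value per variety) to the range [0, 1]."""
    scaled = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        # A constant column carries no information; map it to 0.0.
        scaled.append([(v - lo) / span if span else 0.0 for v in col])
    return scaled


def fit_line(ys):
    """Least-squares fit of ys against x = 0, 1, 2, ...; returns (slope, r)."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r = sxy / sqrt(sxx * syy) if syy else 0.0
    return slope, r


def rank_varieties(table):
    """Rank varieties from {name: [attribute values in the chosen column order]}.

    Assumption: higher R ranks first, with slope as the tie-breaker.
    """
    names = list(table)
    columns = list(zip(*(table[name] for name in names)))
    rows = list(zip(*min_max_normalize(columns)))
    fits = {name: fit_line(row) for name, row in zip(names, rows)}
    ordered = sorted(names, key=lambda n: (fits[n][1], fits[n][0]), reverse=True)
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def rank_agreement(predicted, reference):
    """Fraction of varieties whose predicted rank equals the reference rank."""
    matches = sum(1 for name in reference if predicted.get(name) == reference[name])
    return matches / len(reference)


if __name__ == "__main__":
    # Hypothetical encoded attributes for three made-up varieties.
    sample = {"A": [1, 2, 3], "B": [3, 2, 1], "C": [2, 2, 2]}
    ranks = rank_varieties(sample)
    print(ranks)
    # Hypothetical spectral rank order used as the reference.
    print(rank_agreement(ranks, {"A": 1, "B": 2, "C": 3}))
```

In this sketch the column order passed in each list matters, since the regression is taken against column position; this mirrors the abstract's remark that column ordering is the most delicate part of the approach. The agreement measure is one simple reading of "correspondence in order of ranking", and other measures (for example a rank correlation) would be equally plausible.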
Ahsan Abdullah, Rizwan Bulbul, Tahir Mehmood
Center for Agro-Informatics Research, National University of Computer & Emerging Sciences, FAST House,
International conference
Beijing
English
171-177
2005-10-14 (date first posted online on the Wanfang platform; this is not the paper's publication date)