A Property Optimization Method in Support of Approximately Duplicated Records Detecting
When detecting approximately duplicated records in large datasets, the data are complex in composition and have too many properties, so measurement accuracy is low and implementation cost is excessive. To address these problems, a grouping-based sub-fuzzy clustering property optimization method is proposed. First, the properties of the records in each group are processed to reduce the property dimension effectively and to obtain a representation of the group; then a similarity comparison method is used to detect approximately duplicated records within each group. Theoretical analysis and experiments show that this method achieves higher detection accuracy and efficiency, and that it better solves the problem of recognizing approximately duplicated records in large datasets.
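The group-then-compare workflow described in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' method: the blocking field used for grouping, the `select_properties` heuristic (keep the k properties with the most distinct values in a group, a simple stand-in for dimension reduction and not the paper's sub-fuzzy clustering), the `difflib.SequenceMatcher`-based similarity, the 0.8 threshold, and the sample records are all hypothetical choices for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def group_records(records, key_field):
    """Partition records into groups by the value of a (hypothetical) blocking field."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_field]].append(rec)
    return groups


def select_properties(group, properties, k):
    """Hypothetical stand-in for property optimization (NOT sub-fuzzy clustering):
    rank properties by how many distinct values they take inside the group and keep
    the top k, so properties that are constant in the group are dropped first."""
    ranked = sorted(properties, key=lambda p: len({r[p] for r in group}), reverse=True)
    return ranked[:k]


def record_similarity(a, b, props):
    """Mean SequenceMatcher ratio (in [0, 1]) over the selected properties."""
    if not props:
        return 1.0
    total = sum(SequenceMatcher(None, str(a[p]), str(b[p])).ratio() for p in props)
    return total / len(props)


def detect_duplicates(records, key_field, properties, k=2, threshold=0.8):
    """Compare records only within their own group and return id pairs whose
    similarity on the selected properties reaches the threshold."""
    pairs = []
    for group in group_records(records, key_field).values():
        props = select_properties(group, properties, k)
        for a, b in combinations(group, 2):
            if record_similarity(a, b, props) >= threshold:
                pairs.append((a["id"], b["id"]))
    return pairs


# Illustrative sample data (made up for this sketch).
SAMPLE_RECORDS = [
    {"id": 1, "city": "Zhuzhou", "country": "China", "name": "Chen Xiaoming", "dept": "School of Science"},
    {"id": 2, "city": "Zhuzhou", "country": "China", "name": "Chen Xiao-ming", "dept": "School of Science"},
    {"id": 3, "city": "Zhuzhou", "country": "China", "name": "Li Hua", "dept": "College of Science and Technology"},
    {"id": 4, "city": "Shanghai", "country": "China", "name": "Wang Fang", "dept": "School of Science"},
]

if __name__ == "__main__":
    print(detect_duplicates(SAMPLE_RECORDS, "city", ["country", "name", "dept"]))  # [(1, 2)]
```

Restricting pairwise comparison to records in the same group is what keeps the cost down on large datasets, and shrinking the property set per group cuts the work done in each comparison; a real implementation would replace both heuristics with the paper's own group representation and similarity measure.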
Property Optimization; Approximately Duplicated Records; Sub-Fuzzy Clustering; Similarity
Xiao Mansheng, Liu Youshi, Zhou Xiaoqi
School of Science, Hunan University of Technology, Zhuzhou 412008, Hunan, China
College of Science and Technology, Hunan University of Technology, Zhuzhou 412008, Hunan, China
International conference
Shanghai
English
pp. 1933-1937
2009-11-20 (date first posted online on the Wanfang platform; not the paper's publication date)