A Similarity Algorithm Based on the Generality and Individuality of Words
HowNet is a widely used resource for computing Chinese text similarity. However, this study finds that HowNet's architecture, its organization of vocabulary, and its concept descriptions still have shortcomings that affect word similarity measurement. Hence, based on an analysis of the generality and individuality of words in HowNet, a similarity algorithm based on the generality and individuality of words is proposed. The experimental data come from the data set of the NLPCC-ICCPOL 2016 Chinese word similarity evaluation task. Experimental results show that the algorithm is feasible and stable, and that it outperforms some of the other classic algorithms. Moreover, the size of the experimental data set has only a small influence on the results. Across all experiments, the Pearson correlation coefficient and Spearman's coefficient consistently reached 0.460 and 0.440, respectively.
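The two evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation script: the function names and the five word-pair scores are hypothetical, chosen only to show how system similarity scores might be compared against human ratings.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's coefficient: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: human ratings and system scores for five word pairs.
human_scores = [1.0, 2.5, 4.0, 7.5, 9.0]
system_scores = [0.10, 0.35, 0.30, 0.80, 0.95]

r = pearson(human_scores, system_scores)
rho = spearman(human_scores, system_scores)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Pearson's coefficient measures linear agreement between the raw scores, while Spearman's coefficient depends only on how the two lists rank the word pairs, which is why both are commonly reported together.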
word similarity; HowNet; Pearson correlation coefficient; Spearman’s coefficient
ZOU Yinfeng, OUYANG Chunping, LIU Yongbin, YANG Xiaohua, YU Ying
School of Computer Science and Technology, University of South China, Hengyang 421001
International conference
The 5th Conference on Natural Language Processing and Chinese Computing (NLPCC-ICCPOL 2016)
Kunming
English
1-10
2016-12-02 (date first posted online on the Wanfang platform; not the paper's publication date)