A Hypothesis on Word Similarity and Its Application
We propose a hypothesis: the semantic distance between synonyms or near-synonyms should behave like a distance in a metric space. A metric space is a set on which a notion of distance (called a metric) between elements is defined. Such a distance must satisfy three properties: (i) Identity of indiscernibles: the distance is zero if and only if the two elements are the same. (ii) Symmetry: the distance from element A to element B equals the distance from B to A. (iii) Triangle inequality: given three elements A, B and C, the sum of the distances of any two pairs is greater than or equal to the distance of the remaining pair. The first two properties are intuitively reasonable. To test the third, we first compute word similarities based on HowNet and then check whether the synonyms or near-synonyms listed in Cilin Extended Edition satisfy it. The experiments show that more than 98.5% of triples (each consisting of three synonyms) satisfy the triangle inequality. Furthermore, we detect a large number of thesaurus errors on the basis of this hypothesis.
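The triple check described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how a similarity is turned into a distance, so the sketch assumes d = 1 - similarity, and the function names, the toy words (`w1` to `w4`) and their similarity values are all hypothetical stand-ins for HowNet-based similarities and a Cilin synonym set.

```python
from itertools import combinations

def satisfies_triangle(d_ab, d_bc, d_ac, tol=1e-9):
    """True if each distance is at most the sum of the other two."""
    return (d_ab + d_bc >= d_ac - tol
            and d_ab + d_ac >= d_bc - tol
            and d_bc + d_ac >= d_ab - tol)

def check_synonym_set(words, similarity):
    """Test every triple of words; return (triple count, violating triples).

    Assumption: distance = 1 - similarity, with similarity in [0, 1].
    """
    total = 0
    violations = []
    for a, b, c in combinations(words, 3):
        d_ab = 1.0 - similarity(a, b)
        d_bc = 1.0 - similarity(b, c)
        d_ac = 1.0 - similarity(a, c)
        total += 1
        if not satisfies_triangle(d_ab, d_bc, d_ac):
            violations.append((a, b, c))
    return total, violations

# Hypothetical similarity values for one toy synonym set.
TOY_SIM = {
    ("w1", "w2"): 0.9, ("w1", "w3"): 0.9, ("w2", "w3"): 0.2,
    ("w1", "w4"): 0.8, ("w2", "w4"): 0.8, ("w3", "w4"): 0.8,
}

def toy_similarity(a, b):
    """Symmetric lookup in the toy table."""
    return TOY_SIM[tuple(sorted((a, b)))]

total, violations = check_synonym_set(["w1", "w2", "w3", "w4"], toy_similarity)
print(f"{total - len(violations)} of {total} triples satisfy the triangle inequality")
for triple in violations:
    print("possible thesaurus error:", triple)
```

Under this reading, a triple that violates the inequality marks a synonym set worth inspecting, which is one way the hypothesis could be used to flag candidate thesaurus errors.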
Word similarity; Metric space; Cilin; HowNet
Peng Jin; Likun Qiu; Xuefeng Zhu; Pengyuan Liu
School of Computer Science, Leshan Normal University, Leshan 614004, China; School of Chinese Language and Literature, Ludong University, Yantai 260045, China; Institute of Computational Linguistics, Peking University, Beijing 100871, China; Applied Linguistic Research Institute, Beijing Language and Culture University, Beijing, China
International conference
Chinese Lexical Semantics 15th Workshop (CLSW 2014)
Macau
English
317-325
2014-06-09 (date first made available online on the Wanfang platform; this is not the paper's publication date)