Extracting Chinese Multi-word Terms from Small Corpus
In this paper, we present an automatic terminology extraction approach for Chinese multi-word terms. Besides five linguistic rules acquired from an available term list by machine learning methods, the term extraction system involves two statistical strategies: a termhood measure based on the term distribution variation, and a unithood measure that adopts the left and right entropy method to estimate the degree of collocation variation. The candidates are ranked according to the values of the former. The latter is used to filter out preposition phrases and some verb-object phrases that rarely appear as terms. Validated on a small-scale corpus in the computer domain, the approach reaches a precision of 91.5% on the top 2000 outputs.
Zhou Lang, Zhang Liang, Feng Chong, Huang Heyan
College of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, 210; Dept. of Computer Science and Technology, Nanjing University, Nanjing, 210093; Research Center of Computer & Language Information Engineering, CAS, Beijing, 100089
International conference
Xiamen
English
813-818
2008-11-17 (date first posted online on the Wanfang platform; does not represent the paper's publication date)
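
The abstract names left and right entropy as the unithood measure but does not give the formula or any threshold. As a rough illustration only, the following Python sketch computes the Shannon entropy of the tokens that appear immediately to the left and to the right of a candidate term. The function names, the assumption that the corpus is already segmented into a flat token list, and the use of base-2 logarithms are all assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (base 2) of a Counter of neighbor frequencies."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def left_right_entropy(tokens, candidate):
    """Return (left_entropy, right_entropy) for a candidate token sequence.

    tokens:    list of tokens (hypothetical pre-segmented corpus)
    candidate: list of tokens forming the multi-word candidate
    """
    n = len(candidate)
    left, right = Counter(), Counter()
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == candidate:
            if i > 0:                      # token just before the match
                left[tokens[i - 1]] += 1
            if i + n < len(tokens):        # token just after the match
                right[tokens[i + n]] += 1
    return entropy(left), entropy(right)


# Toy corpus, invented for illustration.
toy_tokens = "a x y b c x y d a x y b".split()
left_h, right_h = left_right_entropy(toy_tokens, ["x", "y"])
```

Under this reading, a candidate whose neighbors vary widely on both sides (high entropy) behaves like a self-contained unit, while a candidate that is almost always followed or preceded by the same token (entropy near zero) is more likely a fragment of a longer phrase. How the paper combines the two values, and what cutoff it applies, is not stated in the abstract.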