Conference Topic

Extracting Chinese Multi-word Terms from Small Corpus

In this paper, we present an automatic terminology extraction approach for Chinese multi-word terms. In this term extraction system, besides five linguistic rules acquired from an available term list by some machine learning methods, two statistical strategies are involved: a termhood measure based on the term distribution variation, and a unithood measure adopting the left and right entropy method to estimate the collocation variation degree. The candidates are ranked according to the values of the former. The latter is used to filter out the prepositional phrases and some verb-object phrases that rarely appear as terms. Validated on a small-scale corpus in the computer domain, the precision reaches 91.5% on the top 2000 outputs.
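The left and right entropy idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `neighbor_entropy` and `left_right_entropy`, the pre-tokenized input, and the `<s>` / `</s>` boundary symbols are all assumptions made for illustration. The sketch computes the Shannon entropy of the tokens that appear immediately to the left and to the right of a candidate; a low value on either side suggests the candidate is usually part of a longer fixed expression.

```python
import math
from collections import Counter


def neighbor_entropy(neighbors):
    """Shannon entropy (in bits) of a list of neighboring tokens."""
    counts = Counter(neighbors)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def left_right_entropy(sentences, candidate):
    """Return (left_entropy, right_entropy) for a candidate token sequence.

    `sentences` is a list of token lists (segmentation is assumed to be
    done already); `candidate` is a sequence of tokens.
    """
    cand = tuple(candidate)
    n = len(cand)
    left, right = [], []
    for tokens in sentences:
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == cand:
                # Hypothetical boundary symbols stand in for sentence edges.
                left.append(tokens[i - 1] if i > 0 else "<s>")
                right.append(tokens[i + n] if i + n < len(tokens) else "</s>")
    return neighbor_entropy(left), neighbor_entropy(right)
```

A candidate whose left or right entropy falls below some threshold could then be discarded; the threshold value and how the two sides are combined are not given in the abstract and would have to be chosen separately.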

Zhou Lang, Zhang Liang, Feng Chong, Huang Heyan

College of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, 210; Dept. of Computer Science and Technology, Nanjing University, Nanjing, 210093; Research Center of Computer & Language Information Engineering, CAS, Beijing, 100089

International conference

2008 3rd International Conference on Intelligent System and Knowledge Engineering (第三届智能系统与知识工程国际会议) (ISKE 2008)

Xiamen

English

813-818

2008-11-17 (date first posted online on the Wanfang platform; not the paper's publication date)