Conference topic

Class-based Smoothing to Estimate the Probability of Domain Terms

This paper proposes a method for estimating the probability of a particular kind of domain term: the probability that an anatomy noun appears as a part or modifier of a disease named phrase. The estimate is used to smooth sparse data in disease named phrase recognition. The method estimates probabilities in terms of senses from a semantic hierarchy and exploits the fact that terms can be grouped into classes based on interrelated semantic senses. Class-based smoothing re-creates term co-occurrence frequencies from the information provided by the semantic hierarchy, in order to estimate the frequency of a candidate string occurring in an argument position. In this paper, the semantic hierarchy is obtained by modularizing (partitioning) an anatomy ontology. The modularizing method extracts maximum spanning sub-trees, under restrictions, from the ontology that expresses foundational anatomical objects and relations. The partitioning yields a set of sub-models, which form the foundation of the semantic hierarchy. A tree cut model is then built on the hierarchy and used as a back-off model to estimate the probability distribution of terms. The criterion for choosing the tree cut is defined by two parameters: the chi-squared statistic and the degrees of freedom.
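The class-based idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `class_based_probabilities`, the toy classes, and the counts are all hypothetical, and the sketch assumes a fixed, flat set of classes with probability mass shared uniformly inside each class. It does not select a tree cut or apply the chi-squared criterion.

```python
from collections import defaultdict


def class_based_probabilities(counts, classes):
    """Estimate term probabilities by pooling counts at the class level.

    counts:  dict mapping term -> observed frequency (hypothetical data)
    classes: dict mapping class name -> list of member terms
    Terms in `counts` that belong to no class are ignored.
    """
    # Record which classes each term belongs to.
    membership = defaultdict(list)
    for cls, members in classes.items():
        for term in members:
            membership[term].append(cls)

    # Re-create class frequencies: a term's count is split evenly
    # across all the classes that contain it.
    class_freq = defaultdict(float)
    for term, n in counts.items():
        owners = membership.get(term, [])
        for cls in owners:
            class_freq[cls] += n / len(owners)

    total = sum(class_freq.values())
    if total == 0:
        raise ValueError("no observed counts fall inside any class")

    # Assumption: each class shares its probability uniformly among
    # its members, so unseen members receive non-zero probability.
    probs = defaultdict(float)
    for cls, members in classes.items():
        for term in members:
            probs[term] += class_freq[cls] / total / len(members)
    return dict(probs)


# Toy data, invented for illustration only.
toy_classes = {
    "heart": ["ventricle", "atrium", "valve"],
    "lung": ["bronchus", "alveolus"],
}
toy_counts = {"ventricle": 6, "atrium": 2, "bronchus": 2}
toy_probs = class_based_probabilities(toy_counts, toy_classes)
```

In this toy run, "valve" and "alveolus" never occur in the counts, yet each receives a share of its class's probability mass, which is the smoothing effect the abstract describes.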

Xiaobai CAI, Xiaozhong FAN

School of Computer, Beijing Institute of Technology, Beijing 10081, China

International conference

2007 IEEE/ICME International Conference on Complex Medical Engineering (CME2007, the 2nd International Conference on Complex Medical Engineering)

Beijing

English

345-349

2007-05-23 (date first posted online on the Wanfang platform; not the paper's publication date)