Class-based Smoothing to Estimate the Probability of Domain Terms
This paper proposes a method for estimating the probability of a special kind of domain term, namely the probability that an anatomy noun appears as a part or modifier of a disease-name phrase; the estimate is used to smooth sparse data in disease-name phrase recognition. The method estimates the probabilities in terms of senses drawn from a semantic hierarchy and exploits the fact that terms can be grouped into classes based on interrelated semantic senses. Class-based smoothing re-creates term co-occurrence frequencies from the information provided by the semantic hierarchy in order to estimate how often candidate strings occur in an argument position. In this paper, the semantic hierarchy is obtained by modularizing, or partitioning, an anatomy ontology. The modularization extracts maximum spanning sub-trees, under certain restrictions, from an ontology that expresses foundational anatomical objects and relations. The partitioning yields a set of sub-models, which form the foundation of the semantic hierarchy. A tree cut model is then built over the hierarchy and used as a back-off model to estimate the probability distribution of terms. The criterion for selecting the tree cut is based on two parameters: the chi-squared statistic and the degrees of freedom.
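The tree-cut back-off can be illustrated with a minimal Python sketch. It assumes a top-down search in which a hierarchy node is kept as a single class when a Pearson chi-squared test on a 2 × k table (occurrences inside vs. outside disease-name phrases, one column per child) finds no significant difference among its children, using k − 1 degrees of freedom and an assumed significance level of α = 0.05; otherwise the cut descends into the children. A term's probability is then estimated from the pooled counts of the cut class that covers it. The search direction, the significance level, the helper names (`leaves`, `subtree_counts`, `chi_squared`, `tree_cut`, `class_based_prob`) and the toy hierarchy and counts are all hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical data types for this sketch:
#   Hierarchy: node -> list of child nodes (leaf terms have no entry)
#   Counts:    leaf term -> (frequency inside disease-name phrases, total frequency)
Hierarchy = Dict[str, List[str]]
Counts = Dict[str, Tuple[int, int]]

# Standard chi-squared critical values at alpha = 0.05 (assumed level), df = 1..10.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}


def leaves(node: str, tree: Hierarchy) -> List[str]:
    """All leaf terms under `node` (the node itself if it is a leaf)."""
    children = tree.get(node, [])
    if not children:
        return [node]
    out: List[str] = []
    for child in children:
        out.extend(leaves(child, tree))
    return out


def subtree_counts(node: str, tree: Hierarchy, counts: Counts) -> Tuple[int, int]:
    """Pool (inside, total) counts over every leaf under `node`."""
    inside = total = 0
    for leaf in leaves(node, tree):
        i, t = counts.get(leaf, (0, 0))
        inside += i
        total += t
    return inside, total


def chi_squared(columns: List[Tuple[int, int]]) -> Tuple[float, int]:
    """Pearson chi-squared for a 2 x k table built from (inside, total) pairs.

    Rows are 'inside a disease-name phrase' and 'outside'. Empty columns are
    dropped. Returns (statistic, degrees of freedom = non-empty columns - 1).
    """
    cols = [(i, t - i) for i, t in columns if t > 0]
    row_in = sum(c[0] for c in cols)
    row_out = sum(c[1] for c in cols)
    n = row_in + row_out
    stat = 0.0
    for inside, outside in cols:
        col_total = inside + outside
        for observed, row_total in ((inside, row_in), (outside, row_out)):
            expected = row_total * col_total / n
            if expected > 0:  # a zero expected count implies a zero observed count
                stat += (observed - expected) ** 2 / expected
    return stat, len(cols) - 1


def tree_cut(node: str, tree: Hierarchy, counts: Counts) -> List[str]:
    """Top-down cut: keep `node` as one class if its children look homogeneous."""
    children = tree.get(node, [])
    if not children:
        return [node]
    stat, df = chi_squared([subtree_counts(c, tree, counts) for c in children])
    if df < 1:
        return [node]
    if df not in CHI2_CRIT_05:
        raise ValueError(f"df={df} is outside the critical-value table of this sketch")
    if stat <= CHI2_CRIT_05[df]:
        return [node]  # no significant difference: back off to the parent class
    cut: List[str] = []
    for child in children:
        cut.extend(tree_cut(child, tree, counts))
    return cut


def class_based_prob(term: str, cut: List[str], tree: Hierarchy, counts: Counts) -> float:
    """p(term occurs inside a disease-name phrase), from the cut class covering it."""
    for cls in cut:
        if term in leaves(cls, tree):
            inside, total = subtree_counts(cls, tree, counts)
            return inside / total if total else 0.0
    raise KeyError(term)


# Hypothetical toy hierarchy and counts, for illustration only.
TOY_TREE: Hierarchy = {
    "anatomy": ["organ", "tissue"],
    "organ": ["liver", "kidney", "heart"],
    "tissue": ["muscle", "bone"],
}
TOY_COUNTS: Counts = {
    "liver": (30, 100), "kidney": (28, 100), "heart": (2, 3),  # "heart" is sparse
    "muscle": (5, 100), "bone": (6, 100),
}

cut = tree_cut("anatomy", TOY_TREE, TOY_COUNTS)
print(cut)  # ['organ', 'tissue']
print(round(class_based_prob("heart", cut, TOY_TREE, TOY_COUNTS), 3))  # 0.296
```

In the toy data, *heart* is seen only three times, so its raw estimate (2/3) is unreliable. Because the test finds its siblings statistically indistinguishable from it, the cut backs off to the `organ` class and the estimate becomes 60/203 ≈ 0.296, while the clearly different `organ` and `tissue` classes stay separate.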
Xiaobai CAI, Xiaozhong FAN
School of Computer, Beijing Institute of Technology, Beijing 100081, China
International conference
Beijing
English
pp. 345-349
2007-05-23 (date first posted online on the Wanfang platform; this is not the paper's publication date)