Pattern Mining for Information Extraction Using Lexical, Syntactic and Semantic Information: Preliminary Results

Abstract:

A method is being developed to mine a text corpus for candidate linguistic patterns for information extraction. The candidate patterns can be used to improve the quality of extraction patterns constructed by a pseudo-supervised learning method, that is, an automated method in which the system is provided with a high-quality seed pattern or clue, which is used to generate a training set automatically. The study is carried out in the context of developing a system to extract disease-treatment information from medical abstracts retrieved from the Medline database. In an earlier study, the Apriori algorithm was used to mine a sample of sentences containing a disease concept and a drug concept, to identify frequently occurring word patterns and to see whether these patterns could be used to identify treatment relations in text. Word patterns and statistical association measures alone were found to be insufficient for generating good extraction patterns, and need to be combined with syntactic and semantic constraints. In this study, we explore the use of syntactic, semantic and lexical constraints to improve the quality of extraction patterns.
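The frequent word-pattern mining mentioned in the abstract can be illustrated with a minimal Python sketch of the Apriori algorithm. This is not the authors' implementation: the toy sentences, the DRUG and DISEASE placeholder tokens, the treatment of each sentence as an unordered set of words, the support threshold, and the function name `apriori` are all assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(items): support_count} for every frequent itemset.

    transactions: iterable of item collections (here, the words of a sentence).
    min_support: minimum number of transactions that must contain the itemset.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count the support of each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1

    return frequent


# Hypothetical toy sentences; concept mentions are replaced by placeholders.
sentences = [
    "DRUG is used for treatment of DISEASE",
    "DRUG is effective for treatment of DISEASE",
    "DISEASE was observed after DRUG",
]
transactions = [s.lower().split() for s in sentences]
frequent = apriori(transactions, min_support=2)

for itemset, count in sorted(frequent.items(),
                             key=lambda kv: (-len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this toy data the word set {drug, disease} is frequent in all three sentences, including the third one, which does not express a treatment relation. That is a small-scale picture of why frequency alone may not yield good extraction patterns without additional syntactic and semantic constraints.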

Keywords: Information Extraction; Pattern Mining; Apriori Algorithm

Authors: Christopher S.G. Khoo, Jin-Cheon Na, Wei Wang

Affiliation: Division of Information Studies, Wee Kim Wee School of Communication & Information, Nanyang Technological University, Singapore 637718

Conference type: International conference

Conference name: 4th Asia Information Retrieval Symposium (AIRS 2008)

Conference location: Harbin

Conference language: English

Pages: 676-681

Online publication date: 2008-01-16 (date first posted online on the Wanfang platform; not the publication date of the paper)
