Pattern Mining for Information Extraction Using Lexical, Syntactic and Semantic Information: Preliminary Results
A method is being developed to mine a text corpus for candidate linguistic patterns for information extraction. The candidate patterns can be used to improve the quality of extraction patterns constructed by a pseudo-supervised learning method, that is, an automated method in which the system is provided with a high-quality seed pattern or clue that is used to generate a training set automatically. The study is carried out in the context of developing a system to extract disease-treatment information from medical abstracts retrieved from the Medline database. In an earlier study, the Apriori algorithm was used to mine a sample of sentences containing a disease concept and a drug concept, in order to identify frequently occurring word patterns and to see whether these patterns could be used to identify treatment relations in text. Word patterns and statistical association measures alone were found to be insufficient for generating good extraction patterns; they need to be combined with syntactic and semantic constraints. In this study, we explore the use of syntactic, semantic and lexical constraints to improve the quality of extraction patterns.
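The frequent word-pattern mining step mentioned in the abstract can be illustrated with a minimal Apriori sketch. This is not the authors' implementation: the function name `apriori`, the toy sentences, the placeholder tokens `DRUG` and `DISEASE`, and the support threshold are all assumptions made for illustration, and each sentence is treated as an unordered set of words, which ignores word order.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets.

    Hypothetical helper for illustration; min_support is an absolute count.
    """
    transactions = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in frequent
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Count support of the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result

# Invented toy sentences; concept mentions are replaced by placeholder tokens.
sentences = [
    "DRUG is effective in the treatment of DISEASE",
    "DRUG was used for treatment of DISEASE",
    "treatment of DISEASE with DRUG",
    "DISEASE was not affected by DRUG",
]
transactions = [set(s.lower().split()) for s in sentences]
patterns = apriori(transactions, min_support=3)

for itemset, support in sorted(patterns.items(),
                               key=lambda kv: (-len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

In this toy data the word set {drug, disease} is frequent in all four sentences, including the one that does not express a treatment relation, which mirrors the abstract's observation that frequent word patterns alone do not make good extraction patterns.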
Information Extraction; Pattern Mining; Apriori Algorithm
Christopher S.G. Khoo, Jin-Cheon Na, Wei Wang
Division of Information Studies, Wee Kim Wee School of Communication & Information, Nanyang Technological University, Singapore 637718
International conference
4th Asia Information Retrieval Symposium (AIRS 2008)
Harbin
English
676-681
2008-01-16 (date first posted online on the Wanfang platform; not the paper's publication date)