A Clustering Based Approach for Domain Relevant Relation Extraction

Abstract:

Most existing corpus-based relation extraction techniques focus on predefined relations. In this paper, a clustering-based method is presented for domain-relevant relation extraction, covering both relation type discovery and relation instance extraction. Given two raw corpora, one in the general domain and one in an application domain, domain-specific verbs connecting different instances are extracted based on syntactic dependency and a small set of domain concept instance seeds. Relation types are then discovered by verb clustering, followed by relation instance extraction. The proposed approach requires no predefined relation types, no prior training on domain knowledge, and no manually annotated corpora. The method is applicable to any domain corpus and is especially useful for knowledge-limited and resource-limited domains. Evaluations conducted on the Chinese football domain show that the approach discovers various relations with good performance.

Keywords: relation extraction; relation type discovery; verb clustering; domain verb extraction; information extraction

Authors: Yuhang YANG, Qin LU, Tiejun ZHAO

Affiliations: School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China; Department of Computing, Hong Kong Polytechnic University, Hong Kong

Conference type: International conference

Conference name: The 2008 IEEE International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE 2008)

Conference location: Beijing

Conference language: English

Online publication date: 2008-10-19 (date first made available online on the Wanfang platform; not the paper's publication date)
