Extracting 5W1H Event Semantic Elements from Chinese Online News
This paper proposes a verb-driven approach to extracting 5W1H (Who, What, Whom, When, Where and How) event semantic information from Chinese online news. The main contributions of our work are two-fold: first, given the usual structure of a news story, we propose a novel algorithm that extracts topic sentences by stressing the importance of the news headline; second, we extract event facts (i.e., 5W1H) from these topic sentences by applying a rule-based (verb-driven) method and a supervised machine-learning method (SVM). This method significantly improves on the predicate-argument structure used in the Automatic Content Extraction (ACE) Event Extraction (EE) task by considering the valency of a Chinese verb (its capacity to govern noun phrases). Extensive experiments on the ACE 2005 datasets confirm its effectiveness, and it also shows very high scalability, since we consider only the topic sentences and surface text features. Based on this method, we build a prototype system named Chinese News Fact Extractor (CNFE). CNFE is evaluated on a real-world corpus containing 30,000 newspaper documents. Experimental results show that CNFE can extract event facts efficiently.
Relationship Extraction; Event Extraction; Verb-driven
Wei Wang, Dongyan Zhao, Lei Zou, Dong Wang, Weiguo Zheng
Institute of Computer Science & Technology, Peking University, Beijing, China; Engineering College of Institute of Computer Science & Technology, Peking University, Beijing, China; Key Laboratory of Comp Institute of Computer Science & Technology, Peking University, Beijing, China
International conference
11th International Conference on Web-Age Information Management (WAIM 2010)
Jiuzhaigou
English
644-655
2010-07-14 (date first made available online on the Wanfang platform; not the paper's publication date)