ON THE ABSTRACTION AND PRESENTATION OF MULTI-SOURCE KNOWLEDGE

Abstract:

This paper proposes a knowledge abstraction and presentation system based on information gathered from Internet web pages. Documents gathered from different websites are first segmented into paragraphs according to their topics. Linguistic processing, such as word segmentation, word tagging, and word frequency evaluation, is then applied to these corpora. Two types of similarity are calculated in our study: paragraph-based and sentence-based. The paragraph-based similarity is used to group together paragraphs with similar wording. Within each paragraph group, the sentence-based similarity is then applied to find sentences with similar wording. From each group of sentences, we choose the most representative ones as the abstraction results. In the experiment, fifteen peculiar bird species are chosen as the abstraction topics. The abstraction for each bird is generated from the content of about 20 websites. A Mean Opinion Score (MOS) evaluation of the quality and quantity of the abstractions shows encouraging results for our study.
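The two-level grouping described in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure, thresholds, or selection rule the authors used, so everything here is an assumption made for illustration: cosine similarity over word-frequency vectors, greedy grouping against the first member of each group, whitespace tokenization in place of real word segmentation and tagging, and "most representative" taken to mean the sentence with the highest total similarity to the rest of its group. All function names and the threshold values are hypothetical.

```python
import math
from collections import Counter


def bag(text):
    # Assumption: whitespace tokenization stands in for word segmentation and tagging.
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-frequency vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def group_by_similarity(texts, threshold):
    """Greedy grouping: join the first group whose first member is similar enough."""
    groups = []
    for t in texts:
        for g in groups:
            if cosine(bag(t), bag(g[0])) >= threshold:
                g.append(t)
                break
        else:
            groups.append([t])
    return groups


def most_representative(texts):
    """Return the text with the highest total similarity to the others in its group."""
    bags = [bag(t) for t in texts]

    def total(i):
        return sum(cosine(bags[i], bags[j]) for j in range(len(bags)) if j != i)

    return texts[max(range(len(texts)), key=total)]


def abstract_topic(paragraphs, para_threshold=0.3, sent_threshold=0.5):
    """Group paragraphs, then group sentences within each paragraph group,
    and keep one representative sentence per sentence group."""
    summary = []
    for pgroup in group_by_similarity(paragraphs, para_threshold):
        # Assumption: sentences are split naively on periods.
        sentences = [s.strip() for p in pgroup for s in p.split(".") if s.strip()]
        for sgroup in group_by_similarity(sentences, sent_threshold):
            summary.append(most_representative(sgroup))
    return summary
```

Calling `abstract_topic` on a list of paragraphs collected for one topic returns one sentence per group of similar sentences. A real system would replace the tokenizer and sentence splitter with proper linguistic processing and tune both thresholds on data.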

Keywords: Multi-document abstraction; Paragraph similarity; Document classification; Peculiar bird species

Authors: HSIEN-CHANG WANG, YUEH-CHIN CHAN

Affiliation: Department of Information Management, Chang Jung Christian University, Kway Jen, Tainan, Taiwan

Conference type: International conference

Conference name: 2008 International Conference on Machine Learning and Cybernetics

Conference location: Kunming

Conference language: English

Pages: 3307-3309

Online publication date: 2008-07-12 (date first posted on the Wanfang platform; not the paper's publication date)
