Conference topic

ON THE ABSTRACTION AND PRESENTATION OF MULTI-SOURCE KNOWLEDGE

This paper proposes a knowledge abstraction and presentation system based on information gathered from Internet web pages. Documents gathered from different websites are first segmented into paragraphs according to their topics. Linguistic processing, such as word segmentation, word tagging, and word frequency evaluation, is then applied to these corpora. Two types of similarity are calculated in our study: paragraph-based and sentence-based. The paragraph-based similarity is used to group together paragraphs with similar wording. Within each paragraph group, the sentence-based similarity is then applied to find sentences with similar wording. From each group of sentences, we choose the most representative ones as the abstraction results. In the experiment, fifteen peculiar bird species are chosen as the abstraction topics. The abstraction for each bird is generated from the content of about 20 websites. A Mean Opinion Score (MOS) evaluation of the quality and quantity of the abstractions shows encouraging results for our study.
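The two-level grouping described in the abstract can be illustrated with a minimal sketch. The abstract does not state which similarity measure, grouping procedure, or thresholds the authors used, so everything here is an assumption made for illustration: cosine similarity over raw word counts, a greedy threshold-based grouping, a simple regular-expression tokenizer in place of real word segmentation and tagging, and hypothetical function names (`term_freq`, `cosine`, `group_by_similarity`, `representative`, `abstract`).

```python
import math
import re
from collections import Counter


def term_freq(text):
    """Lowercased word counts; a stand-in for word segmentation and tagging."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors (assumed measure)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_by_similarity(texts, threshold):
    """Greedy grouping: a text joins the first group whose seed is similar enough."""
    groups = []
    for text in texts:
        vec = term_freq(text)
        for group in groups:
            if cosine(vec, term_freq(group[0])) >= threshold:
                group.append(text)
                break
        else:
            groups.append([text])
    return groups


def representative(sentences):
    """Pick the sentence with the highest total similarity to the others."""
    vecs = [term_freq(s) for s in sentences]
    scores = [
        sum(cosine(v, u) for j, u in enumerate(vecs) if j != i)
        for i, v in enumerate(vecs)
    ]
    return sentences[scores.index(max(scores))]


def abstract(paragraphs, para_threshold=0.3, sent_threshold=0.5):
    """Group paragraphs, group sentences within each paragraph group,
    and keep one representative sentence per sentence group."""
    summary = []
    for para_group in group_by_similarity(paragraphs, para_threshold):
        sentences = [
            s.strip()
            for p in para_group
            for s in re.split(r"(?<=[.!?])\s+", p)
            if s.strip()
        ]
        for sent_group in group_by_similarity(sentences, sent_threshold):
            summary.append(representative(sent_group))
    return summary
```

Calling `abstract` on a list of paragraphs collected from several pages returns one sentence per group of similarly worded sentences. The threshold defaults are arbitrary placeholders and would need tuning on real data.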

Multi-document abstraction; Paragraph similarity; Document classification; Peculiar bird species

HSIEN-CHANG WANG; YUEH-CHIN CHAN

Department of Information Management, Chang Jung Christian University, Kway Jen, Tainan, Taiwan

International conference

2008 International Conference on Machine Learning and Cybernetics

Kunming

English

3307-3309

2008-07-12 (date first made available online on the Wanfang platform; not the paper's publication date)