Conference topic

An Information Arrangement Technique for a Text Classification and Summarization Based on a Summarization Frame

This paper aims to arrange information so that it can be understood at a glance. The proposed summarization frame technique hierarchically arranges and classifies information according to the content and the level of importance of sentences. To understand the content of sentences, the technique uses the Concept Base, the Degree of Association Algorithm, the Time Judgment system, and the Place Judgment system. The Concept Base expands a given word into its semantics, and the Degree of Association Algorithm uses the results of this semantic expansion to express the relationship between one word and another as a numeric value. Because information is arranged hierarchically, document summarization can easily extract only the information that is needed, such as when a limit on length (number of strokes) applies. In information retrieval, a speed-up can be expected because the retrieval target is narrowed. In a QA system, answers matched to the TPO (time, place, and occasion) can be expected. Sentences are first classified according to their content, and each class is then divided into more detailed fields. Important keywords are extracted from the sentences classified into each field, and the extracted keywords are further classified into common words and peculiar words for the sentences in the field. In addition, the sentences of each field are arranged in a three-level hierarchy according to the importance of their content.

information arrangement; document classification; summarization; Concept Base; Degree of Association

Seiji Tsuchiya, Eriko Yoshimura, Hirokazu Watabe

Dept. of Information and Computer Science, Doshisha University, Kyo-Tanabe, Kyoto, Japan

International conference

International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE 2009)

Dalian

English

1-5

2009-09-24 (date first posted online on the Wanfang platform; not the paper's publication date)