A Method for Stemming and Eliminating Common Words for Persian Text Summarization
With the rapid growth of documents and electronic texts in the Persian language, fast methods for retrieving text from huge document collections are crucial. Persian text summarization, which conveys the main concept of a text in minimal size, is an effective solution. One of the steps in Persian text summarization is stemming and eliminating common words. The aim of this research is to stem words in Persian documents so that they can be used more efficiently in text summarization; the proposed method eliminates common words and stems keywords. A combination of existing techniques in the word network was used to create a Persian database based on the Dehkhoda dictionary. The summarization algorithm is based on statistical techniques: each sentence is assigned an importance weight, and the sentences with higher weights are used for the summary. Comparing our results with those of other algorithms on Persian texts, we concluded that our technique extracts the roots of words with greater precision.
Text summarization; stemming; common words; database
Marzieh BERENJKOOB, Razieh MEHRI, Hadi KHOSRAVI, Mohammad Ali NEMATBAKHSH
Department of Computer Engineering, University of Isfahan, Isfahan, Iran; Department of Computer Engineering, University of Isfahan/Faculty of Engineering, Isfahan
International conference
Dalian
English
1-6
2009-09-24 (date first posted online on the Wanfang platform; not the paper's publication date)
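
The pipeline outlined in the abstract (eliminate common words, stem the remaining words, give each sentence a weight, keep the highest-weighted sentences) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word set, the suffix list, the suffix-stripping stemmer, and the frequency-sum weighting below are all assumptions chosen for illustration, and the paper's Dehkhoda-based database is not modelled.

```python
import re
from collections import Counter

# Hypothetical resources for illustration only; the paper's actual
# stop-word list and stemming database are not reproduced here.
STOP_WORDS = {"و", "در", "به", "از", "که", "را", "این"}
SUFFIXES = ["ترین", "های", "ها", "تر"]  # checked in this order, longest first


def stem(word, suffixes=SUFFIXES, min_len=2):
    """Strip the first matching suffix if the remaining stem is long enough."""
    for suffix in suffixes:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            return word[: -len(suffix)]
    return word


def preprocess(sentence, stop_words=STOP_WORDS):
    """Extract word tokens, drop stop words, and stem what remains."""
    tokens = re.findall(r"\w+", sentence)
    return [stem(t) for t in tokens if t not in stop_words]


def summarize(sentences, k=1, stop_words=STOP_WORDS):
    """Return the k highest-weighted sentences in their original order."""
    processed = [preprocess(s, stop_words) for s in sentences]
    freq = Counter(t for tokens in processed for t in tokens)
    # Assumed weighting: sum of the document-wide frequencies of a sentence's stems.
    weights = [sum(freq[t] for t in tokens) for tokens in processed]
    top = sorted(range(len(sentences)), key=lambda i: -weights[i])[:k]
    return [sentences[i] for i in sorted(top)]
```

Summing raw frequencies favours long sentences; a length-normalised weight would be one alternative, but the abstract does not say which statistical weighting the authors used.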