Classification of Persian textual documents using Learning Vector Quantization

Abstract:

Classifying text documents into a predefined set of classes is an important task in natural language processing applications. Text classification systems usually face a tradeoff between accuracy and complexity. This paper presents an experiment in classifying Persian documents with a Learning Vector Quantization (LVQ) network. In this method, each class is represented by an exemplar vector called a codebook. The codebook vectors are placed in the feature space so that the decision boundaries are approximated by the nearest-neighbor rule. Compared to the K-Nearest Neighbour method, LVQ requires fewer training examples and is believed to be much faster than other classification methods. The experimental results on Persian textual documents are promising and indicate that LVQ can serve as an alternative to other methods such as Support Vector Machines.
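
The codebook idea summarized in the abstract can be illustrated with a minimal LVQ1 sketch. This is not the authors' implementation: the toy two-dimensional "document vectors", the class labels, the one-codebook-per-class initialization, and the linearly decaying learning rate are all assumptions chosen for illustration. The winning (nearest) codebook is pulled toward a training sample of the same class and pushed away from a sample of a different class; prediction then uses the nearest-neighbor rule over the codebooks.

```python
import random

def nearest(codebooks, x):
    """Index of the codebook vector closest to x (squared Euclidean distance)."""
    return min(range(len(codebooks)),
               key=lambda i: sum((c - v) ** 2 for c, v in zip(codebooks[i][0], x)))

def train_lvq1(samples, codebooks, lr=0.3, epochs=20, seed=0):
    """LVQ1: move the winning codebook toward same-class samples, away otherwise."""
    rng = random.Random(seed)
    samples = list(samples)
    for epoch in range(epochs):
        alpha = lr * (1 - epoch / epochs)  # assumed linearly decaying learning rate
        rng.shuffle(samples)
        for x, label in samples:
            i = nearest(codebooks, x)
            vec, cls = codebooks[i]
            sign = 1.0 if cls == label else -1.0
            codebooks[i] = ([c + sign * alpha * (v - c) for c, v in zip(vec, x)], cls)
    return codebooks

def predict(codebooks, x):
    """Nearest-neighbor rule: return the class of the closest codebook."""
    return codebooks[nearest(codebooks, x)][1]

# Hypothetical term-weight vectors for two made-up classes.
train = [([1.0, 0.1], "sports"), ([0.9, 0.2], "sports"),
         ([0.1, 1.0], "economy"), ([0.2, 0.8], "economy")]
# One codebook per class, initialised from the first sample of each class.
codebooks = [([1.0, 0.1], "sports"), ([0.1, 1.0], "economy")]
codebooks = train_lvq1(train, codebooks)
print(predict(codebooks, [0.8, 0.3]))
```

Because prediction compares a document only against the codebooks rather than against every training example, the per-document cost depends on the number of codebooks, which is the source of the speed advantage over K-Nearest Neighbour mentioned in the abstract.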

Keywords: Learning vector quantization text classification Hamshahri2 Persian textual corpus artificial neural networks natural language processing

Authors: Mohammad Taher Pilevar, Heshaam Feili, Mahmood Soltani

Affiliation: University of Tehran, Tehran, Iran

Conference type: International conference

Conference name: International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE 2009)

Conference location: Dalian

Conference language: English

Pages: 1-6

Online publication date: 2009-09-24 (date first made available online on the Wanfang platform; not the paper's publication date)
