Conference Papers

Classification of Persian textual documents using Learning Vector Quantization

Classifying text documents into a predefined set of classes is an important task in natural language processing applications. Text classification systems usually involve a tradeoff between accuracy and complexity. This paper presents an experiment in classifying Persian documents with a Learning Vector Quantization (LVQ) network. In this method, each class is represented by exemplar vectors called codebook vectors. The codebook vectors are positioned in the feature space so that the nearest-neighbour rule approximates the class decision boundaries. Compared with the k-nearest neighbour (k-NN) method, LVQ requires fewer training examples, and it is believed to be much faster than other classification methods. The experimental results of classifying Persian textual documents with the LVQ algorithm are promising and show that LVQ can serve as an alternative to other methods such as Support Vector Machines (SVMs).
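The abstract describes LVQ only in outline. Below is a minimal sketch of the standard LVQ1 update rule, assuming one codebook vector per class, squared Euclidean distance and a linearly decaying learning rate. The paper's actual LVQ variant, document features (for example, term weights computed over the Hamshahri2 corpus) and training parameters are not given in this record. The function names `nearest`, `train_lvq1` and `classify`, the toy vectors and the class labels are all hypothetical.

```python
import math
import random

# Hypothetical sketch of LVQ1; not the paper's implementation.
# A codebook is a (prototype vector, class label) pair.


def nearest(codebooks, x):
    """Index of the codebook vector closest to x (squared Euclidean distance)."""
    best_i, best_d = -1, math.inf
    for i, (w, _label) in enumerate(codebooks):
        d = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
        if d < best_d:
            best_i, best_d = i, d
    return best_i


def train_lvq1(data, codebooks, alpha=0.1, epochs=20, seed=0):
    """LVQ1 training: move the winning codebook toward x if its label
    matches y, and away from x otherwise. Mutates and returns codebooks."""
    rng = random.Random(seed)
    samples = list(data)
    for epoch in range(epochs):
        rng.shuffle(samples)
        rate = alpha * (1 - epoch / epochs)  # assumed linear decay schedule
        for x, y in samples:
            i = nearest(codebooks, x)
            w, label = codebooks[i]
            sign = 1.0 if label == y else -1.0
            codebooks[i] = ([wi + sign * rate * (xi - wi) for wi, xi in zip(w, x)], label)
    return codebooks


def classify(codebooks, x):
    """Nearest-neighbour rule applied to the codebook vectors only."""
    return codebooks[nearest(codebooks, x)][1]


# Toy 2-D "document vectors" (hypothetical weights of two indicative terms).
data = [([0.9, 0.1], "sport"), ([0.8, 0.2], "sport"),
        ([0.1, 0.9], "economy"), ([0.2, 0.8], "economy")]
codebooks = [([0.6, 0.4], "sport"), ([0.4, 0.6], "economy")]
train_lvq1(data, codebooks)
print(classify(codebooks, [0.85, 0.15]))  # sport
print(classify(codebooks, [0.15, 0.85]))  # economy
```

Because a new document is compared only against the codebook vectors, rather than against every stored training document as in k-NN, prediction cost grows with the number of codebooks instead of the size of the training set.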

Learning vector quantization; text classification; Hamshahri2 Persian textual corpus; artificial neural networks; natural language processing

Mohammad Taher Pilevar, Heshaam Feili, Mahmood Soltani

University of Tehran, Tehran, Iran

International Conference

International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE 2009)

Dalian, China

English

pp. 1-6

2009-09-24 (date first posted on the Wanfang platform; not the paper's publication date)