Automatic Arabic Document Classification via kNN

Abstract:

Many algorithms have been implemented for document categorization. Most of the work in this area has targeted English text, while very few approaches have been proposed for Arabic text. Arabic text differs in nature from English text, and its preprocessing is more challenging, because Arabic is a highly inflectional and derivational language, which makes document mining a hard and complex task. In this paper, we present an automatic Arabic document classification system based on the kNN algorithm. We also develop an approach to the keyword extraction and reduction problems using the Document Frequency (DF) threshold method. The results indicate that kNN handles Arabic text better than other existing systems. The proposed system reached a micro-recall score of 0.95 on 850 Arabic texts in 6 different categories.
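The pipeline named in the abstract (DF-threshold keyword reduction, a vector model, kNN, and micro-recall) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state the term weighting, the similarity measure, the value of k, or the DF threshold, so raw term frequency, cosine similarity, `k=3`, and `min_df=2` are all assumptions, as are the function names. The English tokens are placeholders standing in for preprocessed Arabic terms.

```python
import math
from collections import Counter

def select_terms_by_df(docs, min_df):
    """Keep terms that occur in at least `min_df` documents (DF threshold)."""
    df = Counter(term for doc in docs for term in set(doc))
    return {term for term, count in df.items() if count >= min_df}

def to_vector(doc, vocabulary):
    """Raw term-frequency vector restricted to the selected vocabulary (assumed weighting)."""
    return Counter(term for term in doc if term in vocabulary)

def cosine(u, v):
    """Cosine similarity between two sparse vectors (assumed similarity measure)."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def knn_classify(query, train_vectors, train_labels, k):
    """Majority vote among the k training documents most similar to the query."""
    ranked = sorted(
        zip(train_vectors, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def micro_recall(true_labels, predicted_labels):
    """Micro-averaged recall; for single-label data this is correct / total."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)

# Hypothetical toy corpus: already tokenized, two categories.
train_docs = [
    ["match", "team", "goal"],
    ["team", "player", "goal"],
    ["market", "bank", "price"],
    ["bank", "price", "trade"],
]
train_labels = ["sport", "sport", "economy", "economy"]

# Keyword reduction: drop terms that appear in fewer than 2 training documents.
vocabulary = select_terms_by_df(train_docs, min_df=2)
train_vectors = [to_vector(doc, vocabulary) for doc in train_docs]

test_docs = [["goal", "team", "referee"], ["price", "bank", "loan"]]
test_labels = ["sport", "economy"]
predicted = [
    knn_classify(to_vector(doc, vocabulary), train_vectors, train_labels, k=3)
    for doc in test_docs
]
print(predicted, micro_recall(test_labels, predicted))
```

The DF threshold shrinks the vocabulary before any vectors are built, so rare terms never enter the similarity computation. A real Arabic pipeline would also need the normalization and stemming steps that the abstract alludes to but does not detail.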

Keywords: Arabic documents classification; kNN; vector model; keywords extraction

Authors: M. O. Iwidat, Yiming Zhou

Affiliation: School of Computer Science and Engineering, Beijing University of Aeronautics and Astronautics, Beijing 100083, China

Conference type: International conference

Conference name: The 3rd International Symposium on Computational Intelligence and Industrial Application (ISCIIA 2008)

Conference location: Dali, Yunnan

Conference language: English

Pages: 314-324

Online publication date: 2008-11-21 (date first posted on the Wanfang platform; not the paper's publication date)
