INDEXING BANGLA NEWSPAPER ARTICLES USING FUZZY AND CRISP CLUSTERING ALGORITHMS

Abstract:

This paper presents two document clustering techniques for grouping Bangla newspaper articles. The first is based on the traditional (crisp) c-means algorithm, and the latter on its fuzzy counterpart, the fuzzy c-means algorithm. The key principle of both techniques is to measure the frequency of keywords in a particular type of article in order to calculate the significance of those keywords. The articles are then clustered based on the significance of the keywords. We believe the findings of this research will help to index Bangla newspaper articles and thereby make information retrieval faster. However, one of the challenges is to find the salient features among the hundreds of features found in documents. Moreover, both clustering algorithms work well only in lower dimensions. To address this, we use three dimensionality reduction techniques: Principal Component Analysis (PCA), Factor Analysis (FA) and Linear Discriminant Analysis (LDA). We present and analyze the performance of the traditional and fuzzy c-means algorithms with the different dimensionality reduction techniques.
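As a rough illustration of the pipeline the abstract outlines, the following minimal sketch builds keyword-frequency vectors and clusters them with a textbook fuzzy c-means. It is not the authors' implementation. The English toy articles, the keyword list, the use of relative frequency as the "significance" score, the farthest-point initialisation, and the function names `keyword_frequencies` and `fuzzy_c_means` are all assumptions made for the example, and the dimensionality reduction step (PCA, FA or LDA) is omitted.

```python
import math

def keyword_frequencies(text, keywords):
    """Relative frequency of each keyword among the whitespace tokens of one article.
    Assumption: relative frequency stands in for the paper's significance score."""
    tokens = text.split()
    total = len(tokens) or 1
    return [tokens.count(k) / total for k in keywords]

def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-9):
    """Textbook fuzzy c-means with fuzzifier m. Returns (centers, memberships)."""
    n, d = len(points), len(points[0])
    # Deterministic farthest-point initialisation (an assumption, not from the paper).
    centers = [list(points[0])]
    while len(centers) < c:
        nxt = max(points, key=lambda p: min(math.dist(p, ctr) for ctr in centers))
        centers.append(list(nxt))
    u = [[0.0] * c for _ in range(n)]
    for _ in range(iters):
        # Membership update: u_ij is proportional to d_ij^(-2/(m-1)).
        for i, p in enumerate(points):
            dists = [math.dist(p, ctr) for ctr in centers]
            if min(dists) == 0.0:
                # Point coincides with one or more centers: split membership among them.
                hits = [dist == 0.0 for dist in dists]
                u[i] = [1.0 / sum(hits) if h else 0.0 for h in hits]
            else:
                inv = [dist ** (-2.0 / (m - 1.0)) for dist in dists]
                s = sum(inv)
                u[i] = [v / s for v in inv]
        # Center update: mean of all points weighted by u_ij^m.
        new_centers = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            total = sum(w)
            new_centers.append(
                [sum(w[i] * points[i][k] for i in range(n)) / total for k in range(d)]
            )
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, u

# Hypothetical toy data: two sports-like and two finance-like articles.
keywords = ["match", "team", "bank", "market"]
articles = [
    "cricket match team won the match",
    "team scored goal in football match",
    "bank raised interest rate",
    "stock market and bank shares fell",
]
vectors = [keyword_frequencies(a, keywords) for a in articles]
centers, memberships = fuzzy_c_means(vectors, c=2)

# A crisp labelling can be read off by taking the strongest membership per article.
labels = [row.index(max(row)) for row in memberships]
```

Each row of `memberships` gives one article's degree of belonging to each cluster, which is what distinguishes the fuzzy variant from crisp c-means, where every article belongs to exactly one cluster.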

Keywords: Principal Component Analysis (PCA); Linear Discriminant Analysis (LDA); Factor Analysis (FA); Intensification; c-Means clustering; Fuzzy c-Means clustering

Authors: A. K. M. Zahiduzzaman; Mohammad Nahyan Quasem; Faiyaz Ahmed; Rashedur M. Rahman

Affiliation: Department of Electrical Engineering and Computer Science, North South University, Bashundhara, Dhaka, Bangladesh

Conference type: International conference

Conference name: 13th International Conference on Enterprise Information Systems (ICEIS 2011)

Conference location: Beijing

Conference language: English

Pages: 2245-2248

Online publication date: 2011-06-08 (date first posted on the Wanfang platform; not the paper's publication date)
