Words Clustering Based on Keywords Indexing from Large-scale Categorization Corpora
Keywords are indexed automatically from large-scale categorization corpora. Keywords indexed in more than 20 documents are selected as seed words, which removes the subjectivity of choosing seed words by hand for clustering. In addition, clustering is restricted to the corpus of one particular category, and a feature extraction method based on the indexed keywords is used to obtain domain words automatically, which reduces noise in the similarity calculation.
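The abstract describes the pipeline only at a high level. Below is a minimal sketch of one way its steps could fit together. It assumes an existing indexer has already produced a keyword list per document in a single category (the indexing step is not shown). The seed rule follows the abstract (a keyword indexed in more than 20 documents becomes a seed). Everything else is a hypothetical choice not stated in the source: the domain words are taken to be the keywords indexed within the category, each word is represented by co-occurrence counts with those domain words, similarity is cosine, and each non-seed word joins its most similar seed. The function names are illustrative only.

```python
from collections import Counter, defaultdict
from math import sqrt

SEED_DF_THRESHOLD = 20  # from the abstract: "more than 20 documents"


def select_seed_words(doc_keywords, threshold=SEED_DF_THRESHOLD):
    """Return keywords indexed in more than `threshold` documents."""
    df = Counter()
    for kws in doc_keywords:
        df.update(set(kws))  # count each keyword at most once per document
    return {w for w, n in df.items() if n > threshold}


def domain_vectors(doc_keywords, domain_words):
    """Hypothetical features: co-occurrence counts with domain words only,
    so words from outside the category cannot add noise to similarity."""
    vecs = defaultdict(Counter)
    for kws in doc_keywords:
        uniq = set(kws)
        feats = uniq & domain_words
        for w in uniq:
            for f in feats:
                if f != w:
                    vecs[w][f] += 1
    return vecs


def cosine(a, b):
    """Cosine similarity of two sparse Counter vectors (0.0 if either is empty)."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_category(doc_keywords, threshold=SEED_DF_THRESHOLD):
    """Cluster the keywords of ONE category around its seed words."""
    seeds = sorted(select_seed_words(doc_keywords, threshold))  # sorted: deterministic ties
    vocab = {w for kws in doc_keywords for w in kws}
    # Assumption: domain words = all keywords indexed in this category.
    vecs = domain_vectors(doc_keywords, vocab)
    clusters = {s: [s] for s in seeds}
    for w in sorted(vocab - set(seeds)):
        # max() keeps the first seed on ties, i.e. the alphabetically smallest.
        best = max(seeds, key=lambda s: cosine(vecs[w], vecs[s]), default=None)
        if best is not None and cosine(vecs[w], vecs[best]) > 0:
            clusters[best].append(w)  # words with no overlap stay unclustered
    return clusters
```

Because `cluster_category` receives the documents of a single category, both the seed counts and the feature space are computed per category, which mirrors the abstract's point that restricting clustering to one category's corpus limits noise. On a toy corpus the 20-document threshold must be lowered (for example `threshold=1`) for any seeds to appear.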
Liu Hua
College of Chinese Language and Culture, Jinan University, Guangzhou, 510610, China
International conference
The Fifth International Conference on Information Assurance and Security
Xi'an
English
407-410
2009-08-18 (date first posted online on the Wanfang platform; not the paper's publication date)