Conference Topic

Large-Scale Hierarchical Text Classification Based on Path Semantic Information

Although hierarchical text classification can be improved by exploiting hierarchical structure information, existing methods suffer from error propagation, especially in large-scale, deep hierarchies. In this paper, we define a path-based semantic vector for representing categories. With this representation, prior information provided by the training set can be used in a classifier-independent way to reduce, and further eliminate, classification errors. In particular, we first propose an occurrence-probability-based strategy for hierarchical text classification, which helps limit the error rate efficiently. Co-occurrence probability is then introduced to correct classification errors that occur at higher levels of the hierarchy. Extensive experiments show that our hierarchical classification strategies perform well on the ODP dataset, even at deep levels of the hierarchy.

hierarchical classification; error propagation; path semantic representation; prior information

Feng Gao, Chengrong Wu, Naiwang Guo, Danfeng Zhao

School of Computer Science, Fudan University, Shanghai 200433, China; School of Computer Science, Yanshan University, Qinhuangdao 066004, China

International conference

The Second International Conference on Business Intelligence and Financial Engineering (BIFE 2009)

Beijing

English

223-227

2009-07-24 (date first posted online on the Wanfang platform; not the paper's publication date)