Multi-label Text Classification with a Robust Label Dependent Representation

Abstract:

Automatic text classification is the task of assigning unseen documents to a predefined set of classes or categories. Text representation for classification has traditionally been approached with tf.idf because of its simplicity and good performance. Multi-label automatic text classification has traditionally been tackled in the literature either by transforming the problem so that binary techniques can be applied or by adapting binary algorithms to work with multiple labels. We present tf.rrfl, a novel text representation for multi-label classification. Our proposal focuses on modifying the data set input to the algorithm, differentiating the input according to the label being evaluated. The performance of tf.rrfl was tested on a known benchmark and compared with alternative techniques. The results show an improvement over the alternative approaches in terms of Hamming loss.

Keywords: Multi-label; Text classification; Text representation; Machine learning

Authors: Rodrigo Alfaro, Héctor Allende

Affiliations: Departamento de Informática, Universidad Técnica Federico Santa María and Escuela de Ingeniería Info Departamento de Informática, Universidad Técnica Federico Santa María and Facultad de Ingeniería, Un

Conference type: International conference

Conference name: 2010 Third Pacific-Asia Conference on Web Mining and Web-based Application (WMWA 2010)

Conference location: Guilin

Conference language: English

Pages: 17-20

Online publication date: 2010-11-17 (date first made available on the Wanfang platform; not the paper's publication date)
