Conference Papers

Towards Scalable Emotion Classification in Microblog Based on Noisy Training Data

  The availability of labeled corpora is of great importance for emotion classification tasks. Because manual labeling is too time-consuming, hashtags have been used as naturally annotated labels to obtain large amounts of labeled training data from microblogs. However, inconsistency and noise in the annotation can adversely affect data quality and thus the performance of a classifier trained on it. In this paper, we propose a classification framework that allows naturally annotated data to be used as additional training data and employs a k-NN graph based data cleaning method to remove noise once a certain amount of noisy data has accumulated. Evaluation on the NLP&CC2013 Chinese Weibo emotion classification dataset shows that our approach achieves 15.8% better performance than directly using the noisy data without noise filtering. After adding the filtered hashtag-labeled data to an existing high-quality training set, performance increases by 3.7% compared to using the high-quality training data alone.
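The abstract does not spell out how the k-NN graph based cleaning works, so the following is only a minimal sketch of one common form of the idea, not the authors' method: a sample is kept only if its hashtag-derived label agrees with most of its k nearest neighbors. The function names `cosine` and `knn_filter`, the parameters `k` and `min_agree`, the cosine similarity measure, and the toy feature vectors are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def knn_filter(vectors, labels, k=3, min_agree=0.5):
    """Return indices of samples whose label matches more than
    `min_agree` of their k nearest neighbors (hypothetical rule)."""
    kept = []
    n = len(vectors)
    for i in range(n):
        # Rank every other sample by similarity to sample i.
        ranked = sorted(
            ((cosine(vectors[i], vectors[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        neighbors = [j for _, j in ranked[:k]]
        if not neighbors:
            continue
        agree = sum(labels[j] == labels[i] for j in neighbors) / len(neighbors)
        if agree > min_agree:
            kept.append(i)
    return kept

# Toy data: sample 3 sits among the "happy" samples but is tagged "sad".
vectors = [(1.0, 0.0), (0.9, 0.1), (0.8, 0.2), (0.95, 0.05),
           (0.0, 1.0), (0.1, 0.9), (0.2, 0.8)]
labels = ["happy", "happy", "happy", "sad", "sad", "sad", "sad"]

kept = knn_filter(vectors, labels, k=3)
print(kept)
```

On this toy input the mislabeled sample (index 3) is filtered out while the other six are retained. A real pipeline would use text features in place of the two-dimensional vectors and would tune `k` and `min_agree` on held-out data.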

emotion classification; data cleaning; hashtag; k-NN

Minglei Li, Qin Lu, Lin Gui, Yunfei Long

Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong; Laboratory of Network Oriented Intelligent Computation, Shenzhen Graduate School, Harbin Institute of

Domestic conference (China)

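The 15th China National Conference on Computational Linguistics (CCL2016) and the 4th International Symposium on Natural Language Processing Based on Naturally Annotated Big Data (NLP-NABD-2016)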

Yantai

English

1-12

2016-10-14 (date first made available online on the Wanfang platform; not the paper's publication date)