Conference Topic

Modeling for Noisy Labels of Crowd Workers

  Crowdsourcing services can collect a large amount of labeled data at low cost. Nonetheless, owing to factors such as unqualified crowd workers and the controversial nature of some instances to be labeled, the collected labels are often noisy, i.e., they are sometimes given at random, incorrect, or missing. Although approaches have been proposed to infer these influence factors in order to better model the labeling results, the inferences are not guaranteed to reflect the true effects of the influence factors on the uncertainty and errors in the labels. In this paper, we propose to conduct probability fitting over the noisy labeled data with a Bernoulli Mixture Model. Workers with similar behaviors correspond to the same Bernoulli component in the mixture model. The effects of the influence factors are fused into the Bernoulli parameter of each component, which directly reflects the uncertainty of the labels and can help identify labeling errors, predict the true labels, and reveal the behavior patterns of crowd workers. Experiments on both benchmark and real datasets verify the efficacy of our model.

Qian Yan, Hao Huang, Yunjun Gao, Chen Ying, Qingyang Hu, Tieyun Qian, Qinming He

State Key Laboratory of Software Engineering, Wuhan University, Wuhan, China; College of Computer Science, Zhejiang University, Hangzhou, China

International conference

18th International Asia-Pacific Web Conference

Suzhou

English

227-238

2016-09-23 (date first made available online on the Wanfang platform; not the paper's publication date)