Modeling for Noisy Labels of Crowd Workers
Crowdsourcing services can collect large amounts of labeled data at low cost. However, owing to factors such as unqualified crowd workers and the controversial nature of some instances to be labeled, the collected labels are often noisy: they may be given at random, incorrect, or missing. Although approaches have been proposed that infer these influencing factors in order to better model the labeling results, the inferences are not guaranteed to reflect the factors' true effects on the uncertainty and errors in the labels. In this paper, we propose to perform probability fitting over the noisy labeled data with a Bernoulli Mixture Model. Workers with similar behaviors correspond to the same Bernoulli component of the mixture model. The effects of the influencing factors are fused into the Bernoulli parameter of each component, which directly reflects the uncertainty of the labels and can help identify labeling errors, predict the true labels, and reveal the behavior patterns of crowd workers. Experiments on both benchmark and real datasets verify the efficacy of our model.
Qian Yan, Hao Huang, Yunjun Gao, Chen Ying, Qingyang Hu, Tieyun Qian, Qinming He
State Key Laboratory of Software Engineering, Wuhan University, Wuhan, China; College of Computer Science, Zhejiang University, Hangzhou, China
International conference
International Asia-Pacific Web Conference (18th International Asia-Pacific Web Conference)
Suzhou
English
227-238
2016-09-23 (date first made available online on the Wanfang platform; not the paper's publication date)
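
The abstract describes fitting a Bernoulli Mixture Model to noisy crowd labels, with workers of similar behavior sharing one component. The sketch below is only an illustration of that general idea, not the authors' method: it assumes binary labels, one Bernoulli parameter per component and instance, missing labels encoded as None, and plain EM with add-one smoothing. The function name fit_bernoulli_mixture and all of its parameters are hypothetical.

```python
import math
import random


def fit_bernoulli_mixture(labels, n_components=2, n_iter=50, seed=0):
    """Fit a mixture of Bernoullis to worker labels with EM (illustrative).

    labels[w][j] is worker w's binary label (0 or 1) for instance j,
    or None if that worker did not label the instance.
    Returns (weights, theta, resp): mixing weights, per-component
    per-instance Bernoulli parameters, and per-worker responsibilities.
    """
    rng = random.Random(seed)
    n_workers, n_items = len(labels), len(labels[0])
    weights = [1.0 / n_components] * n_components
    # Random start away from 0 and 1 so the components are not identical.
    theta = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_components)]
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each worker.
        resp = []
        for row in labels:
            log_p = []
            for k in range(n_components):
                lp = math.log(weights[k])
                for j, y in enumerate(row):
                    if y is None:
                        continue  # a missing label contributes no evidence
                    p = theta[k][j]
                    lp += math.log(p if y == 1 else 1.0 - p)
                log_p.append(lp)
            top = max(log_p)  # subtract the max for numerical stability
            unnorm = [math.exp(lp - top) for lp in log_p]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate mixing weights and Bernoulli parameters.
        for k in range(n_components):
            nk = sum(r[k] for r in resp)
            weights[k] = max(nk / n_workers, 1e-9)  # keep log() defined
            for j in range(n_items):
                num = den = 0.0
                for w in range(n_workers):
                    y = labels[w][j]
                    if y is not None:
                        num += resp[w][k] * y
                        den += resp[w][k]
                # Add-one smoothing keeps theta strictly inside (0, 1).
                theta[k][j] = (num + 1.0) / (den + 2.0)
    return weights, theta, resp


# Toy data (made up for illustration): six workers, four instances.
toy_labels = [
    [1, 1, 0, 0],
    [1, 1, 0, None],
    [1, None, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [None, 0, 1, 1],
]
weights, theta, resp = fit_bernoulli_mixture(toy_labels, n_components=2)
```

Under these assumptions, each row of resp gives a soft assignment of one worker to the components, and a component's theta values near 0.5 would indicate instances on which that group's labels are most uncertain.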