How Does Data Size Influence Accuracy of Co-authorship Prediction? An Empirical Study

Abstract:

To find the optimal dataset size for co-authorship prediction and to compare co-authorship predictors fairly, it is necessary to track how predictor accuracy changes across datasets of different sizes. This paper selects 12 representative co-authorship predictors and evaluates them by link prediction on datasets of different sizes, to reveal how and why data size influences the accuracy of co-authorship prediction. Co-authorship networks of different sizes are built in the field of Library and Information Science (LIS) by filtering on author frequency. The results show that, in most cases, the larger the dataset, the higher the accuracy of co-authorship prediction. The most appropriate dataset for co-authorship prediction is the co-authorship network without any filtering, on which the accuracy of the top three predictors is the highest among all datasets. The reason is that as the data size grows, the co-authorship network comes closer to the real situation, so the advantages of the predictors' improvements can be fully realized. Furthermore, each predictor has a preferred dataset, since the optimal predictor changes with dataset size. This indicates that a fair comparison among predictors should be carried out on datasets of different sizes. The method could be extended to other fields to validate the conclusions.

Keywords: data size co-authorship prediction Library and Information Science link prediction accuracy optimal predictor

Authors: ZHANG Jinzhu, LÜ Pin

Affiliation: Department of Information Management, Nanjing University of Science and Technology, Nanjing 210094, China

Conference type: International conference

Conference name: 2nd International Conference on Information Acquisition and Knowledge Service (第二届信息获取与知识服务国际会议)

Conference location: Wuhan

Conference language: English

Pages: 290-295

Online publication date: 2016-10-21 (date first posted on the Wanfang platform; not necessarily the paper's publication date)
