How Does Data Size Influence Accuracy of Co-authorship Prediction? An Empirical Study
To find the optimal dataset size for co-authorship prediction and to compare co-authorship predictors fairly, it is necessary to track how predictor accuracy changes across datasets of different sizes. This paper selects 12 representative co-authorship predictors and evaluates them with link prediction on datasets of different sizes, in order to reveal how and why data size influences the accuracy of co-authorship prediction. In the field of Library and Information Science (LIS), co-authorship networks of different sizes are formed by filtering on author frequency. The results show that, in most cases, the larger the dataset, the higher the accuracy of co-authorship prediction. The most appropriate dataset for co-authorship prediction is the co-authorship network without any filtering, on which the top three predictors achieve their highest accuracy among all datasets. The reason is that as the data size grows, the co-authorship network comes closer to the real situation, so the advantages of the predictors' improvements can be fully realized. Furthermore, predictors have preferred datasets, because the optimal predictor changes with dataset size. This indicates that a fair comparison of predictors should be carried out on datasets of different sizes. The method could be extended to other fields to validate these conclusions.
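The evaluation pipeline described in the abstract (filter a co-authorship network by author frequency, then score candidate links with several predictors) can be sketched as below. This is a minimal illustration, not the authors' code: the abstract does not name the 12 predictors or the accuracy metric, so the four local-similarity indices (common neighbors, Jaccard, Adamic-Adar, resource allocation), the AUC metric, the "keep authors with at least `min_freq` papers" filtering rule, and all function names here are assumptions chosen for illustration.

```python
import math
import random
from itertools import combinations


def build_network(papers, min_freq):
    """Co-authorship graph keeping authors with >= min_freq papers (assumed filtering rule)."""
    freq = {}
    for authors in papers:
        for a in set(authors):
            freq[a] = freq.get(a, 0) + 1
    keep = {a for a, f in freq.items() if f >= min_freq}
    graph = {a: set() for a in keep}
    for authors in papers:
        # Link every pair of retained co-authors on the same paper.
        for u, v in combinations(sorted(set(authors) & keep), 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


# Four well-known local-similarity predictors (illustrative stand-ins).
def common_neighbors(g, u, v):
    return len(g[u] & g[v])


def jaccard(g, u, v):
    union = g[u] | g[v]
    return len(g[u] & g[v]) / len(union) if union else 0.0


def adamic_adar(g, u, v):
    return sum(1.0 / math.log(len(g[z])) for z in g[u] & g[v] if len(g[z]) > 1)


def resource_allocation(g, u, v):
    return sum(1.0 / len(g[z]) for z in g[u] & g[v])


def evaluate_auc(g, score, probe_frac=0.1, n_samples=2000, seed=0):
    """Sampled AUC: how often a held-out edge outscores a random non-edge."""
    rng = random.Random(seed)
    edges = sorted((u, v) for u in g for v in g[u] if u < v)
    nodes = sorted(g)
    if len(edges) < 2:
        raise ValueError("need at least two edges")
    if len(nodes) * (len(nodes) - 1) // 2 <= len(edges):
        raise ValueError("graph has no non-edges to sample")
    rng.shuffle(edges)
    n_probe = max(1, int(len(edges) * probe_frac))
    probe, train_edges = edges[:n_probe], edges[n_probe:]
    # Predictors only see the training graph (probe edges removed).
    train = {a: set() for a in g}
    for u, v in train_edges:
        train[u].add(v)
        train[v].add(u)
    hits = 0.0
    for _ in range(n_samples):
        pu, pv = rng.choice(probe)
        while True:  # draw a pair that is not linked in the full graph
            nu, nv = rng.sample(nodes, 2)
            if nv not in g[nu]:
                break
        s_pos, s_neg = score(train, pu, pv), score(train, nu, nv)
        hits += 1.0 if s_pos > s_neg else 0.5 if s_pos == s_neg else 0.0
    return hits / n_samples


if __name__ == "__main__":
    # Synthetic data only: 60 authors in 6 groups, papers drawn within groups.
    rng = random.Random(42)
    authors = [f"a{i}" for i in range(60)]
    groups = [authors[i:i + 10] for i in range(0, 60, 10)]
    papers = [rng.sample(rng.choice(groups), rng.randint(2, 3)) for _ in range(300)]
    predictors = [common_neighbors, jaccard, adamic_adar, resource_allocation]
    for min_freq in (1, 8, 12):  # larger threshold -> smaller network
        g = build_network(papers, min_freq)
        for p in predictors:
            print(min_freq, len(g), p.__name__, round(evaluate_auc(g, p), 3))
```

Running the same predictors across several `min_freq` thresholds mirrors the abstract's idea of comparing predictors on networks of different sizes; on real LIS data the ranking of predictors may change from one threshold to the next.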
data size; co-authorship prediction; Library and Information Science; link prediction; accuracy; optimal predictor
ZHANG Jinzhu, LÜ Pin
Department of Information Management, Nanjing University of Science and Technology, Nanjing 210094, China
International conference
Wuhan
English
pp. 290-295
2016-10-21 (date first posted online on the Wanfang platform; not the paper's publication date)