TOWARD UNIFICATION OF SOURCE ATTRIBUTION PROCESSES AND TECHNIQUES

Abstract:

Automatic source attribution refers to the ability of an autonomous process to determine the source of a previously unexamined piece of writing. Statistical methods for source attribution have been the subject of scholarly research for well over a century. The field, however, still lacks a definitive, agreed-upon set of feature classes, methods, techniques, and nomenclature. This paper continues research into the basic attribution problem and works toward an eventual source attribution standard. We augment previous work, which used in-common, nontrivial word frequencies with neural networks, on a more standardized data set. We also use two other techniques: phrase-based feature sets evaluated with naive Bayesian classifiers, and bi-gram feature sets evaluated with the nearest neighbor algorithm. We compare the three and explore methods of combining the techniques to achieve better results.

Keywords: source attribution; neural networks; naive Bayesian; authorship attribution; n-grams; meta predictors

Authors: FOAAD KHOSMOOD, ROBERT LEVINSON

Affiliation: Department of Computer Science, University of California at Santa Cruz

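Conference type: International conference

Conference name: 2006 International Conference on Machine Learning and Cybernetics (Fifth IEEE Forum on Machine Learning and Cybernetics)

Conference location: Dalian

Conference language: English

Pages: 4551-4556

Online publication date: 2006-08-13 (date first posted on the Wanfang platform; not the paper's publication date)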
