Conference topic

TOWARD UNIFICATION OF SOURCE ATTRIBUTION PROCESSES AND TECHNIQUES

Automatic source attribution refers to the ability of an autonomous process to determine the source of a previously unexamined piece of writing. Statistical methods for source attribution have been the subject of scholarly research for well over a century. The field, however, still lacks an established, agreed-upon set of feature classes, methods, techniques and nomenclature. This paper represents a continuation of research into the basic attribution problem, as well as work toward an eventual source attribution standard. We augment previous work, which used in-common, nontrivial word frequencies with neural networks, by applying it to a more standardized data set. We also use two other techniques: phrase-based feature sets evaluated with naive Bayesian classifiers, and bi-gram feature sets evaluated with the nearest neighbor algorithm. We compare the three techniques and explore methods of combining them in order to achieve better results.

Source attribution; neural networks; naive Bayesian; authorship attribution; n-grams; meta predictors

FOAAD KHOSMOOD, ROBERT LEVINSON

Department of Computer Science, University of California at Santa Cruz

International conference

2006 International Conference on Machine Learning and Cybernetics (5th IEEE Conference on Machine Learning and Cybernetics)

Dalian

English

4551-4556

2006-08-13 (date first posted online on the Wanfang platform; not the paper's publication date)