Bucketed common vector scaling for authorship attribution in heterogeneous web collections: A scaling approach for authorship attribution
Abstract
The influence of domain, genre and topic on author style adversely affects the performance of authorship attribution (AA) in multi-genre and multi-domain data sets. Although recent approaches to AA tasks focus on suggesting new feature sets and sampling techniques to improve the robustness of a classification system, they do not incorporate domain-specific properties to reduce the negative impact of irrelevant features on AA. This study presents a novel scaling approach, namely bucketed common vector scaling, that efficiently reduces negative domain influence without reducing the dimensionality of existing features; the approach is therefore easily transferable and applicable in a classification system. Classification performance on English-language competition data sets consisting of emails and articles, and on Turkish-language web documents consisting of blogs, articles and tweets, indicates that our approach is highly competitive with top-performing approaches on the English competition data sets and significantly improves the top classification performance in mixed-domain experiments on blogs, articles and tweets.
Source
Journal of Information Science