Anomaly Detection as a Method for Uncovering Twitter Bots

Mata Sánchez, Javier Israel

Abstract

Over the past decades, online social networks (OSNs) have steadily grown into the mainstream communication channels they are today. One of the most popular OSNs is Twitter, a micro-blogging platform that, by 2019, had approximately 139 million daily active users. Interestingly, a significant portion of the accounts registered on this social network are not human. Researchers have found that approximately 15% of all Twitter accounts, close to 48 million users, exhibit automated behavior. Such automatically managed accounts are called bots. Bots exhibit a diversity of behaviors and, therefore, of objectives. Beneficial uses of bots include automatically posting information about relevant news and academic papers, and even providing guidance during emergencies. Unfortunately, malicious bots are also abundant. Bots of this type have been used to distribute malware, send spam, and even negatively affect political discussions. Moreover, malicious bots have promoted terrorist propaganda and online extremist proselytism. Researchers and the social network companies themselves have developed diverse bot detection methods and tools. Although unsupervised bot detection methods exist, most state-of-the-art bot detection mechanisms rely on supervised machine learning algorithms. By their nature, these methods require examples of different bot types to detect them effectively. However, obtaining examples of all the different bot types present on Twitter is not a trivial task. Moreover, bots continuously evolve to evade current detection mechanisms. This thesis proposes approaching the Twitter bot detection problem with one-class classifiers. Classifiers of this type require only examples of the normal class to detect anomalous behavior, and are thus capable of overcoming the limitations of state-of-the-art methods.
The experiments developed in this work show to what extent multi-class and binary classifiers differ from one-class classifiers in terms of performance, and the significance of these differences is measured. Results show that one-class classifiers yield higher and more stable performance than the other two types of classifiers when detecting bot types that were not used in the training phase. Additionally, the difference in performance is statistically significant. On the other hand, binary classifiers perform better than one-class classifiers when detecting bots of a type that was present in the training phase. Given these results, one-class classifiers could serve as an early-warning system, detecting anomalous patterns of account behavior that could represent a new bot. The results presented in this work can also contribute to the development of hybrid systems that combine features of binary classifiers with the benefits of one-class methods. Such systems would represent a step towards broadening the protection of OSN users from malicious bots, thereby benefiting a large part of society.
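
To make the one-class idea concrete, the following is a minimal sketch of a detector that is fitted only on examples of the normal class and then flags anything that deviates from it. It is not the method evaluated in the thesis: the per-feature z-score rule, the class name ZScoreOneClass, the two account features, and the synthetic data are all assumptions chosen for illustration.

```python
import random
import statistics


class ZScoreOneClass:
    """Toy one-class detector (illustrative assumption, not the thesis method).

    It learns per-feature mean and standard deviation from normal examples
    only, and flags a sample whose largest absolute z-score exceeds a
    threshold derived from the training data.
    """

    def __init__(self, quantile=0.99):
        self.quantile = quantile

    def _score(self, x):
        # Largest absolute z-score across features; higher means more anomalous.
        return max(abs(v - m) / s for v, m, s in zip(x, self.means, self.stdevs))

    def fit(self, normal_rows):
        columns = list(zip(*normal_rows))
        self.means = [statistics.fmean(c) for c in columns]
        # Fall back to 1.0 if a feature is constant, to avoid dividing by zero.
        self.stdevs = [statistics.stdev(c) or 1.0 for c in columns]
        scores = sorted(self._score(r) for r in normal_rows)
        # Threshold set so that roughly `quantile` of training rows are accepted.
        index = min(len(scores) - 1, int(self.quantile * len(scores)))
        self.threshold = scores[index]
        return self

    def is_anomalous(self, x):
        return self._score(x) > self.threshold


rng = random.Random(0)

# Hypothetical features: (tweets per day, followers-to-friends ratio).
# Only synthetic "human" accounts are used for fitting; no bot examples.
humans = [(rng.gauss(5, 2), rng.gauss(1.0, 0.3)) for _ in range(500)]
detector = ZScoreOneClass().fit(humans)

typical_account = (5.0, 1.0)     # close to the centre of the normal class
unseen_bot_like = (200.0, 0.05)  # a pattern never shown during fitting

print(detector.is_anomalous(typical_account))
print(detector.is_anomalous(unseen_bot_like))
```

Because no bot examples are needed for fitting, a detector of this kind can flag a behavior pattern it has never encountered, which is the property the abstract relies on when proposing one-class classifiers as an early-warning system.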

URI

http://hdl.handle.net/11285/636278

Collections

Ciencias Exactas y Ciencias de la Salud