Anomaly Detection as a Method for Uncovering Twitter Bots
Abstract
During the past decades, online social networks (OSNs) have steadily grown to become the mainstream communication channels they are today. One of the most popular OSNs is Twitter, a micro-blogging platform which, by 2019, had approximately 139 million daily active users. Interestingly, a considerable portion of the accounts registered in this social network is not human. Researchers have found that approximately 15% of all Twitter accounts, close to 48 million users, exhibit automated behavior. Such automatically managed accounts are called bots. Bots exhibit a diversity of behaviors and, consequently, of objectives. Benign uses of bots include automatically posting information about relevant news and academic papers, and even providing guidance during emergencies. Unfortunately, malicious bots are also abundant. Bots of this kind have been used to distribute malware, send spam, and even negatively influence political discussions. Moreover, malicious bots have also spread terrorist propaganda and promoted online extremism.
Researchers and the social network companies themselves have developed diverse bot detection methods and tools. Although unsupervised bot detection methods exist, most state-of-the-art bot detection mechanisms rely on supervised machine learning algorithms. By their nature, these methods require labeled examples of the different bot types in order to detect them effectively. Nevertheless, obtaining examples of all the bot types present on Twitter is not a trivial task. Moreover, bots are continuously evolving to evade current detection mechanisms.
This thesis proposes to approach the Twitter bot detection problem with one-class classifiers. Classifiers of this type require only examples of a normal class to detect anomalous behavior, and are therefore capable of overcoming the limitations of state-of-the-art methods. The experiments developed in this work measure to what extent multi-class and binary classifiers differ from one-class classifiers in performance, and whether those differences are statistically significant. Results show that one-class classifiers yield higher and more stable performance than the other two types of classifiers when detecting bot types that were not present during training, and that this difference in performance is statistically significant. Conversely, binary classifiers perform better than one-class classifiers when detecting bots of a type that was present during training.
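
To make the one-class idea concrete, the minimal sketch below trains scikit-learn's OneClassSVM exclusively on a "normal" (human) class and then flags deviating accounts as anomalous. The two synthetic features (tweets per day and a follower/friend ratio), the model choice, and the parameters are illustrative assumptions for this sketch, not the actual feature set or pipeline evaluated in this thesis.

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(42)

    # Hypothetical per-account features: [tweets per day, follower/friend ratio].
    # Training data contains ONLY the "normal" (human) class.
    human_accounts = rng.normal(loc=[5.0, 1.0], scale=[2.0, 0.3], size=(500, 2))

    # Unseen accounts to score: humans plus a bot type never seen in training.
    unseen_humans = rng.normal(loc=[5.0, 1.0], scale=[2.0, 0.3], size=(20, 2))
    unseen_bots = rng.normal(loc=[200.0, 0.05], scale=[30.0, 0.02], size=(20, 2))

    # Fit the one-class model on the normal class only.
    scaler = StandardScaler().fit(human_accounts)
    clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
    clf.fit(scaler.transform(human_accounts))

    # predict() returns +1 for inliers (human-like) and -1 for anomalies (bot-like).
    for name, batch in [("humans", unseen_humans), ("bots", unseen_bots)]:
        preds = clf.predict(scaler.transform(batch))
        flagged = int((preds == -1).sum())
        print(f"{name}: {flagged}/{len(batch)} flagged as anomalous")

The sketch flags bot-like accounts without ever seeing a bot example during training, which is precisely the property that allows one-class methods to generalize to bot types absent from the training data.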
Given our results, one-class classifiers could serve as an early-warning system that detects anomalous patterns of account behavior which could represent a new type of bot. The results presented in this work can also contribute to the development of hybrid systems that combine the strengths of binary classifiers with the benefits of one-class methods. Such systems would represent a step towards broadening the protection of OSNs’ users against malicious bots, thereby benefiting a large part of society.