With the ubiquity and anonymity of the Internet, the spread of hate speech has been a growing concern for many years now. The language used for the purpose of dehumanizing, defaming or threatening individuals and marginalized groups not only threatens the mental health of its targets, as well as their democratic access to the Internet, but also the fabric of our society. Because of this, much effort has been devoted to manual moderation. The amount of data generated each day, particularly on social media platforms such as Facebook and twitter, however makes this a Sisyphean task. This has led to an increased demand for automatic methods of hate speech detection.
Here, to contribute towards solving the task of hate speech detection, we worked with a simple ensemble of transformer models on a twitter-based hate speech benchmark. Using this method, we attained a weighted F1-score of 0.8426, which we managed to further improve by leveraging more training data, achieving a weighted F1-score of 0.8504. Thus markedly outperforming the best performing system in the literature.
ISBN för värdpublikation: 978-3-030-60275-8, 978-3-030-60276-5