Hate Speech Detection using Transformer Ensembles on the HASOC Dataset
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0002-6785-4356
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0001-8532-0895
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. MTA-SZTE Research Group on Artificial Intelligence, Szeged, Hungary. ORCID iD: 0000-0002-0546-116x
2020 (English). In: Speech and Computer: 22nd International Conference, SPECOM 2020, St. Petersburg, Russia, October 7–9, 2020, Proceedings / [ed] Alexey Karpov, Rodmonga Potapova, Springer, 2020, p. 13-21. Conference paper, Published paper (Refereed)
Abstract [en]

With the ubiquity and anonymity of the Internet, the spread of hate speech has been a growing concern for many years now. The language used for the purpose of dehumanizing, defaming or threatening individuals and marginalized groups threatens not only the mental health of its targets and their democratic access to the Internet, but also the fabric of our society. Because of this, much effort has been devoted to manual moderation. The amount of data generated each day, particularly on social media platforms such as Facebook and Twitter, however, makes this a Sisyphean task. This has led to an increased demand for automatic methods of hate speech detection.

Here, to contribute towards solving the task of hate speech detection, we worked with a simple ensemble of transformer models on a Twitter-based hate speech benchmark. Using this method, we attained a weighted F1-score of 0.8426, which we managed to further improve by leveraging more training data, achieving a weighted F1-score of 0.8504 and thus markedly outperforming the best-performing system in the literature.
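
As an illustration of the two ideas named in the abstract, the sketch below averages class probabilities from several models and scores the result with a support-weighted F1. The abstract does not say how the ensemble combines its members, so the averaging rule (soft voting) is an assumption here, and the function names `average_ensemble`, `predict` and `weighted_f1` as well as the toy probabilities are hypothetical, not taken from the paper.

```python
from collections import Counter


def average_ensemble(prob_sets):
    """Average per-class probabilities across models (soft voting).

    prob_sets: one entry per model, each a list of per-sample
    probability rows. Assumed combination rule, not the paper's.
    """
    n_models = len(prob_sets)
    n_samples = len(prob_sets[0])
    averaged = []
    for i in range(n_samples):
        rows = [model[i] for model in prob_sets]
        averaged.append([sum(col) / n_models for col in zip(*rows)])
    return averaged


def predict(probs):
    """Pick the index of the highest probability in each row."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


def weighted_f1(y_true, y_pred):
    """Per-class F1, weighted by each class's share of the true labels."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = count - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        score += f1 * count / total
    return score


# Toy probabilities for two hypothetical models on four samples
# (class 0 = not hateful, class 1 = hateful).
model_a = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.7, 0.3]]
model_b = [[0.6, 0.4], [0.7, 0.3], [0.1, 0.9], [0.4, 0.6]]
y_true = [0, 1, 1, 0]

y_pred = predict(average_ensemble([model_a, model_b]))
print(y_pred, round(weighted_f1(y_true, y_pred), 4))
```

Weighting each class's F1 by its support keeps the score from being dominated by how well a classifier handles only the majority class, which matters on imbalanced hate speech data.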

Place, publisher, year, edition, pages
Springer, 2020. p. 13-21
Series
Lecture Notes in Artificial Intelligence, ISSN 0302-9743, E-ISSN 1611-3349 ; 12335
Keywords [en]
Natural Language Processing, Hate Speech Detection, Transformers, RoBERTa, Ensemble
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-80629
DOI: 10.1007/978-3-030-60276-5_2
Scopus ID: 2-s2.0-85092898876
OAI: oai:DiVA.org:ltu-80629
DiVA, id: diva2:1462725
Conference
22nd International Conference on Speech and Computer (SPECOM 2020), 7-9 October, 2020, St. Petersburg, Russia
Funder
Vinnova, 2019-02996
Note

ISBN for host publication: 978-3-030-60275-8, 978-3-030-60276-5

Available from: 2020-08-31 Created: 2020-08-31 Last updated: 2023-09-05 Bibliographically approved

Open Access in DiVA

fulltext (267 kB), 1251 downloads
File information
File name: FULLTEXT01.pdf
File size: 267 kB
Checksum SHA-512: 1bc34e92361b4c1845a0978a07798f2f2c2bed035a9b89c9720c15239e19b405e80ca85c50ded06c60a0fdca1393fa92c8e8c6e3f8dded59310dd24ef53499a2
Type: fulltext
Mimetype: application/pdf

Authority records

Alonso, Pedro; Saini, Rajkumar; Kovács, György
