Leveraging external resources for offensive content detection in social media
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0002-0546-116X
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0002-6785-4356
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0001-8532-0895
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0003-4029-6574
2022 (English). In: AI Communications, ISSN 0921-7126, E-ISSN 1875-8452, Vol. 35, no. 2, p. 87-109. Article in journal (Refereed). Published.
Abstract [en]

Hate speech is a burning issue in today’s society that cuts across numerous strategic areas, including human rights protection, refugee protection, and the fight against racism and discrimination. The gravity of the subject is further demonstrated by António Guterres, the United Nations Secretary-General, calling it “a menace to democratic values, social stability, and peace”. A central channel for the spread of hate speech is the Internet, and social media in particular. Thus, automatic detection of hateful and offensive content on these platforms is a crucial challenge whose solution would strongly contribute to an equal and sustainable society. One significant difficulty in meeting this challenge is collecting sufficient labeled data. In our work, we examine how various resources can be leveraged to circumvent this difficulty. We carry out extensive experiments to exploit various data sources using different machine learning models, including state-of-the-art transformers. We have found that using our proposed methods, one can attain state-of-the-art performance in detecting hate speech on Twitter (outperforming the winner of both the HASOC 2019 and HASOC 2020 competitions). In general, adding more data improves performance or at least does not decrease it. Even with strong language models and knowledge transfer mechanisms, the best results were attained using data from only one or two additional data sets.
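
To make the data-pooling idea concrete, the sketch below shows one plausible setup: fine-tuning a pretrained RoBERTa classifier on tweets pooled from an in-domain source and one external source. This is an illustrative assumption only; the model name ("roberta-base"), the tiny inline data, the binary label scheme, and the hyperparameters are placeholders and do not reproduce the paper's actual pipeline or its vocabulary-augmentation step.

# Illustrative sketch only (not the authors' published pipeline):
# fine-tune a RoBERTa classifier on offensive-language data pooled
# from two labeled sources. All data and settings are placeholders.
import torch
from torch.utils.data import Dataset, DataLoader
from transformers import AutoTokenizer, AutoModelForSequenceClassification

class PooledTweets(Dataset):
    # Wraps (text, label) pairs pooled from several labeled sources.
    def __init__(self, texts, labels, tokenizer, max_len=128):
        self.enc = tokenizer(list(texts), truncation=True, padding="max_length",
                             max_length=max_len, return_tensors="pt")
        self.labels = torch.tensor(list(labels))

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, i):
        item = {k: v[i] for k, v in self.enc.items()}
        item["labels"] = self.labels[i]
        return item

# Placeholder in-domain and external samples; 0 = not offensive, 1 = offensive.
in_domain = [("a perfectly friendly tweet", 0), ("an offensive tweet", 1)]
external  = [("another harmless post", 0), ("another offensive post", 1)]
texts, labels = zip(*(in_domain + external))   # the "pooling" step: plain concatenation

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
loader = DataLoader(PooledTweets(texts, labels, tokenizer), batch_size=2, shuffle=True)

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):
    for batch in loader:
        optimizer.zero_grad()
        loss = model(**batch).loss   # classification loss from the sequence head
        loss.backward()
        optimizer.step()

In the paper's terms, the main variable is which and how many external data sets go into the pool; the sketch keeps that to a single concatenation for readability.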

Place, publisher, year, edition, pages
IOS Press, 2022. Vol. 35, no 2, p. 87-109
Keywords [en]
Hateful and offensive language, deep language processing, transfer learning, vocabulary augmentation, RoBERTa
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-90607
DOI: 10.3233/aic-210138
ISI: 000828016100004
Scopus ID: 2-s2.0-85135231173
OAI: oai:DiVA.org:ltu-90607
DiVA, id: diva2:1657498
Note

Validated;2022;Level 2;2022-07-20 (sofila)

Available from: 2022-05-11. Created: 2022-05-11. Last updated: 2023-09-05. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Kovács, György; Alonso, Pedro; Saini, Rajkumar; Liwicki, Marcus

