Leveraging Sentiment Data for the Detection of Homophobic/Transphobic Content in a Multi-Task, Multi-Lingual Setting Using Transformers
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0001-7924-4953
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0002-0546-116x
2022 (English). In: FIRE 2022 Working Notes / [ed] Kripabandhu Ghosh, Thomas Mandl, Prasenjit Majumder, Mandar Mitra, CEUR-WS, 2022, Vol. 3395, p. 196-207. Conference paper, Published paper (Refereed)
Abstract [en]

Hateful content is published and spread on social media at an increasing rate, harming the user experience. In addition, hateful content targeting particular, marginalized/vulnerable groups (e.g. homophobic/transphobic content) can cause even more harm to members of said groups. Hence, detecting hateful content is crucial, regardless of its origin, or the language used. The large variety of (often underresourced) languages used, however, makes this task daunting, especially as many users use code-mixing in their messages. To help overcome these difficulties, the approach we present here uses a multi-language framework. And to further mitigate the scarcity of labelled data, it also leverages data from the related task of sentiment analysis to improve the detection of homophobic/transphobic content. We evaluated our system by participating in a sentiment analysis and hate speech detection challenge. Results show that our multi-task model outperforms its single-task counterpart (on average, by 24%) on the detection of homophobic/transphobic content. Moreover, the results achieved in detecting homophobic/transphobic content put our system in 1st or 2nd place for three out of four languages examined.

Place, publisher, year, edition, pages
CEUR-WS, 2022. Vol. 3395, p. 196-207
Series
CEUR Workshop Proceedings, ISSN 1613-0073
Keywords [en]
Multi-Task, Multi-Language Learning, Hateful Language, Sentiment Analysis, Detecting Homophobic/Transphobic Language
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-98273
Scopus ID: 2-s2.0-85160747864
OAI: oai:DiVA.org:ltu-98273
DiVA, id: diva2:1766531
Conference
14th Forum for Information Retrieval Evaluation, FIRE 2022, December 9-13, 2022, Kolkata, India
Funder
Vinnova, 2019-02996
Note

Full text license: CC BY License

Available from: 2023-06-13 Created: 2023-06-13 Last updated: 2023-06-13 Bibliographically approved

Open Access in DiVA

fulltext (1152 kB), 104 downloads
File information
File name: FULLTEXT01.pdf
File size: 1152 kB
Checksum (SHA-512): 3e3395b18e67edbca7beff9cced238f79651418acbc723bb9053d351839a829d94b53ddd7eae01853b2f802ea9bde5b749ea26fd3f9286ed04b28cb052cdd727
Type: fulltext
Mimetype: application/pdf

Other links

Scopus
https://ceur-ws.org/Vol-3395/

Authority records

Al-Azzawi, Sana Sabah Sabry
Kovács, György
