Unbabel; Instituto Superior Técnico; INESC-ID.
University of Maryland, USA.
University of Maryland, USA.
University College London, UK.
ENSIAS, Morocco.
SADiLaR, South Africa.
Aston University, UK.
University of Eastern Finland, Finland.
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
Fudan University, China.
Masakhane NLP.
Conservatoire National des Arts et Métiers, France.
Masakhane NLP.
Masakhane NLP; Lelapa AI, South Africa.
Masakhane NLP; Imperial College London, UK; HausaNLP.
Masakhane NLP; University of Deusto, Spain.
Masakhane NLP; University of California, USA.
Masakhane NLP; Lancaster University, UK.
Masakhane NLP.
Masakhane NLP.
Mohammed V University, Morocco.
Masakhane NLP.
Masakhane NLP.
Jamhuriya University of Science and Technology, Somalia.
LAUTECH, Nigeria.
The College of Saint Rose, USA.
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Signals and Systems.
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
University of Minnesota-Twin Cities, USA.
Microsoft Africa Research Institute.
University of Amsterdam, Netherlands.
The Technical University of Kenya.
Masakhane NLP.
AIMS, Cameroon.
KU Leuven, Belgium.
Masakhane NLP.
Masakhane NLP.
Masakhane NLP.
Masakhane NLP.
HausaNLP.
SIAT-CAS, China; Kaduna State University, Nigeria.
University of Cape Coast, Ghana; Ghana NLP.
Masakhane NLP; Kwame Nkrumah University of Science and Technology, Ghana.
Masakhane NLP; New Mexico State University, USA.
Masakhane NLP.
Masakhane NLP.
Masakhane NLP.
Masakhane NLP.
USIU-Africa.
UNIZIK, Nigeria.
AIMS, Senegal.
University College London, UK.
University College London, UK.
2024 (English). In: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024 / [ed] Duh K.; Gomez H.; Bethard S., Association for Computational Linguistics (ACL), 2024, p. 5997-6023, article id 200463. Conference paper, Published paper (Refereed)
Abstract [en]
Despite the recent progress on scaling multilingual machine translation (MT) to several under-resourced African languages, accurately measuring this progress remains challenging, since evaluation is often performed on n-gram matching metrics such as BLEU, which typically show a weaker correlation with human judgments. Learned metrics such as COMET have higher correlation; however, the lack of evaluation data with human ratings for under-resourced languages, complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with simplified MQM guidelines for error detection and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AFRICOMET: COMET evaluation metrics for African languages by leveraging DA data from well-resourced languages and an African-centric multilingual encoder (AfroXLM-R) to create the state-of-the-art MT evaluation metrics for African languages with respect to Spearman-rank correlation with human judgments (0.441).
Place, publisher, year, edition, pages
Association for Computational Linguistics (ACL), 2024
National Category
Language Technology (Computational Linguistics)
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-108639 (URN)
10.18653/v1/2024.naacl-long.334 (DOI)
2-s2.0-85199581086 (Scopus ID)
Conference
2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024), Mexico City, Mexico, June 16-21, 2024
Note
Funder: UTTER (101070631); Portuguese Recovery and Resilience Plan (C645008882-00000055); Landmark Development Initiative Africa; European Commission; Fundação para a Ciência e a Tecnologia.
ISBN for host publication: 979-889176114-8.
Fulltext license: CC BY. Materials published in or after 2016 are licensed under a Creative Commons Attribution 4.0 International License.
2024-08-20; 2024-08-20; 2024-11-27. Bibliographically approved