Unbabel; Instituto Superior Técnico; INESC-ID.
University of Maryland, USA.
University of Maryland, USA.
University College London, UK.
ENSIAS, Morocco.
SADiLaR, South Africa.
Aston University, UK.
University of Eastern Finland, Finland.
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, EISLAB.
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, EISLAB.
Fudan University, China.
Masakhane NLP.
Conservatoire National des Arts et Métiers, France.
Masakhane NLP.
Masakhane NLP; Lelapa AI, South Africa.
Masakhane NLP; Imperial College London, UK; HausaNLP.
Masakhane NLP; University of Deusto, Spain.
Masakhane NLP; University of California, USA.
Masakhane NLP; Lancaster University, UK.
Masakhane NLP.
Masakhane NLP.
Mohammed V University, Morocco.
Masakhane NLP.
Masakhane NLP.
Jamhuriya University Of Science and Technology, Somalia.
LAUTECH, Nigeria.
The College of Saint Rose, USA.
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Signals and Systems.
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, EISLAB.
University of Minnesota - Twin Cities, USA.
Microsoft Africa Research Institute.
University of Amsterdam, Netherlands.
The Technical University of Kenya.
Masakhane NLP.
AIMS, Cameroon.
KU Leuven, Belgium.
Masakhane NLP.
Masakhane NLP.
Masakhane NLP.
Masakhane NLP.
HausaNLP.
SIAT-CAS, China; Kaduna State University, Nigeria.
University of Cape Coast, Ghana; Ghana NLP.
Masakhane NLP; Kwame Nkrumah University of Science and Technology, Ghana.
Masakhane NLP; New Mexico State University, USA.
Masakhane NLP.
Masakhane NLP.
Masakhane NLP.
Masakhane NLP.
USIU-Africa.
UNIZIK, Nigeria.
AIMS, Senegal.
University College London, UK.
University College London, UK.
2024 (English). In: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024 / [ed] Duh K.; Gomez H.; Bethard S., Association for Computational Linguistics (ACL), 2024, p. 5997-6023, article id 200463. Conference paper, Published paper (Refereed)
Abstract [en]
Despite the recent progress on scaling multilingual machine translation (MT) to several under-resourced African languages, accurately measuring this progress remains challenging, since evaluation is often performed on n-gram matching metrics such as BLEU, which typically show a weaker correlation with human judgments. Learned metrics such as COMET have higher correlation; however, the lack of evaluation data with human ratings for under-resourced languages, complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with simplified MQM guidelines for error detection and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AFRICOMET: COMET evaluation metrics for African languages by leveraging DA data from well-resourced languages and an African-centric multilingual encoder (AfroXLM-R) to create the state-of-the-art MT evaluation metrics for African languages with respect to Spearman-rank correlation with human judgments (0.441).
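The abstract reports metric quality as a Spearman-rank correlation with human judgments. As a minimal sketch of how such a figure is computed in general, the snippet below ranks two score lists (averaging tied ranks) and takes the Pearson correlation of the ranks. The function names `average_ranks` and `spearman` and the toy scores are illustrative assumptions; they are not taken from the paper or from the COMET codebase, and the example does not reproduce the reported 0.441.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy data (NOT from the paper): human DA scores for five
# translations and the scores a learned metric assigned to the same items.
human_da = [78.0, 35.0, 90.0, 52.0, 64.0]
metric_scores = [0.71, 0.40, 0.85, 0.66, 0.58]

rho = spearman(human_da, metric_scores)
print(f"Spearman rho = {rho:.3f}")
```

Because only the ordering of the scores matters, the human ratings and the metric outputs can sit on different scales, which is one reason rank correlation is a common choice for comparing MT metrics against human judgments.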
Place, publisher, year, edition, pages
Association for Computational Linguistics (ACL), 2024
National category
Language Technology (Computational Linguistics)
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-108639 (URN); 10.18653/v1/2024.naacl-long.334 (DOI); 2-s2.0-85199581086 (Scopus ID)
Conference
2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024), Mexico City, Mexico, June 16-21, 2024
Note
Funder: UTTER (101070631); Portuguese Recovery and Resilience Plan (C645008882-00000055); Landmark Development Initiative Africa; European Commission; Fundação para a Ciência e a Tecnologia;
ISBN for host publication: 979-889176114-8;
Full-text license: CC BY. Materials published in or after 2016 are licensed under a Creative Commons Attribution 4.0 International License.
2024-08-20; 2024-08-20; 2024-11-27. Bibliographically approved