AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages
2024 (English). In: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024 / [ed] Duh K.; Gomez H.; Bethard S., Association for Computational Linguistics (ACL), 2024, p. 5997-6023, article id 200463.
Conference paper, Published paper (Refereed)
Abstract [en]
Despite the recent progress on scaling multilingual machine translation (MT) to several under-resourced African languages, accurately measuring this progress remains challenging, since evaluation is often performed on n-gram matching metrics such as BLEU, which typically show a weaker correlation with human judgments. Learned metrics such as COMET have higher correlation; however, the lack of evaluation data with human ratings for under-resourced languages, the complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and the limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with simplified MQM guidelines for error detection and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AfriCOMET: COMET evaluation metrics for African languages built by leveraging DA data from well-resourced languages and an African-centric multilingual encoder (AfroXLM-R) to create the state-of-the-art MT evaluation metrics for African languages with respect to Spearman-rank correlation with human judgments (0.441).
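The abstract ranks metrics by their Spearman-rank correlation with human judgments. As a rough illustration of what that number measures, the sketch below computes Spearman's rho between a list of metric scores and a list of human DA scores, assuming one score per translated segment. It is a generic, dependency-free implementation (Pearson correlation of tie-averaged ranks), not the authors' evaluation code; the function names `average_ranks`, `pearson`, and `spearman` are hypothetical, and it does not reproduce the reported 0.441.

```python
def average_ranks(values):
    """Assign 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        # Correlation is undefined when either list is constant.
        raise ValueError("correlation undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(metric_scores, human_scores):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))
```

With hypothetical metric scores `[0.71, 0.42, 0.88, 0.35, 0.60]` and hypothetical DA scores `[78, 50, 90, 40, 55]`, `spearman` returns 1.0 because both lists order the five segments identically. Only the ordering matters, which is why rank correlation suits metrics whose raw score scales differ from the human DA scale.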
Place, publisher, year, edition, pages
Association for Computational Linguistics (ACL), 2024. p. 5997-6023, article id 200463
National Category
Language Technology (Computational Linguistics)
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-108639
DOI: 10.18653/v1/2024.naacl-long.334
Scopus ID: 2-s2.0-85199581086
OAI: oai:DiVA.org:ltu-108639
DiVA, id: diva2:1890571
Conference
2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024), Mexico City, Mexico, June 16-21, 2024
Note
Funder: UTTER (101070631); Portuguese Recovery and Resilience Plan (C645008882-00000055); Landmark Development Initiative Africa; European Commission; Fundação para a Ciência e a Tecnologia
ISBN for host publication: 979-889176114-8
Fulltext license: CC BY. Materials published in or after 2016 are licensed under a Creative Commons Attribution 4.0 International License.