The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics
Masakhane, Africa; University of Lagos, Nigeria.
Stanford University, USA.
Carnegie Mellon University, USA.
Edinburgh Centre for Robotics, UK; Heriot-Watt University, UK; University of Edinburgh, UK.
Google Research.
Amelia R&D, New York, USA.
University of Virginia, USA.
Cornell University, USA.
Charles University, Prague, Czech Republic.
Masakhane, Africa; Technical University of Munich, Germany.
Carnegie Mellon University, USA.
University of Michigan, Ann Arbor, USA.
Stanford University, USA.
IBM Research.
Hugging Face.
Carnegie Mellon University, USA.
University of Virginia, USA.
DFKI, Germany; Technical University of Kaiserslautern, Germany.
Google Research.
University of Waterloo, Canada.
Columbia University, USA.
Carnegie Mellon University, USA.
Georgia Tech, USA.
University of North Carolina, Charlotte, USA.
Trivago.
University of California San Diego, USA.
Instituto de Telecomunicações, Portugal.
University of Washington, USA.
Pompeu Fabra University, Spain.
Tilburg University, Netherlands.
Massachusetts Institute of Technology, USA.
Google Research.
Google Research.
Masakhane, Africa; University of Electronic Science and Technology of China, China.
Kwame Nkrumah University of Science and Technology, Ghana; Masakhane, Africa.
Google Research.
University of Edinburgh, UK.
National Institute of Technology Karnataka, India.
Microsoft.
University of Texas at Austin, USA.
University of North Carolina, Charlotte, USA.
New York University, USA.
Google Research.
University of North Carolina, Charlotte, USA.
Université de Lorraine, France.
University of São Paulo, Brazil.
IBM Research.
Intelligent Systems Lab, Intel; Masakhane, Africa.
Georgia Tech, USA.
Georgia Tech, USA.
Samsung Research.
Harvard University, USA.
2021 (English). In: The 1st Workshop on Natural Language Generation, Evaluation, and Metrics: Proceedings of the Workshop, Association for Computational Linguistics, 2021, p. 96-120, article id 2021.gem-1.10. Conference paper, Published paper (Refereed).
Abstract [en]
We introduce GEM, a living benchmark for natural language Generation (NLG), its Evaluation, and Metrics. Measuring progress in NLG relies on a constantly evolving ecosystem of automated metrics, datasets, and human evaluation standards. Due to this moving target, new models often still evaluate on divergent anglo-centric corpora with well-established, but flawed, metrics. This disconnect makes it challenging to identify the limitations of current models and opportunities for progress. Addressing this limitation, GEM provides an environment in which models can easily be applied to a wide set of tasks and in which evaluation strategies can be tested. Regular updates to the benchmark will help NLG research become more multilingual and evolve the challenge alongside models. This paper serves as the description of the data for the 2021 shared task at the associated GEM Workshop.
Place, publisher, year, edition, pages
Association for Computational Linguistics, 2021. p. 96-120, article id 2021.gem-1.10
National Category
Natural Language Processing
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-87438
DOI: 10.18653/v1/2021.gem-1.10
ISI: 000697564200010
Scopus ID: 2-s2.0-85121350997
OAI: oai:DiVA.org:ltu-87438
DiVA id: diva2:1601443
Conference
1st Workshop on Natural Language Generation, Evaluation, and Metrics (GEM 2021), Bangkok, Thailand (online), August 5-6, 2021
Note
ISBN for host publication: 978-1-954085-67-1
Available from: 2021-10-08. Created: 2021-10-08. Last updated: 2025-02-07. Bibliographically approved.