Al-Azzawi, Sana Sabah Sabry (ORCID: orcid.org/0000-0001-7924-4953)
Publications (10 of 16)
Rasheed, A. F., Zarkoosh, M., Abbas, S. F. & Al-Azzawi, S. S. (2025). TaskComplexity: A Dataset for Task Complexity Classification with In-Context Learning, FLAN-T5 and GPT-4o Benchmarks. In: Vijendra Singh; Kuan-Ching Li; Vijayan K. V K Asari; Rubén R.G González Crespo (Ed.), International Conference on Machine Learning and Data Engineering, ICMLDE 2024: . Paper presented at 3rd International conference on Machine Learning and Data Engineering (ICMLDE 2024), Dehradun, India, November 28-29, 2024 (pp. 3713-3722). Elsevier B.V.
TaskComplexity: A Dataset for Task Complexity Classification with In-Context Learning, FLAN-T5 and GPT-4o Benchmarks
2025 (English) In: International Conference on Machine Learning and Data Engineering, ICMLDE 2024 / [ed] Vijendra Singh; Kuan-Ching Li; Vijayan K. V K Asari; Rubén R.G González Crespo, Elsevier B.V., 2025, p. 3713-3722. Conference paper, Published paper (Refereed)
Abstract [en]

This paper addresses the challenge of classifying and assigning programming tasks to experts, a process that typically requires significant effort, time, and cost. To tackle this issue, a novel dataset containing a total of 4,112 programming tasks was created by extracting tasks from various websites. Web scraping techniques were employed to collect this dataset of programming problems systematically. Specific HTML tags were tracked to extract key elements of each task, including the title, problem description, input/output, examples, problem class, and complexity score. Examples from the dataset are provided in the appendix to illustrate the variety and complexity of the tasks included. The dataset's effectiveness has been evaluated and benchmarked using two approaches: the first involved fine-tuning the FLAN-T5 small model on the dataset, while the second used in-context learning (ICL) with GPT-4o mini. Performance was assessed using standard metrics: accuracy, recall, precision, and F1-score. The results indicated that in-context learning with GPT-4o mini outperformed the FLAN-T5 model.
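The tag-tracking extraction the abstract describes can be sketched with Python's standard-library HTML parser; the tag-to-field mapping and the sample page below are hypothetical placeholders, not the paper's actual selectors:

```python
from html.parser import HTMLParser

# Minimal sketch of tag-tracking extraction. The tag names and the
# sample page are illustrative; each source site would need its own
# mapping to title, description, problem class, and complexity score.
class TaskExtractor(HTMLParser):
    FIELDS = {"h1": "title", "p": "description"}  # hypothetical mapping

    def __init__(self):
        super().__init__()
        self.record = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        self._current = self.FIELDS.get(tag)

    def handle_data(self, data):
        if self._current and data.strip():
            self.record.setdefault(self._current, data.strip())

    def handle_endtag(self, tag):
        self._current = None

page = "<h1>Two Sum</h1><p>Given an array, find two numbers...</p>"
parser = TaskExtractor()
parser.feed(page)
print(parser.record)
# {'title': 'Two Sum', 'description': 'Given an array, find two numbers...'}
```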

Place, publisher, year, edition, pages
Elsevier B.V., 2025
Series
Procedia Computer Science, ISSN 1877-0509 ; 258
Keywords
GPT-4o-mini, Flan-T5, task classification, in-context learning, Natural Language Processing (NLP), dataset creation
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-113963 (URN)10.1016/j.procs.2025.04.626 (DOI)2-s2.0-105007160276 (Scopus ID)
Conference
3rd International conference on Machine Learning and Data Engineering (ICMLDE 2024), Dehradun, India, November 28-29, 2024
Note

Full text license: CC BY-NC-ND

Available from: 2025-07-01 Created: 2025-07-01 Last updated: 2025-10-21. Bibliographically approved
Wang, J., Adelani, D. I., Agrawal, S., Masiak, M., Rei, R., Briakou, E., . . . Stenetorp, P. (2024). AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages. In: Duh K.; Gomez H.; Bethard S. (Ed.), Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024: . Paper presented at 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024), Mexico City, Mexico, June 16-21, 2024 (pp. 5997-6023). Association for Computational Linguistics (ACL), Article ID 200463.
AfriMTE and AfriCOMET: Enhancing COMET to Embrace Under-resourced African Languages
2024 (English) In: Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL 2024 / [ed] Duh K.; Gomez H.; Bethard S., Association for Computational Linguistics (ACL), 2024, p. 5997-6023, article id 200463. Conference paper, Published paper (Refereed)
Abstract [en]

Despite the recent progress on scaling multilingual machine translation (MT) to several under-resourced African languages, accurately measuring this progress remains challenging, since evaluation is often performed on n-gram matching metrics such as BLEU, which typically show a weaker correlation with human judgments. Learned metrics such as COMET have higher correlation; however, the lack of evaluation data with human ratings for under-resourced languages, the complexity of annotation guidelines like Multidimensional Quality Metrics (MQM), and the limited language coverage of multilingual encoders have hampered their applicability to African languages. In this paper, we address these challenges by creating high-quality human evaluation data with simplified MQM guidelines for error detection and direct assessment (DA) scoring for 13 typologically diverse African languages. Furthermore, we develop AFRICOMET: COMET evaluation metrics for African languages, by leveraging DA data from well-resourced languages and an African-centric multilingual encoder (AfroXLM-R) to create state-of-the-art MT evaluation metrics for African languages with respect to Spearman-rank correlation with human judgments (0.441).
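The headline number (0.441) is a Spearman-rank correlation between metric scores and human judgments. A self-contained sketch of that statistic, on made-up scores rather than the paper's data:

```python
# Spearman-rank correlation: Pearson correlation computed on ranks,
# with ties assigned average ranks. Scores below are illustrative only.
def rankdata(xs):
    # Average 1-based ranks, ties share the mean of their positions.
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(a, b):
    ra, rb = rankdata(a), rankdata(b)
    n = len(a)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra) ** 0.5
    vb = sum((y - mb) ** 2 for y in rb) ** 0.5
    return cov / (va * vb)

human = [1, 2, 3, 4, 5]             # hypothetical human DA scores
metric = [0.1, 0.3, 0.2, 0.8, 0.9]  # hypothetical metric scores
print(round(spearman(human, metric), 3))  # 0.9
```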

Place, publisher, year, edition, pages
Association for Computational Linguistics (ACL), 2024
National Category
Natural Language Processing
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-108639 (URN)10.18653/v1/2024.naacl-long.334 (DOI)2-s2.0-85199581086 (Scopus ID)
Conference
2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2024), Mexico City, Mexico, June 16-21, 2024
Note

Funder: UTTER (101070631); Portuguese Recovery and Resilience Plan (C645008882-00000055); Landmark Development Initiative Africa; European Commission; Fundação para a Ciência e a Tecnologia;

ISBN for host publication: 979-889176114-8; 

Full text license: CC BY. Materials published in or after 2016 are licensed under a Creative Commons Attribution 4.0 International License.

Available from: 2024-08-20 Created: 2024-08-20 Last updated: 2025-10-21. Bibliographically approved
Rasheed, A. F., Zarkoosh, M., Abbas, S. F. & Al-Azzawi, S. S. (2023). Arabic Offensive Language Classification: Leveraging Transformer, LSTM, and SVM. In: Manuel Cardona; Vijender K. Solanki (Ed.), Proceedings of the 2023 IEEE International Conference on Machine Learning and Applied Network Technologies, ICMLANT: . Paper presented at 2023 IEEE International Conference on Machine Learning and Applied Network Technologies (ICMLANT), December 14-15, 2023, San Salvador, El Salvador (pp. 115-120). IEEE
Arabic Offensive Language Classification: Leveraging Transformer, LSTM, and SVM
2023 (English) In: Proceedings of the 2023 IEEE International Conference on Machine Learning and Applied Network Technologies, ICMLANT / [ed] Manuel Cardona; Vijender K. Solanki, IEEE, 2023, p. 115-120. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
IEEE, 2023
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-103759 (URN)10.1109/ICMLANT59547.2023.10372866 (DOI)001161329500021 ()2-s2.0-85183473454 (Scopus ID)979-8-3503-0391-9 (ISBN)979-8-3503-0392-6 (ISBN)
Conference
2023 IEEE International Conference on Machine Learning and Applied Network Technologies (ICMLANT), December 14-15, 2023, San Salvador, El Salvador
Available from: 2024-01-16 Created: 2024-01-16 Last updated: 2025-10-21. Bibliographically approved
Alkhaled, L., Adewumi, O. & Sabry, S. S. (2023). Bipol: A novel multi-axes bias evaluation metric with explainability for NLP. Natural Language Processing Journal, 4, Article ID 100030.
Bipol: A novel multi-axes bias evaluation metric with explainability for NLP
2023 (English) In: Natural Language Processing Journal, ISSN 2949-7191, Vol. 4, article id 100030. Article in journal (Refereed), Published
Abstract [en]

We introduce bipol, a new metric with explainability, for estimating social bias in text data. Harmful bias is prevalent in many online sources of data that are used for training machine learning (ML) models. As a step toward addressing this challenge, we create a novel metric that involves a two-step process: corpus-level evaluation based on model classification, and sentence-level evaluation based on (sensitive) term frequency (TF). After creating new models to classify bias using SotA architectures, we evaluate two popular NLP datasets (COPA and SQuADv2) and the WinoBias dataset. As an additional contribution, we created a large English dataset (with almost 2 million labeled samples) for training models in bias classification and make it publicly available. We also make our code public.
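The two-step process can be illustrated with a toy sketch: a corpus-level factor from a bias classifier combined with a sentence-level factor from sensitive-term frequencies. The lexicon, the stand-in classifier, and the aggregation below are illustrative assumptions, not the published bipol definition:

```python
# Toy illustration of bipol's two-step structure; not the paper's formula.
FEMALE = {"she", "her", "woman"}  # hypothetical sensitive-term lexicon
MALE = {"he", "his", "man"}

def classify_biased(sentence):
    # Stand-in for a trained corpus-level bias classifier: flag any
    # sentence that mentions either axis at all.
    words = set(sentence.lower().split())
    return bool(words & (FEMALE | MALE))

def sentence_bias(sentence):
    # Sentence level: normalized imbalance between axis term frequencies.
    words = sentence.lower().split()
    f = sum(w in FEMALE for w in words)
    m = sum(w in MALE for w in words)
    return abs(f - m) / (f + m) if f + m else 0.0

def bipol_like(corpus):
    flagged = [s for s in corpus if classify_biased(s)]
    if not flagged:
        return 0.0
    corpus_factor = len(flagged) / len(corpus)                          # step 1
    sentence_factor = sum(map(sentence_bias, flagged)) / len(flagged)   # step 2
    return corpus_factor * sentence_factor

corpus = ["he is a man", "she wrote code", "the sky is blue", "he met her"]
print(round(bipol_like(corpus), 3))  # 0.5
```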

Place, publisher, year, edition, pages
Elsevier, 2023
Keywords
Bipol, MAB dataset, NLP, Bias
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-102419 (URN)10.1016/j.nlp.2023.100030 (DOI)
Note

Approved; 2023; Level 0; 2023-11-13 (joosat)

CC BY 4.0 License

Available from: 2023-11-13 Created: 2023-11-13 Last updated: 2025-10-21. Bibliographically approved
Adewumi, T., Södergren, I., Alkhaled, L., Sabry, S. S., Liwicki, F. & Liwicki, M. (2023). Bipol: Multi-axes Evaluation of Bias with Explainability in Benchmark Datasets. In: Galia Angelova, Maria Kunilovskaya and Ruslan Mitkov (Ed.), Proceedings of Recent Advances in Natural Language Processing: . Paper presented at International Conference Recent Advances In Natural Language Processing (RANLP 2023), Varna, Bulgaria, September 4-6, 2023 (pp. 1-10). Incoma Ltd.
Bipol: Multi-axes Evaluation of Bias with Explainability in Benchmark Datasets
2023 (English) In: Proceedings of Recent Advances in Natural Language Processing / [ed] Galia Angelova, Maria Kunilovskaya and Ruslan Mitkov, Incoma Ltd., 2023, p. 1-10. Conference paper, Published paper (Refereed)
Abstract [en]

We investigate five English NLP benchmark datasets (on the superGLUE leaderboard) and two Swedish datasets for bias, along multiple axes. The datasets are the following: Boolean Question (Boolq), CommitmentBank (CB), Winograd Schema Challenge (WSC), Winogender diagnostic (AXg), Recognising Textual Entailment (RTE), Swedish CB, and SWEDN. Bias can be harmful, and it is known to be common in the data that ML models learn from. In order to mitigate bias in data, it is crucial to be able to estimate it objectively. We use bipol, a novel multi-axes bias metric with explainability, to estimate and explain how much bias exists in these datasets. Multilingual, multi-axes bias evaluation is not very common. Hence, we also contribute a new, large Swedish bias-labeled dataset (of 2 million samples), translated from the English version, and train the SotA mT5 model on it. In addition, we contribute new multi-axes lexica for bias detection in Swedish. We make the code, model, and new dataset publicly available.

Place, publisher, year, edition, pages
Incoma Ltd., 2023
Series
International conference Recent advances in natural language processing, E-ISSN 2603-2813 ; 2023
National Category
Natural Language Processing
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-103097 (URN)10.26615/978-954-452-092-2_001 (DOI)2-s2.0-85179181932 (Scopus ID)
Conference
International Conference Recent Advances In Natural Language Processing (RANLP 2023), Varna, Bulgaria, September 4-6, 2023
Note

ISBN for host publication: 978-954-452-092-2

Available from: 2023-11-30 Created: 2023-11-30 Last updated: 2025-10-21. Bibliographically approved
Kovács, G., Al-Azzawi, S. S., Mokayed, H., Liwicki, F. & Liwicki, M. (2023). Experiences from implementing teach-back in the teaching of Artificial Intelligence. In: Pixel (Ed.), New perspectives in science education: 12th edition: . Paper presented at New Perspective in Science Education 2023, Florence, Italy [DIGITAL], March 16-17, 2023. Libreriauniversitaria.it, 12(12)
Experiences from implementing teach-back in the teaching of Artificial Intelligence
2023 (English) In: New perspectives in science education: 12th edition / [ed] Pixel, Libreriauniversitaria.it, 2023, Vol. 12, no 12. Conference paper, Published paper (Refereed)
Abstract [en]

Teach-back, initially used in health care, has recently also been applied in higher education, for self-assessment and to enhance student engagement. These applications, however, have been limited to the humanities. In our work, we extend the concept of teach-back to one of our STEM courses, namely “Introduction to Artificial Intelligence”. Here, we couple teach-back with the use of study groups. Our goal with this exercise was many-fold. For one, we wanted to increase student engagement and student activation in the course. Moreover, we wanted to increase student understanding. Lastly, we wanted to attain the above with an efficient use of teacher hours. In this paper, we describe how we implemented the teach-back method. Our results show that after introducing teach-back, we had a slightly decreased rate of fails at the oral exam. We also found that students who engaged in the teach-back activity found it mostly beneficial and engaging. Unfortunately, however, only a small portion of students participated in teach-back. Thus, beyond improving the implementation of teach-back, our main target for future work will be to engage more students in it.

Place, publisher, year, edition, pages
Libreriauniversitaria.it, 2023
Series
International conference New perspectives in science education, ISSN 2420-9732
Keywords
student engagement, motivation, teach-back, Artificial Intelligence
National Category
Computer and Information Sciences Educational Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-111938 (URN)2-s2.0-85216748264 (Scopus ID)
Conference
New Perspective in Science Education 2023, Florence, Italy [DIGITAL], March 16-17, 2023
Note

ISBN for host publication: 979-12-80225-55-9

Available from: 2025-03-10 Created: 2025-03-10 Last updated: 2025-10-21. Bibliographically approved
Hosseini, P., Hosseini, M., Al-Azzawi, S. S., Liwicki, M., Castro, I. & Purver, M. (2023). Lon-eå at SemEval-2023 Task 11: A Comparison of Activation Functions for Soft and Hard Label Prediction. In: Atul Kr. Ojha; A. Seza Dogruoz; Giovanni Da San Martino; Harish Tayyar Madabushi; Ritesh Kumar; Elisa Sartori (Ed.), 17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop: . Paper presented at 17th International Workshop on Semantic Evaluation, SemEval 2023, co-located with the 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023, Hybrid, Toronto, Canada, July 13-14, 2023 (pp. 1329-1334). Association for Computational Linguistics
Lon-eå at SemEval-2023 Task 11: A Comparison of Activation Functions for Soft and Hard Label Prediction
2023 (English) In: 17th International Workshop on Semantic Evaluation, SemEval 2023 - Proceedings of the Workshop / [ed] Atul Kr. Ojha; A. Seza Dogruoz; Giovanni Da San Martino; Harish Tayyar Madabushi; Ritesh Kumar; Elisa Sartori, Association for Computational Linguistics, 2023, p. 1329-1334. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Association for Computational Linguistics, 2023
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-103558 (URN)10.18653/v1/2023.semeval-1.185 (DOI)001281001900184 ()2-s2.0-85160934844 (Scopus ID)
Conference
17th International Workshop on Semantic Evaluation, SemEval 2023, co-located with the 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023, Hybrid, Toronto, Canada, July 13-14, 2023
Note

Funder: UK EPSRC (EP/S022325/1 IGGI, EP/S033564/1 Sodestream, EP/W032473/1 AP4L, EP/V011189/1 REPHRAIN, EP/W001632/1 ARCIDUCA); Slovenian Research Agency (J5-3102, P2-0103);

ISBN for host publication: 978-1-959429-99-9

Available from: 2024-01-11 Created: 2024-01-11 Last updated: 2026-02-11. Bibliographically approved
Azime, I. A., Al-Azzawi, S. S., Tonja, A. L., Shode, I., Alabi, J., Awokoya, A., . . . Yousuf, O. (2023). Masakhane-Afrisenti at SemEval-2023 Task 12: Sentiment Analysis using Afro-centric Language Models and Adapters for Low-resource African Languages. In: Atul Kr. Ojha; A. Seza Dogruoz; Giovanni Da San Martino; Harish Tayyar Madabushi; Ritesh Kumar; Elisa Sartori (Ed.), The 17th International Workshop on Semantic Evaluation (SemEval-2023): Proceedings of the Workshop. Paper presented at 17th International Workshop on Semantic Evaluation (SemEval-2023), Toronto, Canada, July 13-14, 2023 (pp. 1311-1316). Association for Computational Linguistics
Masakhane-Afrisenti at SemEval-2023 Task 12: Sentiment Analysis using Afro-centric Language Models and Adapters for Low-resource African Languages
2023 (English) In: The 17th International Workshop on Semantic Evaluation (SemEval-2023): Proceedings of the Workshop / [ed] Atul Kr. Ojha; A. Seza Dogruoz; Giovanni Da San Martino; Harish Tayyar Madabushi; Ritesh Kumar; Elisa Sartori, Association for Computational Linguistics, 2023, p. 1311-1316. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Association for Computational Linguistics, 2023
National Category
Natural Language Processing
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-103460 (URN)10.18653/v1/2023.semeval-1.182 (DOI)001281001900181 ()2-s2.0-85175398880 (Scopus ID)
Conference
17th International Workshop on Semantic Evaluation (SemEval-2023), Toronto, Canada, July 13-14, 2023
Note

ISBN for host publication: 978-1-959429-99-9

Available from: 2024-01-03 Created: 2024-01-03 Last updated: 2026-02-11. Bibliographically approved
Rasheed, A. F., Zarkoosh, M. & Al-Azzawi, S. S. (2023). Multi-CNN Voting Method for Improved Arabic Handwritten Digits Classification. In: 2023 9th International Conference on Computer and Communication Engineering (ICCCE): . Paper presented at 9th International Conference on Computer and Communication Engineering (ICCCE 2023), Kuala Lumpur, Malaysia, August 15-16, 2023 (pp. 205-210). Institute of Electrical and Electronics Engineers Inc.
Multi-CNN Voting Method for Improved Arabic Handwritten Digits Classification
2023 (English) In: 2023 9th International Conference on Computer and Communication Engineering (ICCCE), Institute of Electrical and Electronics Engineers Inc., 2023, p. 205-210. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2023
National Category
Computer Sciences Computer graphics and computer vision
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-103531 (URN)10.1109/ICCCE58854.2023.10246065 (DOI)2-s2.0-85173656322 (Scopus ID)
Conference
9th International Conference on Computer and Communication Engineering (ICCCE 2023), Kuala Lumpur, Malaysia, August 15-16, 2023
Note

ISBN for host publication: 979-8-3503-2521-8

Available from: 2024-01-08 Created: 2024-01-08 Last updated: 2025-10-21. Bibliographically approved
Adewumi, O., Sabry, S. S., Abid, N., Liwicki, F. & Liwicki, M. (2023). T5 for Hate Speech, Augmented Data, and Ensemble. Sci, 5(4), Article ID 37.
T5 for Hate Speech, Augmented Data, and Ensemble
2023 (English) In: Sci, E-ISSN 2413-4155, Vol. 5, no 4, article id 37. Article in journal (Refereed), Published
Abstract [en]

We conduct relatively extensive investigations of automatic hate speech (HS) detection using different State-of-The-Art (SoTA) baselines across 11 subtasks spanning six different datasets. Our motivation is to determine which of the recent SoTA models is best for automatic hate speech detection and what advantage methods, such as data augmentation and ensemble, may have on the best model, if any. We carry out six cross-task investigations. We achieve new SoTA results on two subtasks—macro F1 scores of 91.73% and 53.21% for subtasks A and B of the HASOC 2020 dataset, surpassing previous SoTA scores of 51.52% and 26.52%, respectively. We achieve near-SoTA results on two others—macro F1 scores of 81.66% for subtask A of the OLID 2019 and 82.54% for subtask A of the HASOC 2021, in comparison to SoTA results of 82.9% and 83.05%, respectively. We perform error analysis and use two eXplainable Artificial Intelligence (XAI) algorithms (Integrated Gradient (IG) and SHapley Additive exPlanations (SHAP)) to reveal how two of the models (Bi-Directional Long Short-Term Memory Network (Bi-LSTM) and Text-to-Text-Transfer Transformer (T5)) make the predictions they do by using examples. Other contributions of this work are: (1) the introduction of a simple, novel mechanism for correcting Out-of-Class (OoC) predictions in T5, (2) a detailed description of the data augmentation methods, and (3) the revelation of the poor data annotations in the HASOC 2021 dataset by using several examples and XAI (buttressing the need for better quality control). We publicly release our model checkpoints and codes to foster transparency.
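The macro F1 scores quoted above weight every class equally, which matters on imbalanced hate-speech data. A minimal sketch of the metric, on made-up labels rather than any of the datasets named in the abstract:

```python
# Macro F1: per-class F1 averaged with equal class weight.
def macro_f1(y_true, y_pred):
    labels = sorted(set(y_true) | set(y_pred))
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        # F1 = 2*TP / (2*TP + FP + FN); zero when the class is never hit.
        f1s.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(f1s) / len(f1s)

y_true = ["hate", "hate", "none", "none"]  # illustrative labels
y_pred = ["hate", "none", "none", "none"]
print(round(macro_f1(y_true, y_pred), 3))  # 0.733
```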

Place, publisher, year, edition, pages
MDPI, 2023
Keywords
hate speech, NLP, T5, LSTM, RoBERTa
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-102417 (URN)10.3390/sci5040037 (DOI)001543619400001 ()2-s2.0-85180673806 (Scopus ID)
Note

Approved; 2023; Level 0; 2023-11-13 (joosat)

Part of special issue: Computational Linguistics and Artificial Intelligence

CC BY 4.0 License

Available from: 2023-11-13 Created: 2023-11-13 Last updated: 2025-11-28. Bibliographically approved