Kovács, György, Postdoctoral researcher
ORCID iD: orcid.org/0000-0002-0546-116X
Publications (10 of 28)
Pirinen, A., Abid, N., Paszkowsky, N. A., Ohlson Timoudas, T., Scheirer, R., Ceccobello, C., . . . Persson, A. (2024). Creating and Leveraging a Synthetic Dataset of Cloud Optical Thickness Measures for Cloud Detection in MSI. Remote Sensing, 16(4), Article ID 694.
2024 (English). In: Remote Sensing, E-ISSN 2072-4292, Vol. 16, no 4, article id 694. Article in journal (Refereed), Published
Abstract [en]

Cloud formations often obscure optical satellite-based monitoring of the Earth’s surface, thus limiting Earth observation (EO) activities such as land cover mapping, ocean color analysis, and cropland monitoring. The integration of machine learning (ML) methods within the remote sensing domain has significantly improved performance for a wide range of EO tasks, including cloud detection and filtering, but there is still much room for improvement. A key bottleneck is that ML methods typically depend on large amounts of annotated data for training, which are often difficult to come by in EO contexts. This is especially true when it comes to cloud optical thickness (COT) estimation. A reliable estimation of COT enables more fine-grained and application-dependent control compared to using pre-specified cloud categories, as is common practice. To alleviate the COT data scarcity problem, in this work, we propose a novel synthetic dataset for COT estimation, which we subsequently leverage for obtaining reliable and versatile cloud masks on real data. In our dataset, top-of-atmosphere radiances have been simulated for 12 of the spectral bands of the Multispectral Imagery (MSI) sensor onboard Sentinel-2 platforms. These data points have been simulated under consideration of different cloud types, COTs, and ground surface and atmospheric profiles. Extensive experimentation of training several ML models to predict COT from the measured reflectivity of the spectral bands demonstrates the usefulness of our proposed dataset. In particular, by thresholding COT estimates from our ML models, we show on two satellite image datasets (one that is publicly available, and one which we have collected and annotated) that reliable cloud masks can be obtained. The synthetic data, the newly collected real dataset, code and models have been made publicly available.

Place, publisher, year, edition, pages
MDPI, 2024
Keywords
cloud detection, cloud optical thickness, datasets, machine learning
National Category
Remote Sensing
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-104597 (URN); 10.3390/rs16040694 (DOI); 001177031000001 (); 2-s2.0-85185890836 (Scopus ID)
Funder
Vinnova, 2021-03643; 2023-02787
Note

Validated; 2024; Level 2; 2024-04-09 (sofila)

Full text license: CC BY

Available from: 2024-03-14. Created: 2024-03-14. Last updated: 2024-11-20. Bibliographically approved
Abid, N., Noman, M. K., Kovács, G., Islam, S. M., Adewumi, T., Lavery, P., . . . Liwicki, M. (2024). Seagrass classification using unsupervised curriculum learning (UCL). Ecological Informatics, 83, Article ID 102804.
2024 (English). In: Ecological Informatics, ISSN 1574-9541, E-ISSN 1878-0512, Vol. 83, article id 102804. Article in journal (Refereed), Published
Abstract [en]

Seagrass ecosystems are pivotal in marine environments, serving as crucial habitats for diverse marine species and contributing significantly to carbon sequestration. Accurate classification of seagrass species from underwater images is imperative for monitoring and preserving these ecosystems. This paper introduces Unsupervised Curriculum Learning (UCL) to seagrass classification using the DeepSeagrass dataset. UCL progressively learns from simpler to more complex examples, enhancing the model's ability to discern seagrass features in a curriculum-driven manner. Experiments employing state-of-the-art deep learning architectures, convolutional neural networks (CNNs), show that UCL achieved overall 90.12 % precision and 89 % recall, which significantly improves classification accuracy and robustness, outperforming some traditional supervised learning approaches like SimCLR, and unsupervised approaches like Zero-shot CLIP. The methodology of UCL involves four main steps: high-dimensional feature extraction, pseudo-label generation through clustering, reliable sample selection, and fine-tuning the model. The iterative UCL framework refines CNN's learning of underwater images, demonstrating superior accuracy, generalization, and adaptability to unseen seagrass and background samples of undersea images. The findings presented in this paper contribute to the advancement of seagrass classification techniques, providing valuable insights into the conservation and management of marine ecosystems. The code and dataset are made publicly available and can be accessed here: https://github.com/nabid69/Unsupervised-Curriculum-Learning---UCL

 

Place, publisher, year, edition, pages
Elsevier B.V., 2024
Keywords
Seagrass, Deep learning, Unsupervised classification, Curriculum learning, Unsupervised curriculum learning, Underwater digital imaging
National Category
Computer Sciences; Computer Vision and Robotics (Autonomous Systems)
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-109778 (URN); 10.1016/j.ecoinf.2024.102804 (DOI); 001307982900001 (); 2-s2.0-85202895926 (Scopus ID)
Note

Validated; 2024; Level 2; 2024-09-09 (hanlid)

Full text license: CC BY

Available from: 2024-09-09. Created: 2024-09-09. Last updated: 2024-11-20. Bibliographically approved
Rakesh, S., Liwicki, F., Mokayed, H., Upadhyay, R., Chhipa, P. C., Gupta, V., . . . Saini, R. (2023). Emotions Classification Using EEG in Health Care. In: Tistarelli, Massimo; Dubey, Shiv Ram; Singh, Satish Kumar; Jiang, Xiaoyi (Ed.), Computer Vision and Machine Intelligence: Proceedings of CVMI 2022. Paper presented at International Conference on Computer Vision & Machine Intelligence (CVMI), Allahabad, Prayagraj, India, August 12-13, 2022 (pp. 37-49). Springer Nature
2023 (English). In: Computer Vision and Machine Intelligence: Proceedings of CVMI 2022 / [ed] Tistarelli, Massimo; Dubey, Shiv Ram; Singh, Satish Kumar; Jiang, Xiaoyi, Springer Nature, 2023, p. 37-49. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Springer Nature, 2023
Series
Lecture Notes in Networks and Systems (LNNS) ; 586
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-98587 (URN); 10.1007/978-981-19-7867-8_4 (DOI); 2-s2.0-85161601282 (Scopus ID)
Conference
International Conference on Computer Vision & Machine Intelligence (CVMI), Allahabad, Prayagraj, India, August 12-13, 2022
Note

ISBN for host publication: 978-981-19-7866-1, 978-981-19-7867-8

Available from: 2023-06-19. Created: 2023-06-19. Last updated: 2023-09-05. Bibliographically approved
Al-Azzawi, S., Kovács, G., Nilsson, F., Adewumi, T. & Liwicki, M. (2023). NLP-LTU at SemEval-2023 Task 10: The Impact of Data Augmentation and Semi-Supervised Learning Techniques on Text Classification Performance on an Imbalanced Dataset. In: 17th International Workshop on Semantic Evaluation, SemEval 2023: Proceedings of the Workshop. Paper presented at 17th International Workshop on Semantic Evaluation, Toronto, Canada, July 13-14, 2023 (pp. 1421-1427). Association for Computational Linguistics
2023 (English). In: 17th International Workshop on Semantic Evaluation, SemEval 2023: Proceedings of the Workshop, Association for Computational Linguistics, 2023, p. 1421-1427. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Association for Computational Linguistics, 2023
National Category
Language Technology (Computational Linguistics); Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-101374 (URN); 10.18653/v1/2023.semeval-1.196 (DOI); 2-s2.0-85175399160 (Scopus ID)
Conference
17th International Workshop on Semantic Evaluation, Toronto, Canada, July 13-14, 2023
Funder
Knut and Alice Wallenberg Foundation
Note

ISBN for host publication: 978-1-959429-99-9

Available from: 2023-09-18. Created: 2023-09-18. Last updated: 2024-01-03. Bibliographically approved
Nilsson, F. & Kovács, G. (2022). FilipN@LT-EDI-ACL2022-Detecting signs of Depression from Social Media: Examining the use of summarization methods as data augmentation for text classification. In: Bharathi Raja Chakravarthi, B Bharathi, John P McCrae, Manel Zarrouk, Kalika Bali, Paul Buitelaar (Ed.), Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion. Paper presented at Second Workshop on Language Technology for Equality, Diversity, Inclusion (LT-EDI-2022), May 27, 2022, Dublin, Ireland (pp. 283-286). Association for Computational Linguistics
2022 (English). In: Proceedings of the Second Workshop on Language Technology for Equality, Diversity and Inclusion / [ed] Bharathi Raja Chakravarthi, B Bharathi, John P McCrae, Manel Zarrouk, Kalika Bali, Paul Buitelaar, Association for Computational Linguistics, 2022, p. 283-286. Conference paper, Published paper (Refereed)
Abstract [en]

Depression is a common mental disorder that severely affects the quality of life, and can lead to suicide. When diagnosed in time, mild, moderate, and even severe depression can be treated. This is why it is vital to detect signs of depression in time. One possibility for this is the use of text classification models on social media posts. Transformers have achieved state-of-the-art performance on a variety of similar text classification tasks. One drawback, however, is that when the dataset is imbalanced, the performance of these models may be negatively affected. Because of this, in this paper, we examine the effect of balancing a depression detection dataset using data augmentation. In particular, we use abstractive summarization techniques for data augmentation. We examine the effect of this method on the LT-EDI-ACL2022 task. Our results show that when increasing the multiplicity of the minority classes to the right degree, this data augmentation method can in fact improve classification scores on the task.

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2022
Series
2022.ltedi-1
National Category
Computer Sciences; Information Systems, Social aspects
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-90881 (URN); 000847166600041 (); 2-s2.0-85137459193 (Scopus ID)
Conference
Second Workshop on Language Technology for Equality, Diversity, Inclusion (LT-EDI-2022), May 27, 2022, Dublin, Ireland
Available from: 2022-06-02. Created: 2022-06-02. Last updated: 2022-09-21. Bibliographically approved
Sabry, S. S., Adewumi, T., Abid, N., Kovács, G., Liwicki, F. & Liwicki, M. (2022). HaT5: Hate Language Identification using Text-to-Text Transfer Transformer. In: 2022 International Joint Conference on Neural Networks (IJCNN): Conference Proceedings. Paper presented at IEEE World Congress on Computational Intelligence (IEEE WCCI 2022), Padua, Italy, July 18-23, 2022. Institute of Electrical and Electronics Engineers (IEEE)
2022 (English). In: 2022 International Joint Conference on Neural Networks (IJCNN): Conference Proceedings, Institute of Electrical and Electronics Engineers (IEEE), 2022. Conference paper, Published paper (Refereed)
Abstract [en]

We investigate the performance of a state-of-the-art (SoTA) architecture T5 (available on the SuperGLUE) and compare it with 3 other previous SoTA architectures across 5 different tasks from 2 relatively diverse datasets. The datasets are diverse in terms of the number and types of tasks they have. To improve performance, we augment the training data by using a new autoregressive conversational AI model checkpoint. We achieve near-SoTA results on a couple of the tasks - macro F1 scores of 81.66% for task A of the OLID 2019 dataset and 82.54% for task A of the hate speech and offensive content (HASOC) 2021 dataset, where SoTA are 82.9% and 83.05%, respectively. We perform error analysis and explain why one of the models (Bi-LSTM) makes the predictions it does by using a publicly available algorithm: Integrated Gradient (IG). This is because explainable artificial intelligence (XAI) is essential for earning the trust of users. The main contributions of this work are the implementation method of T5, which is discussed; the data augmentation, which brought performance improvements; and the revelation on the shortcomings of the HASOC 2021 dataset. The revelation shows the difficulties of poor data annotation by using a small set of examples where the T5 model made the correct predictions, even when the ground truth of the test set was incorrect (in our opinion). We also provide our model checkpoints on the HuggingFace hub: https://huggingface.co/sana-ngu/HaT5_augmentation and https://huggingface.co/sana-ngu/HaT5

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Keywords
Hate Speech, Data Augmentation, Transformer, T5
National Category
Language Technology (Computational Linguistics)
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-93432 (URN); 10.1109/IJCNN55064.2022.9892696 (DOI); 000867070906060 (); 2-s2.0-85140754070 (Scopus ID)
Conference
IEEE World Congress on Computational Intelligence (IEEE WCCI 2022), Padua, Italy, July 18-23, 2022
Note

ISBN for host publication: 978-1-7281-8671-9

Available from: 2022-10-04. Created: 2022-10-04. Last updated: 2023-09-05. Bibliographically approved
Al-Azzawi, S. S., Kovács, G., Mokayed, H., Chronéer, D., Liwicki, F. & Liwicki, M. (2022). Innovative Education Approach Toward Active Distance Education: a Case Study in the Introduction to AI course. In: Conference Proceedings. The Future of Education 2022. Paper presented at 12th Edition of the International Conference The Future of Education, Florence, Italy (Hybrid), June 30-July 1, 2022.
2022 (English). In: Conference Proceedings. The Future of Education 2022, 2022. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we first describe various synchronous and asynchronous methods for enhancing student engagement in big online courses. We showcase the implementation of these methods in the “Introduction to Artificial Intelligence (AI)” course at Luleå University of Technology, which has attracted around 500 students in each of its iterations (twice yearly, since 2019). We also show that these methods can be applied efficiently, in terms of the teaching hours required. With the increase in digitization and student mobility, the demand for improved and personalized content delivery for distance education has also increased. This applies not only in the context of traditional undergraduate education, but also in the context of adult education and lifelong learning. This higher level of demand, however, introduces a challenge, especially as it is typically combined with a shortage of staff and needs for efficient education. This challenge is further amplified by the current pandemic situation, which led to an even bigger risk of student dropout. To mitigate this risk, as well as to meet the increased demand, we applied various methods for creating engaging interaction in our pedagogy based on Moore’s framework: learner-to-learner, learner-to-instructor, and learner-to-content engagement strategies. The main methods of this pedagogy are as follows: short and interactive videos, active discussions in topic-based forums, regular live sessions with group discussions, and the introduction of optional content at many points in the course, to address different target groups. In this paper, we show how we originally designed and continuously improved the course, without requiring more than 500 teaching hours per iteration (one hour per enrolled student), while we also managed to increase the successful completion rate of the participants by 10%, and improved student engagement and feedback for the course by 50%. We intend to share a set of best practices applicable to many other e-learning courses in ICT.

Keywords
Distance Education
National Category
Pedagogy; Human Computer Interaction
Research subject
Machine Learning; Information systems
Identifiers
urn:nbn:se:ltu:diva-92211 (URN)
Conference
12th Edition of the International Conference The Future of Education, Florence, Italy (Hybrid), June 30-July 1, 2022
Note

ISSN for host publication: 2384-9509

Available from: 2022-07-21. Created: 2022-07-21. Last updated: 2023-09-05. Bibliographically approved
Kovács, G., Alonso, P., Saini, R. & Liwicki, M. (2022). Leveraging external resources for offensive content detection in social media. AI Communications, 35(2), 87-109
2022 (English). In: AI Communications, ISSN 0921-7126, E-ISSN 1875-8452, Vol. 35, no 2, p. 87-109. Article in journal (Refereed), Published
Abstract [en]

Hate speech is a burning issue of today’s society that cuts across numerous strategic areas, including human rights protection, refugee protection, and the fight against racism and discrimination. The gravity of the subject is further demonstrated by António Guterres, the United Nations Secretary-General, calling it “a menace to democratic values, social stability, and peace”. One central platform for the spread of hate speech is the Internet and social media in particular. Thus, automatic detection of hateful and offensive content on these platforms is a crucial challenge that would strongly contribute to an equal and sustainable society when overcome. One significant difficulty in meeting this challenge is collecting sufficient labeled data. In our work, we examine how various resources can be leveraged to circumvent this difficulty. We carry out extensive experiments to exploit various data sources using different machine learning models, including state-of-the-art transformers. We have found that using our proposed methods, one can attain state-of-the-art performance detecting hate speech on Twitter (outperforming the winner of both the HASOC 2019 and HASOC 2020 competitions). It is observed that in general, adding more data improves the performance or does not decrease it. Even when using good language models and knowledge transfer mechanisms, the best results were attained using data from one or two additional data sets.

Place, publisher, year, edition, pages
IOS Press, 2022
Keywords
Hateful and offensive language, deep language processing, transfer learning, vocabulary augmentation, RoBERTa
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-90607 (URN); 10.3233/aic-210138 (DOI); 000828016100004 (); 2-s2.0-85135231173 (Scopus ID)
Note

Validated; 2022; Level 2; 2022-07-20 (sofila)

Available from: 2022-05-11. Created: 2022-05-11. Last updated: 2023-09-05. Bibliographically approved
Nilsson, F., Al-Azzawi, S. S. & Kovács, G. (2022). Leveraging Sentiment Data for the Detection of Homophobic/Transphobic Content in a Multi-Task, Multi-Lingual Setting Using Transformers. In: Kripabandhu Ghosh, Thomas Mandl, Prasenjit Majumder, Mandar Mitra (Ed.), FIRE 2022 Working Notes. Paper presented at 14th Forum for Information Retrieval Evaluation, FIRE 2022, December 9-13, 2022, Kolkata, India (pp. 196-207). CEUR-WS, 3395
2022 (English). In: FIRE 2022 Working Notes / [ed] Kripabandhu Ghosh, Thomas Mandl, Prasenjit Majumder, Mandar Mitra, CEUR-WS, 2022, Vol. 3395, p. 196-207. Conference paper, Published paper (Refereed)
Abstract [en]

Hateful content is published and spread on social media at an increasing rate, harming the user experience. In addition, hateful content targeting particular, marginalized/vulnerable groups (e.g. homophobic/transphobic content) can cause even more harm to members of said groups. Hence, detecting hateful content is crucial, regardless of its origin, or the language used. The large variety of (often underresourced) languages used, however, makes this task daunting, especially as many users use code-mixing in their messages. To help overcome these difficulties, the approach we present here uses a multi-language framework. And to further mitigate the scarcity of labelled data, it also leverages data from the related task of sentiment-analysis to improve the detection of homophobic/transphobic content. We evaluated our system by participating in a sentiment analysis and hate speech detection challenge. Results show that our multi-task model outperforms its single-task counterpart (on average, by 24%) on the detection of homophobic/transphobic content. Moreover, the results achieved in detecting homophobic/transphobic content put our system in 1st or 2nd place for three out of four languages examined.

Place, publisher, year, edition, pages
CEUR-WS, 2022
Series
CEUR Workshop Proceedings, ISSN 1613-0073
Keywords
Multi-Task, Multi-Language Learning, Hateful Language, Sentiment Analysis, Detecting Homophobic/Transphobic Language
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-98273 (URN); 2-s2.0-85160747864 (Scopus ID)
Conference
14th Forum for Information Retrieval Evaluation, FIRE 2022, December 9-13, 2022, Kolkata, India
Funder
Vinnova, 2019-02996
Note

Full text license: CC BY

Available from: 2023-06-13. Created: 2023-06-13. Last updated: 2023-06-13. Bibliographically approved
Kenyeres, A. Z. & Kovács, G. (2022). Twitter bot detection using deep learning. In: Berend Gábor; Gosztolya Gábor; Vincze Veronika (Ed.), XVIII. Magyar Számítógépes Nyelvészeti Konferencia. Paper presented at XVIII. Conference on Hungarian Computational Linguistics (MSZNY 2022), Szeged, January 27–28, 2022 (pp. 257-269). Szeged: University of Szeged
2022 (English). In: XVIII. Magyar Számítógépes Nyelvészeti Konferencia / [ed] Berend Gábor; Gosztolya Gábor; Vincze Veronika, Szeged: University of Szeged, 2022, p. 257-269. Conference paper, Published paper (Refereed)
Abstract [en]

Social media platforms have revolutionized how people interact with each other and how people gain information. However, social media platforms such as Twitter and Facebook quickly became the platform for public manipulation and spreading or amplifying political or ideological misinformation. Although malicious content can be shared by individuals, today millions of individual and coordinated automated accounts exist, also called bots, which share hate, spread misinformation and manipulate public opinion without any human intervention. The work presented in this paper aims at designing and implementing deep learning approaches that successfully identify social media bots. Moreover, we show that deep learning models can yield an accuracy of 0.9 on the PAN 2019 Bots and Gender Profiling dataset. In addition, the findings of this work also show that pre-trained models will be able to improve the accuracy of deep learning models and compete with Classical Machine Learning methods even on a limited dataset.

Place, publisher, year, edition, pages
Szeged: University of Szeged, 2022
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-90184 (URN)
Conference
XVIII. Conference on Hungarian Computational Linguistics (MSZNY 2022), Szeged, January 27–28, 2022
Note

ISBN for host publication: 978-963-306-848-9

Available from: 2022-04-13. Created: 2022-04-13. Last updated: 2022-04-13. Bibliographically approved