This paper introduces a machine learning-based approach for detecting electric poles, an essential part of power grid maintenance. With the increasing popularity of deep learning, several such approaches have been proposed for electric pole detection. However, most of these approaches are supervised and require a large amount of labeled data, which is time-consuming and labor-intensive to produce. Unsupervised deep learning approaches have the potential to overcome the need for huge amounts of training data. This paper presents an unsupervised deep learning framework for utility pole detection. The framework combines three components: a Convolutional Neural Network (CNN) architecture for extracting meaningful features from aerial imagery, a clustering algorithm for generating pseudo labels for the resulting features, and a selection operation that picks out reliable samples for further fine-tuning of the CNN. The fine-tuned version then replaces the initial CNN model, thus improving the framework, and we iteratively repeat this process so that the model progressively learns the prominent patterns in the data. The presented framework is trained and tested on a small dataset of utility poles provided by “Mention Fuvex” (a Spanish company utilizing long-range drones for power line inspection). Our extensive experimentation demonstrates the progressive learning behavior of the proposed method and yields promising classification scores on the utility pole dataset, with a significance test giving a p-value < 0.00005.
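The clustering and selection steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes k-means as the clustering algorithm, farthest-first initialisation, distance to the assigned centroid as the reliability criterion, and a hypothetical `keep_ratio` parameter. The toy 2-D vectors stand in for CNN features.

```python
import math


def init_centroids(features, k):
    """Farthest-first initialisation (an assumption made for determinism)."""
    centroids = [list(features[0])]
    while len(centroids) < k:
        far = max(features, key=lambda f: min(math.dist(f, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def assign(features, centroids):
    """Give each sample the index of its nearest centroid (its pseudo label)."""
    return [min(range(len(centroids)), key=lambda c: math.dist(f, centroids[c]))
            for f in features]


def kmeans(features, k, iters=20):
    """Plain k-means; returns (centroids, pseudo_labels)."""
    centroids = init_centroids(features, k)
    for _ in range(iters):
        labels = assign(features, centroids)
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign(features, centroids)


def select_reliable(features, centroids, labels, keep_ratio=0.5):
    """Per cluster, keep the keep_ratio fraction of samples nearest the centroid."""
    selected = []
    for c in range(len(centroids)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        idx.sort(key=lambda i: math.dist(features[i], centroids[c]))
        n_keep = max(1, int(len(idx) * keep_ratio)) if idx else 0
        selected.extend(idx[:n_keep])
    return sorted(selected)


# Toy stand-ins for CNN feature vectors: two groups, each with one outlier.
features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [2.0, 2.0],
            [10.0, 10.0], [10.1, 10.0], [10.0, 10.2], [8.0, 8.0]]
centroids, labels = kmeans(features, k=2)
selected = select_reliable(features, centroids, labels, keep_ratio=0.5)
```

In the full loop, the samples in `selected` and their pseudo labels would be used to fine-tune the feature extractor, after which features are re-extracted and the procedure repeats.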
Seagrass ecosystems are pivotal in marine environments, serving as crucial habitats for diverse marine species and contributing significantly to carbon sequestration. Accurate classification of seagrass species from underwater images is imperative for monitoring and preserving these ecosystems. This paper applies Unsupervised Curriculum Learning (UCL) to seagrass classification using the DeepSeagrass dataset. UCL progressively learns from simpler to more complex examples, enhancing the model's ability to discern seagrass features in a curriculum-driven manner. Experiments employing state-of-the-art deep learning architectures, namely convolutional neural networks (CNNs), show that UCL achieves an overall precision of 90.12% and a recall of 89%, significantly improving classification accuracy and robustness and outperforming some traditional supervised learning approaches like SimCLR, and unsupervised approaches like Zero-shot CLIP. The methodology of UCL involves four main steps: high-dimensional feature extraction, pseudo-label generation through clustering, reliable sample selection, and fine-tuning the model. The iterative UCL framework refines the CNN's learning of underwater images, demonstrating superior accuracy, generalization, and adaptability to unseen seagrass and background samples of undersea images. The findings presented in this paper contribute to the advancement of seagrass classification techniques, providing valuable insights into the conservation and management of marine ecosystems. The code and dataset are made publicly available and can be accessed here: https://github.com/nabid69/Unsupervised-Curriculum-Learning—UCL.
This paper presents a Convolutional Neural Network (CNN)-based Unsupervised Curriculum Learning (UCL) approach for the recognition of water bodies to overcome the stated challenges for remote sensing based RGB imagery. The unsupervised nature of the presented algorithm eliminates the need for labelled training data. The problem is cast as a two-class clustering problem (water and non-water), where clustering is done on deep features obtained by a pre-trained CNN. After initial clusters have been identified, representative samples from each cluster are chosen by the unsupervised curriculum learning algorithm for fine-tuning the feature extractor. This process is repeated iteratively until convergence. Three datasets have been used to evaluate the approach and show its effectiveness on varying scales: (i) the SAT-6 dataset, comprising high resolution aircraft images, (ii) Sentinel-2 of EuroSAT, comprising remote sensing images with low resolution, and (iii) PakSAT, a new dataset we created for this study. PakSAT is the first Pakistani Sentinel-2 dataset designed to classify water bodies of Pakistan. Extensive experiments on these datasets demonstrate the progressive learning behaviour of UCL and show promising results for water classification on all three datasets. The obtained accuracies outperform the supervised methods in domain adaptation, demonstrating the effectiveness of the proposed algorithm.
This article describes analytical work carried out in a pilot project for the Swedish Space Data Lab (SSDL), which focused on monitoring drought in the Mälardalen region in central Sweden. The Normalized Difference Vegetation Index (NDVI) and the Moisture Stress Index (MSI), both commonly used to analyse drought, are estimated from Sentinel-2 satellite data and averaged over a selection of seven grassland areas of interest. To derive a complete time-series over a season that interpolates over days with missing data, we use Gaussian Process Regression, a technique from multivariate Bayesian analysis. The analysis shows significant differences at 95% confidence for five out of seven areas when comparing the peak drought period in the dry year 2018 to the corresponding period in 2019. A cross-validation analysis indicates that the model parameter estimates are robust for the temporal covariance structure (while inconclusive for the spatial dimensions). There were no signs of over-fitting when comparing in-sample and out-of-sample RMSE.
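For reference, the two indices can be computed per pixel and averaged over an area of interest as sketched below. The formulas are the standard definitions, NDVI = (NIR - Red) / (NIR + Red) and MSI = SWIR / NIR. The band choice in the comments (B8, B4 and B11 for Sentinel-2) is the common convention and an assumption here, since the abstract does not state which bands were used. The function names and reflectance values are illustrative only.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index; None if the denominator is zero."""
    total = nir + red
    return (nir - red) / total if total != 0 else None


def msi(swir, nir):
    """Moisture Stress Index (SWIR / NIR); None if NIR is zero."""
    return swir / nir if nir != 0 else None


def area_mean(values):
    """Average over an area of interest, skipping missing (None) pixels."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


# Illustrative reflectances, assumed bands: NIR = B8, Red = B4, SWIR = B11.
pixels = [
    {"nir": 0.40, "red": 0.10, "swir": 0.20},
    {"nir": 0.30, "red": 0.10, "swir": 0.30},
]
mean_ndvi = area_mean([ndvi(p["nir"], p["red"]) for p in pixels])
mean_msi = area_mean([msi(p["swir"], p["nir"]) for p in pixels])
```

Days on which every pixel is missing (for example because of cloud cover) yield `None`, which is the kind of gap the Gaussian Process Regression step is used to interpolate over.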
In this paper, we first describe various synchronous and asynchronous methods for enhancing student engagement in large online courses. We showcase the implementation of these methods in the “Introduction to Artificial Intelligence (AI)” course at Luleå University of Technology, which has attracted around 500 students in each of its iterations (twice yearly, since 2019). We also show that these methods can be applied efficiently, in terms of the teaching hours required. With the increase in digitization and student mobility, the demand for improved and personalized content delivery for distance education has also increased. This applies not only in the context of traditional undergraduate education, but also in the context of adult education and lifelong learning. This higher level of demand, however, introduces a challenge, especially as it is typically combined with a shortage of staff and a need for efficient education. This challenge is further amplified by the current pandemic situation, which has led to an even bigger risk of student dropout. To mitigate this risk, as well as to meet the increased demand, we applied various methods for creating engaging interaction in our pedagogy based on Moore’s framework: learner-to-learner, learner-to-instructor, and learner-to-content engagement strategies. The main methods of this pedagogy are as follows: short, interactive videos, active discussions in topic-based forums, regular live sessions with group discussions, and the introduction of optional content at many points in the course, to address different target groups. In this paper, we show how we originally designed and continuously improved the course, without requiring more than 500 teaching hours per iteration (one hour per enrolled student), while we also managed to increase the successful completion rate of the participants by 10%, and improved student engagement and feedback for the course by 50%.
We intend to share a set of best-practices applicable to many other e-learning courses in ICT.
With the ubiquity and anonymity of the Internet, the spread of hate speech has been a growing concern for many years now. Language used to dehumanize, defame or threaten individuals and marginalized groups threatens not only the mental health of its targets and their democratic access to the Internet, but also the fabric of our society. Because of this, much effort has been devoted to manual moderation. The amount of data generated each day, particularly on social media platforms such as Facebook and Twitter, however, makes this a Sisyphean task. This has led to an increased demand for automatic methods of hate speech detection.
Here, to contribute towards solving the task of hate speech detection, we worked with a simple ensemble of transformer models on a Twitter-based hate speech benchmark. Using this method, we attained a weighted F1-score of 0.8426, which we managed to further improve by leveraging more training data, achieving a weighted F1-score of 0.8504, thus markedly outperforming the best performing system in the literature.
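As a reminder of what the reported metric measures, the weighted F1-score is the average of the per-class F1 scores, each weighted by the number of true instances of that class. The sketch below is a plain-Python illustration with made-up labels, not the evaluation script used for the benchmark.

```python
from collections import Counter


def f1_for_class(y_true, y_pred, label):
    """F1 of one class, treating that class as the positive one."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = len(y_true)
    return sum(f1_for_class(y_true, y_pred, lab) * n / total
               for lab, n in support.items())


# Made-up example: three "hate" (1) and one "not hate" (0) instance.
y_true = [1, 1, 1, 0]
y_pred = [1, 1, 0, 0]
score = weighted_f1(y_true, y_pred)
```

Because the weights follow class frequency, the weighted F1 is dominated by the majority class, unlike the macro F1, which treats every class equally.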
The detection of hate speech in social media is a crucial task. The uncontrolled spread of hate speech can be detrimental to maintaining peace and harmony in society, particularly when it is spread with the intention to defame people, or to spoil the image of a person, a community, or a nation. A major ground for spreading hate speech is social media. This significantly contributes to the difficulty of the task, as social media posts not only include paralinguistic tools (e.g. emoticons and hashtags), but their linguistic content also contains plenty of poorly written text that does not adhere to grammar rules. With the recent developments in Natural Language Processing (NLP), particularly with deep architectures, it is now possible to analyze unstructured composite natural language text. For this reason, we propose a deep NLP model for the automatic detection of hate speech in social media data. We have applied our model on the HASOC2019 hate speech corpus, and attained a macro F1 score of 0.63 in the detection of hate speech.
Hate speech detection on social media platforms is crucial as it helps to avoid severe harm to marginalized people and groups. The application of Natural Language Processing (NLP) and Deep Learning has garnered encouraging results in the task of hate speech detection. The expression of hate, however, is varied and ever-evolving. Thus better detection systems need to adapt to this variance. Because of this, researchers keep on collecting data and regularly organize hate speech detection competitions. In this paper, we discuss our entry to one such competition, namely the English version of sub-task A for the OffensEval competition. Our contribution can be perceived through our results: first an F1-score of 0.9087, which with the further refinements described here climbed to 0.9166. This gives more support to our hypothesis that one of the variants of BERT, namely RoBERTa, can successfully differentiate between offensive and non-offensive tweets, given the proper preprocessing steps.
The National Space Data Lab is a collaboration project between Swedish National Space Agency, RISE Research Institutes of Sweden, Luleå University of Technology and AI Sweden. It will be a national knowledge and data hub for Swedish authorities’ work on earth observation data and for the development of AI-based analysis of data, generated in space systems. The platform is deployed on Kubernetes.
Purpose
• Increase the availability of space data for the benefit of developing society and industry
• Provide a platform for accessing space data and analytical tools
In this paper we generate word meta-embeddings from already existing embeddings using cross-encoding. Previous approaches can only work with words that exist in each source embedding, while the architecture presented here drops this requirement. We demonstrate the method using two pre-trained embeddings, namely GloVe and FastText. Furthermore, we propose additional improvements to the training process of the meta-embedding. Results on six standard tests for word similarity show that the trained meta-embedding outperforms the original embeddings. Moreover, this performance can be further increased with the proposed improvements, resulting in performance competitive with that reported earlier.
Social media platforms have revolutionized how people interact with each other and how people gain information. However, social media platforms such as Twitter and Facebook quickly became platforms for public manipulation and for spreading or amplifying political or ideological misinformation. Although malicious content can be shared by individuals, today there exist millions of individual and coordinated automated accounts, also called bots, which share hate, spread misinformation and manipulate public opinion without any human intervention. The work presented in this paper aims at designing and implementing deep learning approaches that successfully identify social media bots. Moreover, we show that deep learning models can yield an accuracy of 0.9 on the PAN 2019 Bots and Gender Profiling dataset. In addition, the findings of this work show that pre-trained models can improve the accuracy of deep learning models and compete with classical machine learning methods even on a limited dataset.
In the age of the Internet we are generating documents (both written and spoken) at an unprecedented rate. This rate of document creation, together with the number of already existing documents, makes manual processing time-consuming and costly to the point of infeasibility. This is why we need automatic methods suitable for processing written as well as spoken documents. One crucial part of processing documents is partitioning them into segments based on the topic being discussed. A self-evident application would be, for example, partitioning a news broadcast into different news stories. One of the first steps in doing so would be identifying the shifts in the topic framework, or in other words, finding the time interval where the announcer changes from one news story to the next. Naturally, as the transitions between news stories are often accompanied by easily identifiable audio (e.g. signal) and visual (e.g. change in graphics) cues, this would not be a particularly difficult task. However, in other cases the solution to this problem would be far less obvious. Here, we approach this task for the case of spoken dialogues (interviews). One particular difficulty of these dialogues is that the interlocutors often switch between languages. Because of this (and in the hope of contributing to the generality of our method) we carry out topic change detection in a content-free manner, focusing on speaker roles and prosodic features. To process these features we employ neural networks, and demonstrate that, with the proper classifier combination methods, this can lead to a detection performance that is competitive with the state of the art.
The detection of hate speech in social media is a crucial task. The uncontrolled spread of hate has the potential to gravely damage our society, and severely harm marginalized people or groups. A major arena for spreading hate speech online is social media. This significantly contributes to the difficulty of automatic detection, as social media posts include paralinguistic signals (e.g. emoticons and hashtags), and their linguistic content contains plenty of poorly written text. Another difficulty is presented by the context-dependent nature of the task, and the lack of consensus on what constitutes hate speech, which makes the task difficult even for humans. This makes the task of creating large labeled corpora difficult and resource-consuming. The problem posed by ungrammatical text has been largely mitigated by the recent emergence of deep neural network (DNN) architectures that have the capacity to efficiently learn various features. For this reason, we proposed a deep natural language processing (NLP) model, combining convolutional and recurrent layers, for the automatic detection of hate speech in social media data. We have applied our model on the HASOC2019 corpus, and attained a macro F1 score of 0.63 in hate speech detection on the test set of HASOC. The capacity of DNNs for efficient learning, however, also means an increased risk of overfitting, particularly with limited training data available (as was the case for HASOC). For this reason, we investigated different methods for expanding the resources used. We have explored various opportunities, such as leveraging unlabeled data, similarly labeled corpora, as well as the use of novel models. Our results showed that by doing so, it was possible to significantly increase the classification score attained.
Hate speech is a burning issue of today’s society that cuts across numerous strategic areas, including human rights protection, refugee protection, and the fight against racism and discrimination. The gravity of the subject is further demonstrated by António Guterres, the United Nations Secretary-General, calling it “a menace to democratic values, social stability, and peace”. One central platform for the spread of hate speech is the Internet and social media in particular. Thus, automatic detection of hateful and offensive content on these platforms is a crucial challenge that would strongly contribute to an equal and sustainable society when overcome. One significant difficulty in meeting this challenge is collecting sufficient labeled data. In our work, we examine how various resources can be leveraged to circumvent this difficulty. We carry out extensive experiments to exploit various data sources using different machine learning models, including state-of-the-art transformers. We have found that using our proposed methods, one can attain state-of-the-art performance detecting hate speech on Twitter (outperforming the winner of both the HASOC 2019 and HASOC 2020 competitions). It is observed that in general, adding more data improves the performance or does not decrease it. Even when using good language models and knowledge transfer mechanisms, the best results were attained using data from one or two additional data sets.
In this paper we present an approach for the PAN 2019 Author Profiling challenge. The task here is to detect Twitter bots and also to classify the gender of human Twitter users as male or female, based on a hundred select tweets from their profile. Focusing on feature engineering, we explore the semantic categories present in tweets. We combine these semantic features with part-of-speech tags and other stylistic features (e.g. character floodings and the use of capital letters) for our eventual feature set. We have experimented with different machine learning techniques, including ensemble techniques, and found AdaBoost to be the most successful (attaining an F1-score of 0.99 on the development set). Using this technique, we achieved an accuracy score of 89.17% for English language tweets in the bot detection subtask.
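The stylistic features mentioned above (character floodings and capital letter use) could be extracted roughly as in the sketch below. The exact definitions are assumptions made for illustration: a flooding is taken to be any character repeated three or more times in a row, and the feature names are hypothetical rather than those of the submitted system.

```python
import re

# Assumption: a "flooding" is one character repeated at least three times.
FLOOD_RE = re.compile(r"(.)\1{2,}")
WORD_RE = re.compile(r"[A-Za-z]+")


def stylistic_features(tweet):
    """Return a small dictionary of stylistic features for one tweet."""
    letters = [ch for ch in tweet if ch.isalpha()]
    upper = sum(1 for ch in letters if ch.isupper())
    words = WORD_RE.findall(tweet)
    return {
        # Number of flooded character runs, e.g. "sooooo" or "!!!".
        "n_floodings": sum(1 for _ in FLOOD_RE.finditer(tweet)),
        # Share of alphabetic characters that are upper case.
        "capital_ratio": upper / len(letters) if letters else 0.0,
        # Words of two or more letters written entirely in capitals.
        "n_allcaps_words": sum(1 for w in words if len(w) > 1 and w.isupper()),
    }


example = stylistic_features("Sooooo GOOD!!!")
```

Per-tweet values like these would then be aggregated over the hundred tweets of a profile before being passed, together with the semantic and part-of-speech features, to the classifier.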
The ongoing COVID-19 pandemic has brought online education to the forefront of pedagogical discussions. To make this increased interest sustainable in a post-pandemic era, online courses must be built on strong pedagogical foundations. With a long history of pedagogic research, there are many principles, frameworks, and models available to help teachers in doing so. These models cover different teaching perspectives, such as constructive alignment, feedback, and the learning environment. In this paper, we discuss how we designed and implemented our online Natural Language Processing (NLP) course following constructive alignment and adhering to the pedagogical principles of LTU. By examining our course and analyzing student evaluation forms, we show that we have met our goal and successfully delivered the course. Furthermore, we discuss the additional benefits resulting from the current mode of delivery, including the increased reusability of course content and increased potential for collaboration between universities. Lastly, we also discuss where we can and will further improve the current course design.
A pivotal question in Automatic Speech Recognition (ASR) is the robustness of the trained models. In this study, we investigate the combination of two methods commonly applied to increase the robustness of ASR systems. On the one hand, inspired by auditory experiments and signal processing considerations, multi-band processing has been used for decades to improve the noise robustness of speech recognition. On the other hand, dropout is a commonly used regularization technique to prevent overfitting by keeping the model from becoming over-reliant on a small set of neurons. We hypothesize that the careful combination of the two approaches would lead to increased robustness, by preventing the resulting model from over-relying on any given band.
To verify our hypothesis, we investigate various approaches for the combination of the two methods using the Aurora-4 corpus. The results obtained corroborate our initial assumption, and show that the proper combination of the two techniques leads to increased robustness, and to significantly lower word error rates (WERs). Furthermore, we find that the accuracy scores attained here compare favourably to those reported recently on the clean training scenario of the Aurora-4 corpus.
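One conceivable way to combine the two ideas is to apply dropout at the level of whole frequency bands, so that during training the model occasionally sees input with an entire band zeroed out. The sketch below illustrates only that general idea. The study investigates several combination approaches and this sketch is not claimed to be one of them; the function name, the `drop_prob` parameter and the rule of always keeping at least one band are assumptions.

```python
import random


def band_dropout(band_features, drop_prob, rng):
    """Zero out whole bands of one frame, each with probability drop_prob.

    band_features: list of per-band feature vectors (lists of floats).
    At least one band is always kept so the frame is never all zeros.
    """
    keep = [rng.random() >= drop_prob for _ in band_features]
    if not any(keep):
        keep[rng.randrange(len(keep))] = True
    return [list(band) if kept else [0.0] * len(band)
            for band, kept in zip(band_features, keep)]


# Toy frame with three bands of two features each.
frame = [[0.2, 0.4], [0.6, 0.8], [1.0, 1.2]]
unchanged = band_dropout(frame, drop_prob=0.0, rng=random.Random(0))
mostly_dropped = band_dropout(frame, drop_prob=1.0, rng=random.Random(0))
```

At test time all bands would be passed through unchanged, as with ordinary dropout.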
This report was written to describe the systems that were submitted by the team “TheNorth” for the HaSpeeDe 2 shared task organised within EVALITA 2020. To address the main task which is hate speech detection, we fine-tuned BERT-based models. We evaluated both multilingual and Italian language models trained with the data provided and additional data. We also studied the contributions of multitask learning considering both hate speech detection and stereotype detection tasks.
Hateful content is published and spread on social media at an increasing rate, harming the user experience. In addition, hateful content targeting particular, marginalized/vulnerable groups (e.g. homophobic/transphobic content) can cause even more harm to members of said groups. Hence, detecting hateful content is crucial, regardless of its origin, or the language used. The large variety of (often underresourced) languages used, however, makes this task daunting, especially as many users use code-mixing in their messages. To help overcome these difficulties, the approach we present here uses a multi-language framework. And to further mitigate the scarcity of labelled data, it also leverages data from the related task of sentiment analysis to improve the detection of homophobic/transphobic content. We evaluated our system by participating in a sentiment analysis and hate speech detection challenge. Results show that our multi-task model outperforms its single-task counterpart (on average, by 24%) on the detection of homophobic/transphobic content. Moreover, the results achieved in detecting homophobic/transphobic content put our system in 1st or 2nd place for three out of four languages examined.
Depression is a common mental disorder that severely affects the quality of life, and can lead to suicide. When diagnosed in time, mild, moderate, and even severe depression can be treated. This is why it is vital to detect signs of depression in time. One possibility for this is the use of text classification models on social media posts. Transformers have achieved state-of-the-art performance on a variety of similar text classification tasks. One drawback, however, is that when the dataset is imbalanced, the performance of these models may be negatively affected. Because of this, in this paper, we examine the effect of balancing a depression detection dataset using data augmentation. In particular, we use abstractive summarization techniques for data augmentation. We examine the effect of this method on the LT-EDI-ACL2022 task. Our results show that when increasing the multiplicity of the minority classes to the right degree, this data augmentation method can in fact improve classification scores on the task.
Cloud formations often obscure optical satellite-based monitoring of the Earth’s surface, thus limiting Earth observation (EO) activities such as land cover mapping, ocean color analysis, and cropland monitoring. The integration of machine learning (ML) methods within the remote sensing domain has significantly improved performance for a wide range of EO tasks, including cloud detection and filtering, but there is still much room for improvement. A key bottleneck is that ML methods typically depend on large amounts of annotated data for training, which are often difficult to come by in EO contexts. This is especially true when it comes to cloud optical thickness (COT) estimation. A reliable estimation of COT enables more fine-grained and application-dependent control compared to using pre-specified cloud categories, as is common practice. To alleviate the COT data scarcity problem, in this work, we propose a novel synthetic dataset for COT estimation, which we subsequently leverage for obtaining reliable and versatile cloud masks on real data. In our dataset, top-of-atmosphere radiances have been simulated for 12 of the spectral bands of the MultiSpectral Instrument (MSI) onboard Sentinel-2 platforms. These data points have been simulated under consideration of different cloud types, COTs, and ground surface and atmospheric profiles. Extensive experiments in which several ML models are trained to predict COT from the measured reflectivity of the spectral bands demonstrate the usefulness of our proposed dataset. In particular, by thresholding COT estimates from our ML models, we show on two satellite image datasets (one that is publicly available, and one which we have collected and annotated) that reliable cloud masks can be obtained. The synthetic data, the newly collected real dataset, code and models have been made publicly available.
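The thresholding step mentioned above can be illustrated in a few lines. This is a sketch only: the COT values and the threshold of 1.0 are made up, and the appropriate threshold depends on the application, which is the flexibility that per-pixel COT estimates are meant to provide.

```python
def cloud_mask(cot_map, threshold):
    """Mark a pixel as cloudy when its estimated COT reaches the threshold."""
    return [[cot >= threshold for cot in row] for row in cot_map]


def cloud_fraction(mask):
    """Share of pixels flagged as cloudy."""
    flat = [flag for row in mask for flag in row]
    return sum(flat) / len(flat) if flat else 0.0


# Hypothetical per-pixel COT estimates from a regression model.
cot_estimates = [
    [0.0, 0.3, 2.5],
    [1.0, 7.9, 0.1],
]
mask = cloud_mask(cot_estimates, threshold=1.0)
fraction = cloud_fraction(mask)
```

Raising the threshold yields a mask that flags only optically thick clouds, while lowering it also flags thin clouds.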
Sign gesture recognition is the field that models sign gestures in order to facilitate communication with hearing and speech impaired people. Sign gestures are recorded with devices like a video camera or a depth camera. Palm gestures are also recorded with the Leap motion sensor. In this paper, we address palm sign gesture recognition using the Leap motion sensor. We extract geometric features from Leap motion recordings. Next, we apply a Genetic Algorithm (GA) for feature selection. The genetically selected features are fed to different classifiers for gesture recognition. Here we have used Support Vector Machine (SVM), Random Forest (RF), and Naive Bayes (NB) classifiers to compare their results. A gesture recognition accuracy of 74.00% is recorded with the RF classifier on the Leap motion sign gesture dataset.
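Feature selection with a genetic algorithm can be sketched as follows. Each individual is a boolean mask over the features, and selection, crossover and mutation search for a mask with high fitness. This is a generic illustration, not the configuration used in the paper: the operators (tournament selection, one-point crossover, elitism) and all parameter values are assumptions, and `toy_fitness` is a hypothetical stand-in for what would in practice be the validation accuracy of a classifier trained on the selected features.

```python
import random


def tournament(population, fitness, rng):
    """Pick the fitter of two randomly drawn individuals."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def genetic_feature_selection(n_features, fitness, pop_size=20,
                              generations=30, mutation_rate=0.05, seed=0):
    """Return (best_mask, history of best fitness per generation)."""
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        children = [best[:]]  # elitism: the best mask always survives
        while len(children) < pop_size:
            p1 = tournament(population, fitness, rng)
            p2 = tournament(population, fitness, rng)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each gene with a small probability.
            child = [(not g) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history


INFORMATIVE = {0, 2, 5}  # hypothetical "useful" feature indices


def toy_fitness(mask):
    """Reward informative features, penalise every other selected feature."""
    hits = sum(1 for i in INFORMATIVE if mask[i])
    extras = sum(1 for i, on in enumerate(mask) if on and i not in INFORMATIVE)
    return hits - 0.5 * extras


best_mask, history = genetic_feature_selection(8, toy_fitness)
```

Because the best individual is copied unchanged into each new generation, the best fitness never decreases from one generation to the next.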
The increasing collection and usage of data and data analytics has prompted the development of Data Labs. These labs are (ideally) a way for multiple beneficiaries to make use of the same data in ways that are value-generating for all. However, establishing data labs requires the mobilization of various infrastructural elements, such as beneficiaries, offerings and needed analytics talent, all of which are ambiguous and uncertain. The aim of this paper is to examine how such beneficiaries can be identified and understood for the nascent Swedish space data lab. The paper reports on the development of persona descriptions that aim to support and represent the needs of key beneficiaries of earth observation data. Our main results include three thorough persona descriptions that represent the lab’s respective beneficiaries and their distinct characteristics. We discuss the implications of the personas for addressing the infrastructural challenges, as well as for the lab’s design. We conclude that personas provide emerging data labs with relatively stable beneficiary archetypes that support the further development of the other infrastructure components. More research is needed to better understand how these persona descriptions may evolve, as well as how they may influence the continuous development process of the space data lab.
We investigate the performance of a state-of-the-art (SoTA) architecture T5 (available on the SuperGLUE) and compare it with 3 other previous SoTA architectures across 5 different tasks from 2 relatively diverse datasets. The datasets are diverse in terms of the number and types of tasks they have. To improve performance, we augment the training data by using a new autoregressive conversational AI model checkpoint. We achieve near-SoTA results on a couple of the tasks: macro F1 scores of 81.66% for task A of the OLID 2019 dataset and 82.54% for task A of the hate speech and offensive content (HASOC) 2021 dataset, where the SoTA scores are 82.9% and 83.05%, respectively. We perform error analysis and explain why one of the models (Bi-LSTM) makes the predictions it does by using a publicly available algorithm: Integrated Gradient (IG). We do so because explainable artificial intelligence (XAI) is essential for earning the trust of users. The main contributions of this work are the implementation method of T5, which is discussed; the data augmentation, which brought performance improvements; and the revelation of the shortcomings of the HASOC 2021 dataset. The revelation shows the difficulties caused by poor data annotation, using a small set of examples where the T5 model made the correct predictions even when the ground truth of the test set was incorrect (in our opinion). We also provide our model checkpoints on the HuggingFace hub: https://huggingface.co/sana-ngu/HaT5_augmentation and https://huggingface.co/sana-ngu/HaT5.
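For readers unfamiliar with the attribution method mentioned above, Integrated Gradients assigns to each input dimension i the gradient of the model output F accumulated along a straight path from a baseline x' to the actual input x. The standard definition and its usual Riemann-sum approximation with m steps are given below. The choice of baseline (for text models, often an all-zero or padding embedding) and the number of steps are not stated in the abstract and would be assumptions.

```latex
\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1
  \frac{\partial F\bigl(x' + \alpha (x - x')\bigr)}{\partial x_i} \, d\alpha
\;\approx\; (x_i - x'_i) \cdot \frac{1}{m} \sum_{k=1}^{m}
  \frac{\partial F\bigl(x' + \tfrac{k}{m} (x - x')\bigr)}{\partial x_i}
```

The attributions sum (approximately, under the discretisation) to F(x) - F(x'), which makes them convenient for inspecting which input tokens pushed a prediction towards a given class.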
In this paper, we introduce the use of Semantic Hashing as embedding for the task of Intent Classification and achieve state-of-the-art performance on three frequently used benchmarks. Intent Classification on a small dataset is a challenging task for data-hungry state-of-the-art Deep Learning based systems. Semantic Hashing is an attempt to overcome such a challenge and learn robust text classification. Current word embedding based methods [11], [13], [14] are dependent on vocabularies. One of the major drawbacks of such methods is out-of-vocabulary terms, especially when having small training datasets and using a wider vocabulary. This is the case in Intent Classification for chatbots, where typically small datasets are extracted from internet communication. Two problems arise with the use of internet communication. First, such datasets miss a lot of terms in the vocabulary to use word embeddings efficiently. Second, users frequently make spelling errors. Typically, the models for intent classification are not trained with spelling errors and it is difficult to think about ways in which users will make mistakes. Models depending on a word vocabulary will always face such issues. An ideal classifier should handle spelling errors inherently. With Semantic Hashing, we overcome these challenges and achieve state-of-the-art results on three datasets: Chatbot, Ask Ubuntu, and Web Applications [3]. Our benchmarks are available online.
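The robustness to spelling errors argued for above comes from representing text by sub-word units rather than whole words. The sketch below illustrates the idea with character trigrams and `#` as a word-boundary marker. Both choices are assumptions made for illustration, since the abstract does not specify the tokenisation.

```python
def subword_hash(text, n=3):
    """Split each word, padded with '#', into overlapping character n-grams."""
    tokens = []
    for word in text.lower().split():
        padded = f"#{word}#"
        tokens.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return tokens


correct = subword_hash("hello")
misspelled = subword_hash("helo")
shared = set(correct) & set(misspelled)
```

A misspelled word still shares most of its trigrams with the correct form, so a classifier built on counts of these sub-word tokens sees similar inputs, whereas a vocabulary-based embedding would treat the misspelling as an unknown word.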
The detection of COVID-19 is and will remain in the foreseeable future a crucial challenge, making the development of tools for the task important. One possible approach, within the confines of speech and audio processing, is detecting potential COVID-19 cases based on cough sounds. We propose a simple, yet robust method based on the well-known ComParE 2016 feature set, and two classical machine learning models, namely Random Forests and Support Vector Machines (SVMs). Furthermore, we combine the two methods by calculating the weighted average of their predictions. Our results in the DiCOVA challenge show that this simple approach leads to a robust solution while producing competitive results. Based on the Area Under the Receiver Operating Characteristic Curve (AUC ROC) score, both classical machine learning methods we applied markedly outperform the baseline provided by the challenge organisers. Moreover, their combination attains an AUC ROC score of 85.21, positioning us at fourth place on the leaderboard (where the second team attained a similar score of 85.43). Here, we describe this system in more detail and analyse the resulting models, drawing conclusions and determining future work directions.
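The combination step and the evaluation metric can be sketched as follows. The probabilities and the weight of 0.5 are made up for illustration. In practice the weight would be tuned on development data, and the abstract does not state the value used. The AUC is reported here on a 0 to 1 scale, whereas the abstract uses percentages.

```python
def fuse(probs_a, probs_b, weight_a=0.5):
    """Weighted average of two models' positive-class probabilities."""
    return [weight_a * a + (1.0 - weight_a) * b
            for a, b in zip(probs_a, probs_b)]


def auc_roc(y_true, scores):
    """AUC as the chance that a positive outscores a negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up predictions for four recordings (1 = positive case).
y_true = [1, 1, 0, 0]
rf_probs = [0.9, 0.4, 0.5, 0.1]
svm_probs = [0.6, 0.7, 0.2, 0.6]
fused = fuse(rf_probs, svm_probs, weight_a=0.5)
```

In this toy example each model misranks a different recording, so the averaged scores rank all positives above all negatives. This is the kind of complementarity a fusion of two classifiers is intended to exploit.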