Unsupervised Curriculum Learning Case Study: Earth Observation UCL4EO
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
ORCID iD: 0000-0002-5922-7889
2024 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Earth Observation (EO) data, collected via satellites and remote sensing technologies, is crucial for understanding, managing, and conserving the Earth. It enables humankind to monitor environmental changes, such as natural disasters, urban growth, and climate shifts, assisting informed decisions and proactive measures. Early Remote Sensing (RS) heavily relied on statistical methods and expert domain knowledge, but the advent of machine learning has revolutionized EO data processing, enhancing efficiency and accuracy. Conventional machine learning (ML) models require expensive and labor-intensive data labeling. In contrast, unsupervised ML techniques can learn features from data without the need for manual labeling, making the process more efficient and cost-effective.

This thesis presents an innovative Unsupervised Curriculum Learning (UCL) approach utilizing advanced deep learning (DL) models to classify EO data, referred to as UCL4EO. This approach eliminates the need for manual data labeling in training the DL model. The UCL framework comprises i) a DL model, typically a Convolutional Neural Network (CNN) tailored for feature extraction from image data, ii) a clustering technique to cluster deep features, and iii) a selection operation to select representative samples from these clusters. The CNN extracts meaningful features from images, which are then passed to a clustering algorithm to create pseudo-labels. After identifying the initial clusters, representative samples from each cluster are chosen using the UCL selection operation to fine-tune the feature extractor. The stated process is repeated iteratively until convergence. The proposed UCL approach progressively learns and incorporates salient data features in an unsupervised manner by utilizing pseudo-labels.
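The iterative loop described above (extract features, cluster, select representatives, fine-tune, repeat) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the k-means routine with farthest-point initialization, the "closest to centroid" selection rule, the `keep_fraction` parameter, the stopping rule (pseudo-labels unchanged between rounds), and the `extract` / `fine_tune` callbacks are all assumptions made for the example. In the thesis the extractor is a CNN; here an identity function stands in for it.

```python
import numpy as np


def assign(x, centroids):
    """Index of the nearest centroid for every row of x."""
    d = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)


def farthest_point_init(x, k):
    """Deterministic init: start at x[0], then repeatedly add the farthest point."""
    centroids = [x[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(x - c, axis=1) for c in centroids], axis=0)
        centroids.append(x[d.argmax()])
    return np.array(centroids, dtype=float)


def kmeans(x, k, iters=50):
    """Plain Lloyd's k-means (a stand-in for whatever clustering is used)."""
    centroids = farthest_point_init(x, k)
    for _ in range(iters):
        labels = assign(x, centroids)
        updated = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return assign(x, centroids), centroids


def select_representatives(x, labels, centroids, keep_fraction=0.5):
    """Hypothetical selection rule: keep the samples nearest each centroid."""
    chosen = []
    for j in range(len(centroids)):
        idx = np.flatnonzero(labels == j)
        d = np.linalg.norm(x[idx] - centroids[j], axis=1)
        n_keep = max(1, int(len(idx) * keep_fraction))
        chosen.append(idx[np.argsort(d)[:n_keep]])
    return np.concatenate(chosen)


def ucl_loop(images, extract, fine_tune, k=2, max_rounds=10):
    """Repeat extract -> cluster -> select -> fine-tune until labels stop changing."""
    previous = None
    for _ in range(max_rounds):
        features = extract(images)
        labels, centroids = kmeans(features, k)
        chosen = select_representatives(features, labels, centroids)
        fine_tune(images[chosen], labels[chosen])  # pseudo-labels act as targets
        if previous is not None and np.array_equal(labels, previous):
            break
        previous = labels
    return labels


# Toy usage: two well-separated blobs, identity "extractor", no-op fine-tuning.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
labels = ucl_loop(x, extract=lambda im: im, fine_tune=lambda im, y: None)
```

In a real setting `extract` would run the current CNN and `fine_tune` would update its weights, so the features (and therefore the clusters) change from round to round.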

UCL is first evaluated as a proof of concept on simpler detection tasks in RS and aerial imagery. Specifically, the UCL framework is employed to identify water bodies using three RGB datasets, encompassing both low and high-resolution RS and aerial imagery. Although UCL has been examined most extensively with RGB imagery, it has also been adapted to benefit from the enhanced capabilities of multi-spectral satellite imagery. This adaptation enables UCL to generalize to multi-spectral imagery from Sentinel-2 to detect forest fires in Australia. UCL undergoes subsequent improvements and is further investigated to identify utility poles in high-resolution UAV images. These gray-scale images of utility poles pose computer vision challenges such as occlusion and cropping, where a significant portion of the image is background and the utility pole is only slightly visible. Extensive experimentation on these tasks showcases UCL's adaptive learning capabilities, producing promising results. The achieved accuracies surpassed those of supervised methods in cross-domain adaptation on similar tasks, underscoring the effectiveness of the proposed algorithm.

In these investigations, two datasets are generated using Sentinel-2: one for water bodies (PakSAT) and the other for Australian forest fires. Cloud cover significantly hinders the acquisition of satellite imagery depicting the Earth's surface. In preparing these datasets, this work employs available cloud masking solutions to exclude images with cloud cover. Later, this thesis examines cloud detection and Cloud Optical Thickness (COT) estimation from Sentinel-2 imagery. We employed advanced machine-learning techniques, achieving state-of-the-art performance for cloud cover tasks.

The scope of UCL has been extended to encompass multi-class classification tasks in the domain of RS data, referred to as Multi-class UCL. Multi-class UCL progressively acquires knowledge about various categories on multi-scale resolution. To investigate Multi-class UCL, we have used three publicly available datasets of Sentinel-2 and aerial imagery: EuroSAT, SAT-6, and RSSCN7. The evaluation of Multi-class UCL’s performance incorporates the concept of a confusion matrix to compare the predicted labels with the actual labels. Comprehensive experiments conducted on the specified datasets revealed better cross-domain adaptation capabilities compared to supervised methods, thereby demonstrating the effectiveness of Multi-class UCL.
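Because cluster indices produced by an unsupervised method are arbitrary, comparing them with actual labels through a confusion matrix needs some cluster-to-class mapping. The sketch below assumes a simple majority-vote mapping (each cluster is assigned the class that occurs most often in it); the abstract does not state which mapping the thesis uses, and the toy labels are invented for illustration.

```python
import numpy as np


def confusion_matrix(true, pred, n_classes, n_clusters):
    """Rows are actual classes, columns are cluster indices."""
    cm = np.zeros((n_classes, n_clusters), dtype=int)
    np.add.at(cm, (true, pred), 1)  # count each (class, cluster) pair
    return cm


def majority_mapping(cm):
    """For each cluster (column), the class with the most samples in it."""
    return cm.argmax(axis=0)


# Toy example with three classes and three clusters.
true = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
pred = np.array([2, 2, 2, 0, 0, 1, 1, 1, 1])

cm = confusion_matrix(true, pred, n_classes=3, n_clusters=3)
mapping = majority_mapping(cm)       # cluster index -> class label
mapped = mapping[pred]               # predictions expressed as class labels
accuracy = (mapped == true).mean()   # 8 of 9 samples agree here
```

Majority voting can map two clusters to the same class; a one-to-one assignment (for example via the Hungarian algorithm) is a common alternative when that is undesirable.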

In addition to the application in RS data, UCL has been investigated in other domains of EO, such as undersea imagery. Furthermore, UCL has also been used for tasks like natural scene classification, medical imaging, and document analysis, demonstrating its versatility and broad applicability. Further exploration of UCL could involve improving the process of generating pseudo-labels through deep learning techniques.

Place, publisher, year, edition, pages
Luleå: Luleå University of Technology, 2024.
Series
Doctoral thesis / Luleå University of Technology 1 jan 1997 → …, ISSN 1402-1544
Keywords [en]
UCL, Earth Observation, EO, Remote Sensing, RS, Computer Vision, Deep Learning, Unsupervised Learning
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-109974
ISBN: 978-91-8048-632-3 (print)
ISBN: 978-91-8048-633-0 (electronic)
OAI: oai:DiVA.org:ltu-109974
DiVA, id: diva2:1897739
Public defence
2024-11-08, E632, Luleå University of Technology, Luleå, 09:00 (English)
Opponent
Supervisors
Available from: 2024-09-16 Created: 2024-09-14 Last updated: 2024-09-16 Bibliographically approved
List of papers
1. Creating and Leveraging a Synthetic Dataset of Cloud Optical Thickness Measures for Cloud Detection in MSI
2024 (English) In: Remote Sensing, E-ISSN 2072-4292, Vol. 16, no 4, article id 694
Article in journal (Refereed) Published
Abstract [en]

Cloud formations often obscure optical satellite-based monitoring of the Earth’s surface, thus limiting Earth observation (EO) activities such as land cover mapping, ocean color analysis, and cropland monitoring. The integration of machine learning (ML) methods within the remote sensing domain has significantly improved performance for a wide range of EO tasks, including cloud detection and filtering, but there is still much room for improvement. A key bottleneck is that ML methods typically depend on large amounts of annotated data for training, which are often difficult to come by in EO contexts. This is especially true when it comes to cloud optical thickness (COT) estimation. A reliable estimation of COT enables more fine-grained and application-dependent control compared to using pre-specified cloud categories, as is common practice. To alleviate the COT data scarcity problem, in this work, we propose a novel synthetic dataset for COT estimation, which we subsequently leverage for obtaining reliable and versatile cloud masks on real data. In our dataset, top-of-atmosphere radiances have been simulated for 12 of the spectral bands of the Multispectral Imagery (MSI) sensor onboard Sentinel-2 platforms. These data points have been simulated under consideration of different cloud types, COTs, and ground surface and atmospheric profiles. Extensive experimentation of training several ML models to predict COT from the measured reflectivity of the spectral bands demonstrates the usefulness of our proposed dataset. In particular, by thresholding COT estimates from our ML models, we show on two satellite image datasets (one that is publicly available, and one which we have collected and annotated) that reliable cloud masks can be obtained. The synthetic data, the newly collected real dataset, code and models have been made publicly available.
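Thresholding continuous COT estimates into a binary cloud mask, as described above, can be sketched in a few lines. The threshold values and the toy COT grid below are hypothetical and only illustrate how one set of estimates yields different masks for different applications; they are not values from the paper.

```python
import numpy as np


def cloud_mask_from_cot(cot, threshold):
    """Mark a pixel as cloudy when its estimated COT reaches the threshold."""
    return np.asarray(cot) >= threshold


# Hypothetical per-pixel COT estimates for a 2x2 tile.
cot = np.array([[0.0, 0.4],
                [2.5, 30.0]])

# A low threshold flags thin clouds too; a high one keeps only thick clouds.
thin_sensitive = cloud_mask_from_cot(cot, threshold=0.3)
thick_only = cloud_mask_from_cot(cot, threshold=5.0)
```

Keeping the COT estimate continuous and choosing the threshold late is what gives the application-dependent control mentioned in the abstract.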

Place, publisher, year, edition, pages
MDPI, 2024
Keywords
cloud detection, cloud optical thickness, datasets, machine learning
National Category
Remote Sensing
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-104597 (URN)
10.3390/rs16040694 (DOI)
2-s2.0-85185890836 (Scopus ID)
Funder
Vinnova, 2021-03643; 2023-02787
Note

Validated; 2024; Level 2; 2024-04-09 (sofila);

Full text license: CC BY

Available from: 2024-03-14 Created: 2024-03-14 Last updated: 2024-09-14 Bibliographically approved
2. UCL: Unsupervised Curriculum Learning for Water Body Classification from Remote Sensing Imagery
2021 (English) In: International Journal of Applied Earth Observation and Geoinformation, ISSN 1569-8432, E-ISSN 1872-826X, Vol. 105, article id 102568
Article in journal (Refereed) Published
Abstract [en]

This paper presents a Convolutional Neural Network (CNN) based Unsupervised Curriculum Learning approach for the recognition of water bodies to overcome the stated challenges for remote sensing based RGB imagery. The unsupervised nature of the presented algorithm eliminates the need for labelled training data. The problem is cast as a two-class clustering problem (water and non-water), with clustering performed on deep features obtained from a pre-trained CNN. After initial clusters have been identified, representative samples from each cluster are chosen by the unsupervised curriculum learning algorithm for fine-tuning the feature extractor. The stated process is repeated iteratively until convergence. Three datasets have been used to evaluate the approach and show its effectiveness on varying scales: (i) the SAT-6 dataset, comprising high-resolution aircraft images, (ii) Sentinel-2 imagery from EuroSAT, comprising low-resolution remote sensing images, and (iii) PakSAT, a new dataset we created for this study. PakSAT is the first Pakistani Sentinel-2 dataset designed to classify water bodies of Pakistan. Extensive experiments on these datasets demonstrate the progressive learning behaviour of UCL and yield promising water classification results on all three datasets. The obtained accuracies outperform the supervised methods in domain adaptation, demonstrating the effectiveness of the proposed algorithm.

Place, publisher, year, edition, pages
Elsevier, 2021
Keywords
Sentinel-2, Aircraft Imagery, Remote Sensing, Water classification, Deep Learning, Unsupervised Curriculum Learning, Multi-scale Classification
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-87544 (URN)
10.1016/j.jag.2021.102568 (DOI)
000716818200002 ()
2-s2.0-85121593506 (Scopus ID)
Note

Validated; 2021; Level 2; 2021-11-08 (johcin);

Full text license: CC BY-NC-ND

Available from: 2021-10-18 Created: 2021-10-18 Last updated: 2024-09-14 Bibliographically approved
3. Seagrass classification using unsupervised curriculum learning (UCL)
2024 (English) In: Ecological Informatics, ISSN 1574-9541, E-ISSN 1878-0512, Vol. 83, article id 102804
Article in journal (Refereed) Published
Abstract [en]

Seagrass ecosystems are pivotal in marine environments, serving as crucial habitats for diverse marine species and contributing significantly to carbon sequestration. Accurate classification of seagrass species from underwater images is imperative for monitoring and preserving these ecosystems. This paper introduces Unsupervised Curriculum Learning (UCL) to seagrass classification using the DeepSeagrass dataset. UCL progressively learns from simpler to more complex examples, enhancing the model's ability to discern seagrass features in a curriculum-driven manner. Experiments employing state-of-the-art deep learning architectures, convolutional neural networks (CNNs), show that UCL achieved an overall 90.12% precision and 89% recall, which significantly improves classification accuracy and robustness, outperforming some traditional supervised learning approaches like SimCLR, and unsupervised approaches like Zero-shot CLIP. The methodology of UCL involves four main steps: high-dimensional feature extraction, pseudo-label generation through clustering, reliable sample selection, and fine-tuning the model. The iterative UCL framework refines the CNN's learning of underwater images, demonstrating superior accuracy, generalization, and adaptability to unseen seagrass and background samples of undersea images. The findings presented in this paper contribute to the advancement of seagrass classification techniques, providing valuable insights into the conservation and management of marine ecosystems. The code and dataset are made publicly available and can be accessed here: https://github.com/nabid69/Unsupervised-Curriculum-Learning—UCL.

Place, publisher, year, edition, pages
Elsevier B.V., 2024
Keywords
Seagrass, Deep learning, Unsupervised classification, Curriculum learning, Unsupervised curriculum learning, Underwater digital imaging
National Category
Computer Sciences
Computer Vision and Robotics (Autonomous Systems)
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-109778 (URN)
10.1016/j.ecoinf.2024.102804 (DOI)
2-s2.0-85202895926 (Scopus ID)
Note

Validated; 2024; Level 2; 2024-09-09 (hanlid);

Full text license: CC BY

Available from: 2024-09-09 Created: 2024-09-09 Last updated: 2024-09-14 Bibliographically approved
4. Burnt Forest Estimation from Sentinel-2 Imagery of Australia using Unsupervised Deep Learning
2021 (English) In: Proceedings of the Digital Image Computing: Techniques and Applications (DICTA), IEEE, 2021, p. 74-81
Conference paper, Published paper (Refereed)
Abstract [en]

Massive wildfires, not only in Australia but also worldwide, are burning millions of hectares of forests and green land, affecting the social, ecological, and economic situation. Widely used index-based threshold methods like the Normalized Burn Ratio (NBR) require a huge amount of data preprocessing and are specific to the data capturing source. State-of-the-art deep learning models, on the other hand, are supervised and require domain experts' knowledge for labeling the data in huge quantities. These limitations make the existing models difficult to adapt to new variations in the data and capturing sources. In this work, we have proposed an unsupervised deep learning based architecture to map the burnt regions of forests by learning features progressively. The model considers small patches of satellite imagery and classifies them as burnt or not burnt. These small patches are concatenated into binary masks to segment out the burnt region of the forests. The proposed system is composed of two modules: 1) a state-of-the-art deep learning architecture for feature extraction and 2) a clustering algorithm for the generation of pseudo labels to train the deep learning architecture. The proposed method is capable of learning the features progressively in an unsupervised fashion from the data with pseudo labels, reducing the exhausting effort of data labeling that requires expert knowledge. We have used real-time Sentinel-2 data for training the model and mapping the burnt regions. The obtained F1-Score of 0.87 demonstrates the effectiveness of the proposed model.
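The step of assembling per-patch burnt / not-burnt decisions into a binary mask can be illustrated as below. This is a sketch under stated assumptions: patches are taken to be square, non-overlapping, and laid out on a regular grid, and the function name `patches_to_mask` and the toy label grid are invented for the example. The paper's actual tiling scheme is not given in the abstract.

```python
import numpy as np


def patches_to_mask(patch_labels, patch_size):
    """Expand a grid of per-patch 0/1 labels into a pixel-level binary mask.

    Each grid cell becomes a patch_size x patch_size block of the same value.
    """
    grid = np.asarray(patch_labels, dtype=np.uint8)
    block = np.ones((patch_size, patch_size), dtype=np.uint8)
    return np.kron(grid, block)


# Hypothetical 2x2 grid of patch predictions (1 = burnt, 0 = not burnt).
patch_labels = [[1, 0],
                [0, 1]]

mask = patches_to_mask(patch_labels, patch_size=2)
burnt_fraction = mask.mean()  # share of pixels flagged as burnt
```

The resolution of the resulting mask is bounded by the patch size, so smaller patches give finer burnt-area boundaries at the cost of less context per classification.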

Place, publisher, year, edition, pages
IEEE, 2021
Keywords
Unsupervised, Deep Learning, Australia, Forest Fire, Wildfire, Sentinel-2, Aerial Imagery
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-87545 (URN)
10.1109/DICTA52665.2021.9647174 (DOI)
000824642300010 ()
2-s2.0-85124317916 (Scopus ID)
Conference
International Conference on Digital Image Computing: Techniques and Applications (DICTA), Gold Coast, Australia, November 29 - December 1, 2021
Note

ISBN for host publication: 978-1-6654-1709-9 (electronic)

Available from: 2021-10-18 Created: 2021-10-18 Last updated: 2024-09-14 Bibliographically approved
5. UCL: Unsupervised Curriculum Learning for Utility Pole Detection from Aerial Imagery
2022 (English) In: Proceedings of the Digital Image Computing: Techniques and Applications (DICTA), IEEE, 2022
Conference paper, Published paper (Refereed)
Abstract [en]

This paper introduces a machine learning-based approach for detecting electric poles, an essential part of power grid maintenance. With the increasing popularity of deep learning, several such approaches have been proposed for electric pole detection. However, most of these approaches are supervised, requiring a large amount of labeled data, which is time-consuming and labor-intensive to produce. Unsupervised deep learning approaches have the potential to overcome the need for huge amounts of training data. This paper presents an unsupervised deep learning framework for utility pole detection. The framework combines a Convolutional Neural Network (CNN) and a clustering algorithm with a selection operation: the CNN architecture extracts meaningful features from aerial imagery, the clustering algorithm generates pseudo labels for the resulting features, and the selection operation picks out reliable samples to fine-tune the CNN architecture further. The fine-tuned version then replaces the initial CNN model, thus improving the framework, and we iteratively repeat this process so that the model learns the prominent patterns in the data progressively. The presented framework is trained and tested on a small dataset of utility poles provided by “Mention Fuvex” (a Spanish company utilizing long-range drones for power line inspection). Our extensive experimentation demonstrates the progressive learning behavior of the proposed method and results in promising classification scores, with a significance test giving a p-value < 0.00005 on the utility pole dataset.

Place, publisher, year, edition, pages
IEEE, 2022
Keywords
Aerial Imagery, Electric Poles, Computer Vision, Deep Learning, Unsupervised Learning
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-96195 (URN)
10.1109/DICTA56598.2022.10034610 (DOI)
2-s2.0-85148606239 (Scopus ID)
978-1-6654-5642-5 (ISBN)
Conference
2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA), November 30 - December 2, 2022, Sydney, Australia
Available from: 2023-03-20 Created: 2023-03-20 Last updated: 2024-09-14 Bibliographically approved
6. UCL: Unsupervised Curriculum Learning for Image Classification
(English) Manuscript (preprint) (Other (popular science, discussion, etc.))
Abstract [en]

In many complex real-world computer vision domains, such as medical diagnostics and document analysis, the lack of labeled data often limits the effectiveness of traditional deep learning models. This study addresses these challenges by enhancing Unsupervised Curriculum Learning (UCL), a deep learning framework that automatically discovers meaningful patterns without the need for labeled data. Originally designed for remote sensing imagery, UCL has been expanded in this work to improve classification performance in a variety of domain-specific applications. UCL integrates a convolutional neural network, clustering algorithms, and selection techniques to classify images without supervision. We introduce key improvements, such as spectral clustering, outlier detection, and dimensionality reduction, to boost the framework’s accuracy. Experimental results demonstrate significant performance gains, with F1-scores increasing from 68% to 94% on a three-class subset of the CIFAR-10 dataset and from 68% to 75% on a five-class subset. The updated UCL also achieved F1-scores of 85% in medical diagnosis, 82% in scene recognition, and 62% in historical document classification. These findings underscore the potential of UCL in complex real-world applications and point to areas where further advancements are needed to maximize its utility across diverse fields.
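Two of the preprocessing improvements named above, dimensionality reduction and outlier detection, can be sketched on deep features before they reach the clustering step. The specific techniques here are assumptions chosen for a compact example: PCA computed via SVD for the reduction, and a z-score on distance from the feature mean for outlier filtering. The abstract does not say which methods the manuscript uses, and spectral clustering itself is not shown.

```python
import numpy as np


def pca_reduce(x, n_components):
    """Project features onto their top principal components (PCA via SVD)."""
    centered = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def inlier_mask(x, z_max=3.0):
    """Flag samples whose distance from the mean is not an extreme z-score."""
    d = np.linalg.norm(x - x.mean(axis=0), axis=1)
    z = (d - d.mean()) / (d.std() + 1e-12)
    return z <= z_max


# Toy features: 200 ordinary samples plus one obvious outlier in the last row.
rng = np.random.default_rng(0)
features = np.vstack([rng.normal(size=(200, 16)), np.full((1, 16), 100.0)])

reduced = pca_reduce(features, n_components=4)
keep = inlier_mask(reduced)      # False for the outlier row
clean = reduced[keep]            # what would be handed to the clustering step
```

Removing outliers before clustering keeps a few extreme samples from pulling cluster boundaries, which in turn affects the quality of the pseudo-labels.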

Keywords
Unsupervised Learning, Deep Learning, Classification, Computer Vision, Document Analysis, Natural Scene Images, Medical Imaging, UCL
National Category
Computer Sciences
Identifiers
urn:nbn:se:ltu:diva-109972 (URN)
Available from: 2024-09-14 Created: 2024-09-14 Last updated: 2024-09-14
7. Multi-UCL: Multi-class Unsupervised Curriculum Learning for Image Scene Classification: Case Study: Earth Observation
(English) Manuscript (preprint) (Other academic)
Abstract [en]

The effective training of supervised deep learning models requires the labeling of extensive datasets, a process that is often costly and labor-intensive. Such models also face significant challenges with overfitting on the training data with true labels, leading to suboptimal performance on new datasets with slight variations in capturing sources or regions. This paper introduces Multi-class Unsupervised Curriculum Learning (Multi-class UCL), a novel deep learning framework. We demonstrate the effectiveness of this framework on the case study of land use and cover classification that bypasses the need for labeled data, thereby improving adaptability across different datasets. Multi-class UCL leverages pseudo-labels generated from a clustering technique to train the model and incorporates a selection process that ensures an equal representation of samples from each cluster, addressing the issue of class imbalance. The study evaluates the effectiveness of Multi-class UCL through comprehensive experiments on four diverse publicly available datasets: EuroSAT, SAT-6, RSSCN7, and UCMerced. These datasets have varying resolutions, come from different capturing sources, and encompass different geographical areas. The results demonstrate that the framework effectively learns and generalizes important features from the data, showing superior adaptability and performance across various datasets compared to traditional supervised models.
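A selection step that draws the same number of samples from every cluster, as described above, might look like the following. This is only a sketch: taking the size of the smallest cluster as the per-cluster quota and ranking samples by distance to their cluster centroid are assumptions for illustration, and `balanced_select` is a hypothetical name, not a function from the manuscript.

```python
import numpy as np


def balanced_select(features, labels, per_cluster=None):
    """Pick an equal number of samples from each cluster.

    The quota defaults to the smallest cluster's size; within a cluster the
    samples closest to the cluster centroid are preferred.
    """
    clusters = np.unique(labels)
    smallest = min(int(np.sum(labels == c)) for c in clusters)
    quota = smallest if per_cluster is None else min(per_cluster, smallest)
    chosen = []
    for c in clusters:
        idx = np.flatnonzero(labels == c)
        centroid = features[idx].mean(axis=0)
        d = np.linalg.norm(features[idx] - centroid, axis=1)
        chosen.append(idx[np.argsort(d)[:quota]])
    return np.concatenate(chosen)


# Toy pseudo-labels with imbalanced clusters of size 10, 3, and 5.
rng = np.random.default_rng(0)
features = rng.normal(size=(18, 4))
labels = np.array([0] * 10 + [1] * 3 + [2] * 5)

chosen = balanced_select(features, labels)
counts = np.bincount(labels[chosen])  # three samples from every cluster
```

Capping every cluster at the same quota prevents a dominant pseudo-class from monopolizing the fine-tuning batches, at the cost of discarding samples from larger clusters.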

National Category
Computer Sciences
Identifiers
urn:nbn:se:ltu:diva-109971 (URN)
Available from: 2024-09-14 Created: 2024-09-14 Last updated: 2024-09-14

Open Access in DiVA

No full text in DiVA

Authority records

Abid, Nosheen
