Creating and Leveraging a Synthetic Dataset of Cloud Optical Thickness Measures for Cloud Detection in MSI
Department of Computer Science, RISE Research Institutes of Sweden, Borås, 501 15, Sweden.
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0002-5922-7889
Department of Computer Science, RISE Research Institutes of Sweden, Borås, 501 15, Sweden.
Department of Computer Science, RISE Research Institutes of Sweden, Borås, 501 15, Sweden.
2024 (English). In: Remote Sensing, E-ISSN 2072-4292, Vol. 16, no 4, article id 694. Article in journal (Refereed), Published
Abstract [en]

Cloud formations often obscure optical satellite-based monitoring of the Earth’s surface, thus limiting Earth observation (EO) activities such as land cover mapping, ocean color analysis, and cropland monitoring. The integration of machine learning (ML) methods within the remote sensing domain has significantly improved performance for a wide range of EO tasks, including cloud detection and filtering, but there is still much room for improvement. A key bottleneck is that ML methods typically depend on large amounts of annotated data for training, which are often difficult to come by in EO contexts. This is especially true when it comes to cloud optical thickness (COT) estimation. A reliable estimation of COT enables more fine-grained and application-dependent control compared to using pre-specified cloud categories, as is common practice. To alleviate the COT data scarcity problem, in this work, we propose a novel synthetic dataset for COT estimation, which we subsequently leverage for obtaining reliable and versatile cloud masks on real data. In our dataset, top-of-atmosphere radiances have been simulated for 12 of the spectral bands of the MultiSpectral Instrument (MSI) onboard Sentinel-2 platforms. These data points have been simulated under consideration of different cloud types, COTs, and ground surface and atmospheric profiles. Extensive experiments, in which several ML models are trained to predict COT from the measured reflectivity of the spectral bands, demonstrate the usefulness of our proposed dataset. In particular, by thresholding COT estimates from our ML models, we show on two satellite image datasets (one that is publicly available, and one that we have collected and annotated) that reliable cloud masks can be obtained. The synthetic data, the newly collected real dataset, code and models have been made publicly available.
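
The thresholding step mentioned in the abstract can be illustrated with a minimal sketch. It assumes per-pixel COT estimates are already available as a 2D grid; the function names, the toy values, and the two thresholds are hypothetical and are not taken from the paper or its released code.

```python
from typing import List, Sequence


def cot_to_cloud_mask(cot: Sequence[Sequence[float]], threshold: float) -> List[List[bool]]:
    """Mark a pixel as cloudy when its estimated COT reaches the threshold."""
    return [[value >= threshold for value in row] for row in cot]


def cloud_fraction(mask: Sequence[Sequence[bool]]) -> float:
    """Share of pixels flagged as cloudy (0.0 for an empty mask)."""
    total = sum(len(row) for row in mask)
    cloudy = sum(sum(row) for row in mask)
    return cloudy / total if total else 0.0


# Hypothetical COT predictions for a 2x3 image tile (illustrative values only).
cot_tile = [
    [0.0, 0.4, 2.5],
    [5.0, 0.1, 1.0],
]

# Assumed thresholds: a low one also flags thin cloud, a high one keeps only thicker cloud.
strict_mask = cot_to_cloud_mask(cot_tile, threshold=0.3)
lenient_mask = cot_to_cloud_mask(cot_tile, threshold=2.0)
```

Because the threshold is chosen after prediction, the same COT estimates can serve applications with different tolerances for thin cloud, which is the application-dependent control the abstract refers to.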

Place, publisher, year, edition, pages
MDPI, 2024. Vol. 16, no 4, article id 694
Keywords [en]
cloud detection, cloud optical thickness, datasets, machine learning
National Category
Remote Sensing
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-104597
DOI: 10.3390/rs16040694
ISI: 001177031000001
Scopus ID: 2-s2.0-85185890836
OAI: oai:DiVA.org:ltu-104597
DiVA, id: diva2:1844546
Funder
Vinnova, 2021-03643; 2023-02787
Note

Validated; 2024; Level 2; 2024-04-09 (sofila)

Full text license: CC BY

Available from: 2024-03-14 Created: 2024-03-14 Last updated: 2024-11-20 Bibliographically approved
In thesis
1. Unsupervised Curriculum Learning Case Study: Earth Observation UCL4EO
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Earth Observation (EO) data is crucial for understanding, managing, and conserving our planet's ecosystem and its natural resources. This data enables humanity to monitor environmental changes, such as natural disasters, urban growth, and climate shifts, assisting informed decisions and proactive measures. Early EO heavily relied on statistical methods and expert domain knowledge, but the advent of machine learning has revolutionized EO data processing, enhancing efficiency and accuracy. Conventional ML models require expensive and labor-intensive data labeling. In contrast, unsupervised ML techniques can learn features from data without the need for manual labeling, making the process more efficient and cost-effective.

This thesis presents an Unsupervised Curriculum Learning (UCL) approach that uses advanced deep learning (DL) models to classify EO data, referred to as UCL4EO. This approach eliminates the need for manual data labeling when training the DL model. The UCL framework comprises i) a DL model tailored for feature extraction from image data, ii) a clustering method to group deep features, and iii) a selection operation to capture representative samples from these clusters. A convolutional neural network (CNN) extracts meaningful features from the images, and these features are passed to a clustering algorithm to create pseudo-labels. After the initial clusters are identified, representative samples from each cluster are chosen using the UCL selection operation and used to fine-tune the feature extractor. This process is repeated iteratively until convergence. In this way, the proposed UCL approach progressively learns and incorporates salient data features in an unsupervised manner by utilizing pseudo-labels.
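
The iterative loop described above (extract features, cluster, select representatives, fine-tune, repeat) can be sketched in a few lines. This is only an illustration under stated assumptions: plain k-means stands in for the clustering method, "closest to the centroid" stands in for the selection operation, convergence is taken to mean that pseudo-labels stop changing, and the `extract` and `fine_tune` callables are hypothetical placeholders for the DL model. None of these choices is confirmed by the thesis abstract.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(features: List[Vector], k: int, iters: int = 20) -> Tuple[List[int], List[Vector]]:
    """Plain k-means; the first k feature vectors serve as initial centroids."""
    centroids = list(features[:k])
    labels = [0] * len(features)
    for _ in range(iters):
        # Assignment step: each sample gets the index of its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(f, centroids[c])) for f in features]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


def select_representatives(features: List[Vector], labels: List[int],
                           centroids: List[Vector], per_cluster: int) -> List[int]:
    """Indices of the per_cluster samples closest to each centroid (assumed selection rule)."""
    chosen: List[int] = []
    for c, centroid in enumerate(centroids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        members.sort(key=lambda i: sq_dist(features[i], centroid))
        chosen.extend(members[:per_cluster])
    return chosen


def ucl_loop(images: Sequence, extract: Callable, fine_tune: Callable,
             k: int, per_cluster: int, max_rounds: int = 10) -> List[int]:
    """Repeat extract -> cluster -> select -> fine-tune until pseudo-labels stop changing."""
    labels: List[int] = []
    previous = None
    for _ in range(max_rounds):
        features = [extract(img) for img in images]
        labels, centroids = kmeans(features, k)
        if labels == previous:  # assumed convergence criterion
            break
        chosen = select_representatives(features, labels, centroids, per_cluster)
        # Hypothetical hook: a real system would update the feature extractor here.
        fine_tune([(images[i], labels[i]) for i in chosen])
        previous = labels
    return labels


# Toy usage: 2D points stand in for images, and the "extractor" is the identity.
toy_images = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9)]
calls: list = []
pseudo_labels = ucl_loop(toy_images, extract=lambda img: tuple(img),
                         fine_tune=calls.append, k=2, per_cluster=1)
```

In the toy run the extractor never changes, so the pseudo-labels are stable after the first round and the loop stops on the second. With a real network, each fine-tuning step would alter the features and therefore the clusters in the next round.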

UCL started as a proof of concept to show the viability of the method for binary classification on remote sensing (RS) and aerial imagery. Specifically, the UCL framework is employed to identify water bodies using three RGB datasets, encompassing both low- and high-resolution RS and aerial imagery. After being extensively examined on RGB imagery, UCL was adapted to benefit from the enhanced capabilities of multi-spectral satellite imagery. This adaptation enables UCL to generalize to multi-spectral imagery from Sentinel-2 to detect forest fires in Australia. UCL was subsequently improved and further investigated for identifying utility poles in high-resolution UAV images. These gray-scale images pose computer vision challenges such as occlusion and cropping, where a significant portion of the image is background and only a small part of the utility pole is visible. Extensive experimentation on these tasks showcases UCL's adaptive learning capabilities and produces promising results. The achieved accuracy surpassed that of supervised methods in cross-domain adaptation on similar tasks, underscoring the effectiveness of the proposed algorithm.

The scope of UCL has been extended to encompass multi-class classification tasks in the domain of RS data, referred to as Multi-class UCL. Multi-class UCL progressively acquires knowledge about various categories across multiple spatial resolutions. To investigate Multi-class UCL, we have used four publicly available datasets of Sentinel-2 and aerial imagery: EuroSAT, SAT-6, UCMerced, and RSSCN7. Comprehensive experiments conducted on these datasets revealed better cross-domain adaptation capabilities compared to supervised methods, thereby demonstrating the effectiveness of Multi-class UCL.

In these investigations, two datasets are generated using Sentinel-2 satellite imagery: one for water bodies (PakSAT) and one for Australian forest fires. However, cloud cover poses a significant challenge by obstructing the satellite's ability to capture clear images of the Earth's surface. To address this issue, available cloud masking techniques are employed to filter out images affected by cloud cover, ensuring the datasets contain only clear and usable data. Later, this thesis examines cloud detection and Cloud Optical Thickness (COT) estimation from Sentinel-2 imagery. We employed machine learning techniques, achieving better performance on cloud cover tasks than the Scene Classification Layer (SCL) designed by ESA.

In addition to the application in RS data, UCL has been investigated in other domains of EO, such as undersea imagery. Furthermore, UCL has also been used for tasks like natural scene classification, medical imaging, and document analysis, demonstrating its versatility and broad applicability. Further exploration of UCL could involve improving the process of generating pseudo-labels through deep learning techniques.

Place, publisher, year, edition, pages
Luleå: Luleå University of Technology, 2024
Series
Doctoral thesis / Luleå University of Technology 1 jan 1997 → …, ISSN 1402-1544
Keywords
UCL, Earth Observation, EO, Remote Sensing, RS, Computer Vision, Deep Learning, Unsupervised Learning
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-109974 (URN); 978-91-8048-632-3 (ISBN); 978-91-8048-633-0 (ISBN)
Public defence
2024-11-08, E632, Luleå University of Technology, Luleå, 09:00 (English)
Available from: 2024-09-16 Created: 2024-09-14 Last updated: 2024-10-18 Bibliographically approved

Open Access in DiVA

fulltext (6281 kB), 85 downloads
File information
File name: FULLTEXT01.pdf; File size: 6281 kB; Checksum (SHA-512):
105bc61a654fc4f8d25326a2fd624aaa3a99f69193cde39f9f24847a7607d69d7f26ac4f88d8d9396d8755b585d043e449f11acecb9eacf70e5c4ad7e4413bf7
Type: fulltext; Mimetype: application/pdf

Authority records

Abid, Nosheen; Kovács, György

Search in DiVA

By author/editor
Abid, Nosheen; Scheirer, Ronald; Ceccobello, Chiara; Kovács, György
By organisation
Embedded Internet Systems Lab
In the same journal
Remote Sensing
Total: 85 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 284 hits