Towards Robust and Domain-aware Self-supervised Representation Learning
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, EISLAB. ORCID iD: 0000-0002-6903-7552
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Self-supervised representation learning (SSL) has emerged as a fundamental paradigm in representation learning, enabling models to learn meaningful representations without labeled data. Despite its success, SSL remains constrained by two core challenges: (i) a lack of robustness against real-world distribution shifts and adversarial perturbations, and (ii) a lack of domain-awareness, which limits its usability beyond natural scenes. These limitations arise from the generic invariance assumptions in SSL, which rely on predefined augmentations to learn representations but struggle to generalize when exposed to unseen environmental distortions, adversarial attacks, and domain-specific nuances. Existing SSL approaches, whether based on contrastive learning, knowledge distillation, or information maximization, do not explicitly account for these factors, making them vulnerable in real-world applications and suboptimal in specialized domains.

This thesis aims to enhance both robustness and domain-awareness in a modular, plug-and-play manner, so that the advancements apply across different joint embedding architecture and method (JEAM)-based SSL approaches and remain adaptable to future developments in SSL. To achieve this, the thesis follows one guiding principle: leverage invariant representations to improve robustness and domain-awareness without altering the fundamental SSL objectives. This principle ensures that the improvements can be integrated seamlessly into existing and future SSL approaches.

To systematically address these core challenges, the thesis begins with a foundational study of SSL approaches, identifying the common schema that underlies them. This unification provides a conceptual view of SSL methods and makes it possible to isolate the domain-sensitive and domain-agnostic components across approaches. This conceptual outcome sets the stage for establishing precisely where improvements are needed to enhance robustness and domain-awareness, given that current SSL methods fail under real-world challenges.

Next, the thesis conducts a large-scale empirical evaluation of existing SSL methods against relevant robustness benchmarks, uncovering their failures under distribution shifts caused by real-world environmental challenges. The evaluation reveals a significant decline in performance across the different families of SSL methods. It establishes the fundamental research gap and motivates the advancements introduced in this thesis.

The first advancement focuses on robustness against distribution shifts, particularly geometric distortions such as perspective distortion (PD), which are prevalent in real-world environments but not addressed by existing SSL methods. Because PD introduces nonlinear spatial transformations, standard affine augmentations fail to model its effects, leading to degraded representation stability. To address this, the thesis introduces Möbius-based mitigating perspective distortion (MPD) and log conformal maps (LCM), mathematically grounded transformations that enable robustness without requiring perspective-distorted training data or estimation of camera parameters. These methods are additionally adapted to multiple real-world computer vision applications, including crowd counting, object detection, person re-identification, and fisheye view recognition, showcasing their effectiveness. Further, because no dedicated perspective-distortion benchmark was available, the ImageNet-PD robustness benchmark is developed to fill the gap.

Beyond environmental challenges, another critical real-world challenge is adversarial attacks. SSL methods are highly susceptible to adversarial attacks, as the learned representations lack perturbation-invariant constraints. Existing adversarial training approaches in SSL rely on brute-force attack strategies, which fail to adapt dynamically. To address this, the thesis introduces adversarial self-supervised training with adaptive-attacks (ASTrA), in which attack strategies evolve with the model’s learning dynamics and establish a correspondence between attack parameters and training examples, so that adversarial perturbations are optimized in a learnable manner. Unlike conventional adversarial training, ASTrA ensures robustness while maintaining SSL’s efficiency and scalability.

While robustness, in this thesis, focuses on real-world challenges in natural scenes, domain-awareness focuses on specialized visual domains beyond natural scenes. Standard SSL augmentations are designed for variations in natural scenes, making them ill-suited for specialized fields such as medical imaging and industrial mining material inspection. This thesis introduces domain-awareness in SSL by incorporating domain-specific information into SSL’s view generation process. In particular, (i) magnification prior contrastive similarity (MPCS) makes the learned representations invariant to magnification in histopathology images by introducing varying magnifications into the view generation process, improving breast cancer recognition; and (ii) depth contrast explicitly enforces modality alignment between material images and the measured height of the materials on a conveyor belt, so that the learned representations become aware of physical properties, thereby improving material classification.

Beyond robustness and domain-awareness, SSL’s ability to generalize with limited data is advantageous for its practicality. While the loss objective in SSL is generally domain-agnostic, its effectiveness relies on large-scale data. In this direction, this thesis explores functional knowledge transfer (FKT), where self-supervised and supervised learning objectives are jointly optimized, enabling SSL representations to adapt dynamically to supervised tasks. This approach enhances generalization in low-data regimes.
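
The idea of optimizing both objectives jointly, rather than one after the other, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the two quadratic stand-in losses, the single shared parameter `w`, the equal default weights, and the function names are hypothetical and are not taken from the thesis.

```python
def ssl_grad(w: float) -> float:
    """Gradient of a stand-in self-supervised loss (w - 1)^2 (hypothetical)."""
    return 2.0 * (w - 1.0)


def sup_grad(w: float) -> float:
    """Gradient of a stand-in supervised loss (w - 3)^2 (hypothetical)."""
    return 2.0 * (w - 3.0)


def train_jointly(w: float = 0.0, lr: float = 0.1, steps: int = 200,
                  ssl_weight: float = 1.0, sup_weight: float = 1.0) -> float:
    """Gradient descent on the weighted sum of both losses at every step."""
    for _ in range(steps):
        # Both objectives contribute to the same update of the shared parameter.
        grad = ssl_weight * ssl_grad(w) + sup_weight * sup_grad(w)
        w -= lr * grad
    return w
```

With equal weights the shared parameter settles between the two individual optima, which is the sense in which each objective shapes the other during training; setting `sup_weight=0.0` recovers the purely self-supervised optimum.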

In conclusion, this thesis provides a foundation for robust and domain-aware self-supervised representation learning in a modular manner, highlighting its applicability to existing and future JEAM-based SSL approaches, which can inherit these advancements and adapt to emerging challenges.

Place, publisher, year, edition, pages
Luleå University of Technology, 2025.
Series
Doctoral thesis / Luleå University of Technology 1 jan 1997 → …, ISSN 1402-1544
Keywords [en]
Self-supervised Representation Learning, Representation Learning, Robustness, Domain-aware, Perspective Distortion, Adversarial Attacks, Medical Imaging, Computer Vision
National Category
Computer Vision and Learning Systems; Artificial Intelligence
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-111571; ISBN: 978-91-8048-761-0 (print); ISBN: 978-91-8048-762-7 (digital); OAI: oai:DiVA.org:ltu-111571; DiVA, id: diva2:1935727
Public defence
2025-04-08, C305, Luleå University of Technology, Luleå, 09:00 (English)
Opponent
Supervisors
Available from: 2025-02-07 Created: 2025-02-07 Last updated: 2025-03-13 Bibliographically approved
List of papers
1. Can Self-Supervised Representation Learning Methods Withstand Distribution Shifts and Corruptions?
2023 (English). In: 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW 2023), Institute of Electrical and Electronics Engineers Inc., 2023, p. 4469-4478. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2023
National Category
Computer Graphics and Computer Vision; Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-103984 (URN); 10.1109/ICCVW60793.2023.00481 (DOI); 001156680304060 (); 2-s2.0-85182928560 (Scopus ID)
Conference
IEEE/CVF International Conference on Computer Vision Workshops (ICCVW 2023), Paris, France, October 2-6, 2023
Note

ISBN for host publication: 979-8-3503-0745-0

Available from: 2024-01-29 Created: 2024-01-29 Last updated: 2025-02-07
2. Möbius Transform for Mitigating Perspective Distortions in Representation Learning
2025 (English). In: Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LXXIII / [ed] Aleš Leonardis; Elisa Ricci; Stefan Roth; Olga Russakovsky; Torsten Sattler; Gül Varol, Springer Science and Business Media Deutschland GmbH, 2025, p. 345-363. Conference paper, Published paper (Refereed)
Abstract [en]

Perspective distortion (PD) causes substantial changes in the shape, size, orientation, angles, and other spatial relationships of visual concepts in images. Precisely estimating camera intrinsic and extrinsic parameters is challenging, which prevents synthesizing perspective distortion. The lack of dedicated training data poses a critical barrier to developing robust computer vision methods. Additionally, distortion correction methods turn other computer vision tasks into a multi-step approach and underperform. In this work, we propose mitigating perspective distortion (MPD), which applies fine-grained parameter control to a specific family of Möbius transforms to model real-world distortion without estimating camera intrinsic and extrinsic parameters and without needing actual distorted data. We also present a dedicated perspectively distorted benchmark dataset, ImageNet-PD, for benchmarking the robustness of deep learning models. The proposed method outperforms existing methods on the benchmarks ImageNet-E and ImageNet-X. Additionally, it significantly improves performance on ImageNet-PD while performing consistently on the standard data distribution. Notably, our method shows improved performance on three PD-affected real-world applications (crowd counting, fisheye image recognition, and person re-identification) and one PD-affected challenging CV task, object detection. The source code, dataset, and models are available on the project webpage at https://prakashchhipa.github.io/projects/mpd.
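
For readers unfamiliar with the Möbius transform, a minimal sketch of how it warps image coordinates may help. It uses the general form w = (a·z + b) / (c·z + d) on pixel positions mapped to the complex plane. The coordinate normalization, the function names, and the parameter values are assumptions chosen for illustration; the abstract does not specify which family of parameters the paper uses, so this is not the authors' implementation.

```python
def mobius(z: complex, a: complex, b: complex, c: complex, d: complex) -> complex:
    """Apply w = (a*z + b) / (c*z + d); the transform needs a*d - b*c != 0."""
    if a * d - b * c == 0:
        raise ValueError("degenerate transform: a*d - b*c must be non-zero")
    return (a * z + b) / (c * z + d)


def pixel_to_complex(x: int, y: int, width: int, height: int) -> complex:
    """Map pixel (x, y) to the complex plane, centred, spanning [-1, 1] per axis.

    Assumes width >= 2 and height >= 2 (hypothetical normalization).
    """
    return complex(2.0 * x / (width - 1) - 1.0, 2.0 * y / (height - 1) - 1.0)


def warp_grid(width: int, height: int,
              a: complex, b: complex, c: complex, d: complex) -> list:
    """Return the warped complex coordinate of every pixel, row by row."""
    return [
        [mobius(pixel_to_complex(x, y, width, height), a, b, c, d)
         for x in range(width)]
        for y in range(height)
    ]


# Illustrative values only: a non-zero c makes the warp non-affine.
grid = warp_grid(3, 3, a=1, b=0, c=0.3j, d=1)
```

With `c = 0` the map reduces to an affine transform of the plane; a non-zero `c` gives the position-dependent stretching that an affine augmentation cannot produce. Resampling the image at the warped coordinates is left out of the sketch.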

Place, publisher, year, edition, pages
Springer Science and Business Media Deutschland GmbH, 2025
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349 ; 15131
Keywords
Perspective Distortion, Self-supervised Learning, Robust Representation Learning
National Category
Computer Graphics and Computer Vision
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-111233 (URN); 10.1007/978-3-031-73464-9_21 (DOI); 2-s2.0-85212279211 (Scopus ID)
Conference
18th European Conference on Computer Vision (ECCV 2024), Milan, Italy, September 29 - October 4, 2024
Funder
Knut and Alice Wallenberg Foundation
Note

ISBN for host publication: 978-3-031-73463-2, 978-3-031-73464-9

Available from: 2025-01-08 Created: 2025-01-08 Last updated: 2025-02-07 Bibliographically approved
3. LCM: Log Conformal Maps for Robust Representation Learning to Mitigate Perspective Distortion
2025 (English). In: Computer Vision – ACCV 2024: 17th Asian Conference on Computer Vision, Hanoi, Vietnam, December 8–12, 2024, Proceedings, Part VIII / [ed] Minsu Cho; Ivan Laptev; Du Tran; Angela Yao; Hongbin Zha, Springer Nature, 2025, p. 175-191. Conference paper, Published paper (Refereed)
Abstract [en]

Perspective distortion (PD) leads to substantial alterations in the shape, size, orientation, angles, and spatial relationships of visual elements in images. Accurately determining camera intrinsic and extrinsic parameters is challenging, making it hard to synthesize perspective distortion effectively. Current distortion correction methods first remove the distortion and then learn the vision task, making it a multi-step process that often compromises performance. Recent work leverages the Möbius transform for mitigating perspective distortion (MPD) to synthesize perspective distortions without estimating camera parameters. The Möbius transform requires tuning multiple interdependent parameters and involves complex arithmetic operations, leading to substantial computational complexity. To address these challenges, we propose Log Conformal Maps (LCM), a method that leverages the logarithmic function to approximate perspective distortions with fewer parameters and reduced computational complexity. We provide a detailed foundation, complemented by experiments, to demonstrate that LCM approximates MPD with fewer parameters. We show that LCM integrates well with supervised and self-supervised representation learning, outperforms standard models, and matches state-of-the-art performance in mitigating perspective distortion on multiple benchmarks, namely ImageNet-PD, ImageNet-E, and ImageNet-X. Further, LCM integrates seamlessly with person re-identification and improves its performance. Source code is publicly available at https://github.com/meenakshi23/Log-Conformal-Maps.
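
As a rough illustration of a logarithmic conformal map on image coordinates, the sketch below applies the principal branch of the complex logarithm to a shifted coordinate, w = log(z + s). The shift parameter `s`, its default value, and the function name are assumptions for illustration; the abstract does not give the parameterization used in the paper, so this should not be read as the LCM method itself.

```python
import cmath


def log_conformal(z: complex, shift: complex = 2.0) -> complex:
    """Principal-branch w = log(z + shift) for a point z of the complex plane.

    The shift (a hypothetical parameter) moves the singularity of the
    logarithm away from the image region; for coordinates in [-1, 1] per
    axis and shift = 2, the real part of z + shift stays in [1, 3], so the
    branch cut along the negative real axis is never crossed.
    """
    shifted = z + shift
    if shifted == 0:
        raise ValueError("log is undefined where z + shift == 0")
    return cmath.log(shifted)


# Warp the four corners of a normalized image square (illustrative input).
corners = [complex(-1, -1), complex(1, -1), complex(-1, 1), complex(1, 1)]
warped = [log_conformal(z) for z in corners]
```

The map is conformal wherever it is defined, since its derivative 1 / (z + s) is non-zero, and it is controlled here by a single complex parameter, compared with the four coefficients of a general Möbius transform.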

Place, publisher, year, edition, pages
Springer Nature, 2025
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349 ; 15479
Keywords
Perspective Distortion, Robust Representation Learning, Self-supervised Learning
National Category
Computer Graphics and Computer Vision
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-111235 (URN); 10.1007/978-981-96-0966-6_11 (DOI); 2-s2.0-85212922792 (Scopus ID)
Conference
17th Asian Conference on Computer Vision (ACCV 2024), Hanoi, Vietnam, December 8-12, 2024
Funder
Knut and Alice Wallenberg Foundation
Note

ISBN for host publication: 978-981-96-0965-9

Available from: 2025-01-08 Created: 2025-01-08 Last updated: 2025-02-07 Bibliographically approved
4. ASTrA: Adversarial Self-supervised Training with Adaptive-Attacks
2025 (English). In: ASTrA: Adversarial Self-supervised Training with Adaptive-Attacks, 2025. Conference paper, Published paper (Refereed)
Abstract [en]

Existing self-supervised adversarial training (self-AT) methods rely on hand-crafted adversarial attack strategies for PGD attacks, which fail to adapt to the evolving learning dynamics of the model and do not account for instance-specific characteristics of images. This results in sub-optimal adversarial robustness and limits the alignment between clean and adversarial data distributions. To address this, we propose ASTrA (Adversarial Self-supervised Training with Adaptive-Attacks), a novel framework introducing a learnable, self-supervised attack strategy network that autonomously discovers optimal attack parameters through exploration-exploitation in a single training episode. ASTrA leverages a reward mechanism based on contrastive loss, optimized with REINFORCE, enabling adaptive attack strategies without labeled data or additional hyperparameters. We further introduce a mixed contrastive objective to align the distribution of clean and adversarial examples in representation space. ASTrA achieves state-of-the-art results on CIFAR10, CIFAR100, and STL10 while integrating seamlessly as a plug-and-play module for other self-AT methods. ASTrA scales to larger datasets, demonstrates strong semi-supervised performance, and is resilient to robust overfitting, backed by an explainability analysis of optimal attack strategies. The project page, with source code and other details, is at https://prakashchhipa.github.io/projects/ASTrA.
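
The REINFORCE update mentioned above can be sketched on a toy policy. The sketch assumes a categorical policy over three hypothetical attack configurations and takes the reward as a plain number; in the paper the reward is based on the contrastive loss and the strategy is produced by a network, neither of which is modelled here. Function names, the baseline, and the learning rate are illustrative assumptions.

```python
import math


def softmax(logits: list) -> list:
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_update(logits: list, action: int, reward: float,
                     baseline: float = 0.0, lr: float = 0.1) -> list:
    """One REINFORCE ascent step for a categorical policy.

    For a softmax policy, d log pi(action) / d logit_i = 1[i == action] - pi_i,
    so each logit moves by lr * (reward - baseline) times that term.
    """
    probs = softmax(logits)
    advantage = reward - baseline
    return [
        logit + lr * advantage * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]


# Three hypothetical attack configurations, initially equally likely.
logits = [0.0, 0.0, 0.0]
# Suppose configuration 1 was sampled and earned a positive reward.
logits = reinforce_update(logits, action=1, reward=1.0, lr=0.3)
```

After the update the sampled configuration becomes more probable and the others less so, which is how a reward signal can steer attack parameters during training without labels.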

National Category
Computer Vision and Learning Systems; Artificial Intelligence
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-111564 (URN)
Conference
International Conference on Learning Representations (ICLR) 2025
Available from: 2025-02-07 Created: 2025-02-07 Last updated: 2025-04-11
5. Magnification Prior: A Self-Supervised Method for Learning Representations on Breast Cancer Histopathological Images
2023 (English). In: Proceedings: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2023), IEEE, 2023, p. 2716-2726. Conference paper, Published paper (Refereed)
Abstract [en]

This work presents a novel self-supervised pre-training method that uses magnification factors to learn efficient representations of histopathology images without labels. Other state-of-the-art works mainly focus on fully supervised learning approaches that rely heavily on human annotations. However, the scarcity of labeled and unlabeled data is a long-standing challenge in histopathology. Currently, representation learning without labels remains unexplored in the histopathology domain. The proposed method, Magnification Prior Contrastive Similarity (MPCS), enables self-supervised learning of representations without labels on the small-scale breast cancer dataset BreakHis by exploiting the magnification factor and inductive transfer and by reducing human prior. The proposed method matches state-of-the-art fully supervised performance in malignancy classification when only 20% of labels are used in fine-tuning, and outperforms previous works in fully supervised learning settings on three public breast cancer datasets, including BreakHis. Further, it provides initial support for the hypothesis that reducing human prior leads to efficient representation learning in self-supervision, which will need further investigation. The implementation of this work is available online on GitHub.
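
A contrastive objective over two magnifications of the same sample can be sketched with a generic InfoNCE-style loss. This is a swapped-in, standard formulation rather than the paper's exact loss, which the abstract does not state. The embeddings, the temperature, and the function names are assumptions for illustration: the anchor and the positive stand for embeddings of the same tissue sample at two different magnifications, and the negatives for embeddings of other samples.

```python
import math


def cosine(u: list, v: list) -> float:
    """Cosine similarity of two non-zero vectors."""
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity needs non-zero vectors")
    return sum(a * b for a, b in zip(u, v)) / (norm_u * norm_v)


def info_nce(anchor: list, positive: list, negatives: list,
             temperature: float = 0.5) -> float:
    """Generic InfoNCE loss for one anchor, one positive, and some negatives."""
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))


# Hypothetical 2-D embeddings: the two magnifications of one sample agree.
aligned = info_nce([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]])
# The same anchor paired with an embedding that disagrees with it.
misaligned = info_nce([1.0, 0.0], [0.0, 1.0], [[1.0, 0.0]])
```

The loss is lower when the two magnification views of a sample map to similar embeddings, which is the invariance that pairing views across magnifications is meant to encourage.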

Place, publisher, year, edition, pages
IEEE, 2023
Series
Proceedings IEEE Workshop on Applications of Computer Vision, ISSN 2472-6737, E-ISSN 2642-9381
Keywords
self-supervised learning, contrastive learning, representation learning, breast cancer, histopathological images, transfer learning, medical images
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-94845 (URN); 10.1109/WACV56688.2023.00274 (DOI); 000971500202081 (); 2-s2.0-85149049398 (Scopus ID); 978-1-6654-9346-8 (ISBN)
Conference
2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), January 2-7, 2023, Waikoloa, Hawaii, USA
Available from: 2022-12-14 Created: 2022-12-14 Last updated: 2025-02-07 Bibliographically approved
6. Depth Contrast: Self-supervised Pretraining on 3DPM Images for Mining Material Classification
2022 (English). In: Computer Vision – ECCV 2022: 17th European Conference, Tel Aviv, Israel, October 23–27, 2022, Proceedings, Part VI / [ed] Avidan, S.; Brostow, B.; Cissé, M.; Farinella, G.M.; Hassner, H., Springer Nature, 2022, Vol. VI, p. 212-227. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Springer Nature, 2022
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349 ; 13666
National Category
Signal Processing; Computer Graphics and Computer Vision
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-96937 (URN); 10.1007/978-3-031-25082-8_14 (DOI); 2-s2.0-85151001747 (Scopus ID)
Conference
17th European Conference on Computer Vision (ECCV 2022), Tel Aviv, Israel, October 23-27, 2022
Note

ISBN for host publication: 978-3-031-20067-0, 978-3-031-20068-7

Available from: 2023-04-28 Created: 2023-04-28 Last updated: 2025-02-07 Bibliographically approved
7. Functional Knowledge Transfer with Self-supervised Representation Learning
2023 (English). In: 2023 IEEE International Conference on Image Processing: Proceedings, IEEE, 2023, p. 3339-3343. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
IEEE, 2023
Series
Proceedings - International Conference on Image Processing, ISSN 1522-4880
National Category
Computer Graphics and Computer Vision; Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-103659 (URN); 10.1109/ICIP49359.2023.10222142 (DOI); 001106821003077 (); 2-s2.0-85180766253 (Scopus ID); 978-1-7281-9835-4 (ISBN); 978-1-7281-9836-1 (ISBN)
Conference
30th IEEE International Conference on Image Processing, ICIP 2023, October 8-11, 2023, Kuala Lumpur, Malaysia
Available from: 2024-01-15 Created: 2024-01-15 Last updated: 2025-02-07 Bibliographically approved

Open Access in DiVA

fulltext (222493 kB), 2062 downloads
File information
File name: FULLTEXT02.pdf; File size: 222493 kB; Checksum: SHA-512
8a63d0802eae45ad2b295f498e422786db5b12619be23a8bbf7f1320738dd7c8c084df53aaa3684b1989230c069d2f4d4b997eb95cfdc3b2998eb0ea3d67e41d
Type: fulltext; Mimetype: application/pdf

Person

Chhipa, Prakash Chandra

Total: 2063 downloads
The number of downloads is the sum of downloads for all full texts. It may include, for example, previous versions that are no longer available.

Total: 21173 hits