Magnification Prior: A Self-Supervised Method for Learning Representations on Breast Cancer Histopathological Images
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0002-6903-7552
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0001-9604-7193
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0003-0100-4030
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0001-8532-0895
2023 (English). In: Proceedings: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV 2023), IEEE, 2023, p. 2716-2726. Conference paper, Published paper (Refereed).
Abstract [en]

This work presents a novel self-supervised pre-training method that learns efficient representations without labels on histopathology medical images by utilizing magnification factors. Other state-of-the-art works mainly focus on fully supervised learning approaches that rely heavily on human annotations. However, the scarcity of labeled and unlabeled data is a long-standing challenge in histopathology. Currently, representation learning without labels remains unexplored in the histopathology domain. The proposed method, Magnification Prior Contrastive Similarity (MPCS), enables self-supervised learning of representations without labels on the small-scale breast cancer dataset BreakHis by exploiting the magnification factor, inductive transfer, and a reduced human prior. The proposed method matches state-of-the-art fully supervised performance in malignancy classification when only 20% of labels are used in fine-tuning, and outperforms previous works in fully supervised settings on three public breast cancer datasets, including BreakHis. Further, it provides initial support for the hypothesis that reducing the human prior leads to efficient representation learning in self-supervision, which will need further investigation. The implementation of this work is available online on GitHub.
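
As a rough, hedged sketch of the core idea (not the authors' released implementation, which is linked from this record), the PyTorch snippet below pairs two magnifications of the same specimens as the positive views of an NT-Xent-style contrastive loss; the encoder, tensor shapes, and temperature are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def nt_xent(z1, z2, temperature=0.5):
        # NT-Xent loss over a batch of positive pairs (z1[i], z2[i]).
        z = F.normalize(torch.cat([z1, z2]), dim=1)
        sim = z @ z.t() / temperature  # pairwise cosine similarities, (2N, 2N)
        sim.masked_fill_(torch.eye(len(z), dtype=torch.bool), float("-inf"))  # drop self-pairs
        n = z1.size(0)
        targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
        return F.cross_entropy(sim, targets)

    # Placeholder encoder; a standard CNN backbone would be used in practice.
    encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 128))

    # Random tensors stand in for patches of the SAME specimens at two magnifications.
    view_low_mag = torch.randn(8, 3, 64, 64)
    view_high_mag = torch.randn(8, 3, 64, 64)

    loss = nt_xent(encoder(view_low_mag), encoder(view_high_mag))
    loss.backward()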

Place, publisher, year, edition, pages
IEEE, 2023. p. 2716-2726
Series
Proceedings IEEE Workshop on Applications of Computer Vision, ISSN 2472-6737, E-ISSN 2642-9381
Keywords [en]
self-supervised learning, contrastive learning, representation learning, breast cancer, histopathological images, transfer learning, medical images
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-94845
DOI: 10.1109/WACV56688.2023.00274
ISI: 000971500202081
Scopus ID: 2-s2.0-85149049398
ISBN: 978-1-6654-9346-8 (electronic)
OAI: oai:DiVA.org:ltu-94845
DiVA, id: diva2:1719276
Conference
2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), January 2-7, 2023, Waikoloa, Hawaii, USA
Available from: 2022-12-14. Created: 2022-12-14. Last updated: 2025-02-07. Bibliographically approved.
In thesis
1. Self-supervised Representation Learning for Visual Domains Beyond Natural Scenes
2023 (English). Licentiate thesis, comprehensive summary (Other academic).
Abstract [en]

This thesis investigates the possibility of efficiently adapting self-supervised representation learning to visual domains beyond natural scenes, e.g., medical imaging and non-RGB sensory images. The thesis contributes to i) formalizing the self-supervised representation learning paradigm in a unified conceptual framework and ii) proposing a hypothesis based on supervision signals from the data, called data-prior. Method adaptations following the hypothesis demonstrate significant progress in downstream task performance on the microscopic histopathology and 3-dimensional particle management (3DPM) mining material non-RGB image domains.

Supervised learning has proven to obtain higher performance than unsupervised learning on computer vision downstream tasks, e.g., image classification, object detection, etc. However, it imposes limitations due to its reliance on human supervision. To reduce human supervision, transfer learning of end-to-end models remains a proven approach for fine-tuning tasks, but it does not leverage unlabeled data. Representation learning in a self-supervised manner has successfully reduced the need for labelled data in the natural language processing and vision domains. Advances in learning effective visual representations without human supervision through self-supervised learning are thought-provoking.

This thesis performs a detailed conceptual analysis, method formalization, and literature study of the recent paradigm of self-supervised representation learning. The study's primary goal is to identify the common methodological limitations across the various approaches for adaptation to visual domains beyond natural scenes. The study finds a common component in the transformations that generate distorted views for invariant representation learning. A significant outcome of the study suggests that this component depends closely on human knowledge of the real world in natural scenes, which fits the visual domain of natural scenes well but remains sub-optimal for other, conceptually different visual domains.

A hypothesis is proposed to use supervision signals from the data (data-prior) to replace the human-knowledge-driven transformations in self-supervised pretraining and thereby overcome the stated challenge. Two separate visual domains beyond natural scenes are considered to explore this hypothesis: breast cancer microscopic histopathology and 3-dimensional particle management (3DPM) mining material non-RGB images.

The first research paper explores breast cancer microscopic histopathology images, actualizing the data-prior hypothesis by using the multiple magnification factors available in the public microscopic histopathology dataset BreakHis as a supervision signal from the data. It proposes a self-supervised representation learning method, Magnification Prior Contrastive Similarity, which adapts the contrastive learning approach by replacing the standard image view transformations (augmentations) with views obtained from different magnification factors. The contributions of the work are manifold. It achieves significant performance improvement in the downstream task of malignancy classification in both label-efficient and fully supervised settings. Pretrained models show efficient knowledge transfer on two additional public datasets, supported by a qualitative analysis of the learned representations. The second research paper investigates the 3DPM mining material non-RGB image domain, where the material's pixel-mapped reflectance image and height (depth map) are captured. It actualizes the data-prior hypothesis by using depth maps of mining material on the conveyor belt. The proposed method, Depth Contrast, also adapts the contrastive learning method, replacing standard augmentations with depth maps of the mining materials. It outperforms ImageNet transfer learning on material classification in fully supervised settings, in both fine-tuning and linear evaluation, and also shows consistent performance improvement in label-efficient settings.
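
Under one plausible reading of this description, and with an entirely illustrative two-stem encoder, the Depth Contrast pairing could look as follows in PyTorch; the actual architecture and loss details of the method may differ.

    import torch
    import torch.nn.functional as F

    # Illustrative two-stem encoder: one stem per modality, shared embedding size.
    image_stem = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 128))
    depth_stem = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(1 * 32 * 32, 128))

    image = torch.randn(8, 3, 32, 32)  # reflectance patches (placeholder data)
    depth = torch.randn(8, 1, 32, 32)  # corresponding depth maps

    z_img = F.normalize(image_stem(image), dim=1)
    z_dep = F.normalize(depth_stem(depth), dim=1)

    # Pull each image toward its own depth map, away from mismatched pairs.
    logits = z_img @ z_dep.t() / 0.5
    loss = F.cross_entropy(logits, torch.arange(8))
    loss.backward()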

In summary, the data-prior hypothesis shows one promising direction for optimal adaptation of contrastive learning methods in self-supervision to visual domains beyond natural scenes. However, a detailed study of the data-prior hypothesis is still required to explore other, non-contrastive approaches to recent self-supervised representation learning, including knowledge distillation and information maximization.

Place, publisher, year, edition, pages
Luleå: Luleå tekniska universitet, 2023
Series
Licentiate thesis / Luleå University of Technology, ISSN 1402-1757
Keywords
self-supervised learning, representation learning, computer vision, learning with few labels
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-95425 (URN)
978-91-8048-258-5 (ISBN)
978-91-8048-259-2 (ISBN)
Presentation
2023-03-17, A117, Luleå tekniska universitet, Luleå, 10:00 (English)
Available from: 2023-01-30. Created: 2023-01-30. Last updated: 2023-09-05. Bibliographically approved.
2. Towards Robust and Domain-aware Self-supervised Representation Learning
2025 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

Self-supervised representation learning (SSL) has emerged as a fundamental paradigm in representation learning, enabling models to learn meaningful representations without requiring labeled data. Despite its success, SSL remains constrained by two core challenges: (i) a lack of robustness against real-world distribution shifts and adversarial perturbations, and (ii) a lack of domain-awareness, limiting its usability beyond natural scenes. These limitations arise from the generic invariance assumptions in SSL, which rely on predefined augmentations to learn representations but fail to generalize when exposed to unseen environmental distortions, adversarial attacks, and domain-specific nuances. Existing SSL approaches, whether based on contrastive learning, knowledge distillation, or information maximization, do not explicitly account for these factors, making them vulnerable in real-world applications and suboptimal in specialized domains.

This thesis aims to enhance both robustness and domain-awareness in a modular, plug-and-play manner, ensuring that the advancements apply across different joint embedding architecture and method (JEAM)-based SSL approaches and adapt to future developments in SSL. To achieve this, the thesis follows a guiding principle: leveraging invariant representations to improve robustness and domain-awareness in a modular, plug-and-play manner without altering fundamental SSL objectives. This principle ensures that the improvements can be seamlessly integrated into existing and future SSL approaches.

To systematically address the above-stated core challenges, this thesis begins with a foundational study of SSL approaches, identifying the common schema that underlies them. This unification provides a conceptual view of SSL methods, allowing the domain-sensitive and domain-agnostic components to be isolated across approaches. This conceptual outcome sets the stage for establishing precisely where improvements are needed to enhance robustness and domain-awareness across methods, given that current SSL methods fail under real-world challenges.

Next, the thesis conducts a large-scale empirical evaluation of existing SSL methods against relevant robustness benchmarks, uncovering their failures under distribution shifts caused by real-world environmental challenges. This evaluation reveals a significant decline in robustness across different SSL approaches. It establishes the fundamental research gap and motivates the advancements introduced in this thesis.

The first advancement focuses on robustness against distribution shifts, particularly geometric distortions such as perspective distortion (PD), which are prevalent in real-world environments but not addressed by existing SSL methods. Since PD introduces nonlinear spatial transformations, standard affine augmentations fail to model these effects, leading to degraded representation stability. To address this, the thesis introduces Möbius-based mitigating perspective distortion (MPD) and log conformal maps (LCM), mathematically grounded transformations that enable robustness without requiring perspective-distorted training data or estimation of camera parameters. These methods are additionally adapted to multiple real-world computer vision applications, including crowd counting, object detection, person re-identification, and fisheye view recognition, showcasing their effectiveness. Further, to address the lack of a dedicated perspective-distortion benchmark, the ImageNet-PD robustness benchmark is developed to fill the gap.
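
To make the underlying idea concrete, here is a minimal, hedged sketch of warping an image with a Möbius map f(z) = (az + b)/(cz + d) applied to normalized pixel coordinates; the coefficients below are arbitrary illustrations, not the MPD/LCM parameterization from the thesis.

    import torch
    import torch.nn.functional as F

    def mobius_warp(img, a=1.0, b=0.0, c=0.3j, d=1.0):
        # Treat normalized pixel coordinates as complex numbers z = x + iy,
        # apply f(z) = (a*z + b) / (c*z + d), and resample the image there.
        n, _, h, w = img.shape
        ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
        z = torch.complex(xs, ys)
        fz = (a * z + b) / (c * z + d)
        grid = torch.stack([fz.real, fz.imag], dim=-1).clamp(-1, 1)
        return F.grid_sample(img, grid.expand(n, h, w, 2), align_corners=True)

    warped = mobius_warp(torch.randn(2, 3, 64, 64))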

Beyond environmental challenges, another critical real-world challenge is adversarial attacks. SSL methods are highly susceptible to adversarial attacks, as the learned representations lack perturbation-invariant constraints. Existing adversarial training approaches in SSL rely on brute-force attack strategies, which fail to adapt dynamically. To address this, the thesis introduces adversarial self-supervised training with adaptive attacks (ASTrA), where attack strategies evolve dynamically based on the model's learning dynamics, establishing a correspondence between attack parameters and training examples and optimizing adversarial perturbations in a learnable manner. Unlike conventional adversarial training, ASTrA ensures robustness while maintaining SSL's efficiency and scalability.
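
ASTrA's learned, adaptive attacker does not fit in a short snippet, but the general shape of adversarial self-supervised training can be sketched with a fixed one-step FGSM attack standing in for it; everything here is an illustrative assumption, not the thesis method.

    import torch
    import torch.nn.functional as F

    encoder = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 128))

    def pair_loss(x1, x2):
        # Contrastive loss: matched pairs are positives, the rest negatives.
        z1 = F.normalize(encoder(x1), dim=1)
        z2 = F.normalize(encoder(x2), dim=1)
        return F.cross_entropy(z1 @ z2.t() / 0.5, torch.arange(x1.size(0)))

    x1 = torch.randn(8, 3, 32, 32)
    x2 = torch.randn(8, 3, 32, 32)

    # Craft an adversarial version of view 1 that maximizes the contrastive loss.
    x1_adv = x1.clone().requires_grad_(True)
    pair_loss(x1_adv, x2).backward()
    with torch.no_grad():
        x1_adv = x1 + (8 / 255) * x1_adv.grad.sign()  # one FGSM step

    # Train the encoder to still agree on the clean and adversarial views.
    encoder.zero_grad()
    pair_loss(x1_adv, x2).backward()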

While robustness, in this thesis, targets real-world challenges in natural scenes, domain-awareness targets specialized visual domains beyond natural scenes. Standard SSL augmentations are designed for variations in natural scenes, making them ill-suited for specialized fields such as medical imaging and industrial mining material inspection. This thesis introduces domain-awareness in SSL by incorporating domain-specific information into SSL's view generation process. In particular, (i) magnification prior contrastive similarity (MPCS) makes the learned representations invariant to magnification for histopathology images by inducing varying magnifications in the view generation process, improving breast cancer recognition; and (ii) Depth Contrast explicitly enforces modality alignment between material images and the attained height of materials on the conveyor belt, ensuring that the learned representations become aware of physical properties, thereby improving material classification.

Beyond robustness and domain-awareness, SSL's ability to generalize with limited data is advantageous for its practicality. While the loss objective in SSL is generally domain-agnostic, its effectiveness relies on large-scale data. In this direction, the thesis explores functional knowledge transfer (FKT), where the self-supervised and supervised learning objectives are jointly optimized, enabling SSL representations to adapt dynamically to supervised tasks. This approach enhances generalization in low-data regimes.
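
A minimal sketch of joint optimization in the FKT spirit, assuming a shared backbone, a contrastive SSL term, and an arbitrary weighting of 0.5; the actual FKT formulation in the thesis may differ.

    import torch
    import torch.nn.functional as F

    backbone = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 128))
    classifier = torch.nn.Linear(128, 2)  # e.g. benign vs. malignant

    def ssl_loss(z1, z2, t=0.5):
        z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
        return F.cross_entropy(z1 @ z2.t() / t, torch.arange(z1.size(0)))

    view1 = torch.randn(8, 3, 32, 32)  # two views of the same batch
    view2 = torch.randn(8, 3, 32, 32)
    labels = torch.randint(0, 2, (8,))

    h1 = backbone(view1)
    supervised = F.cross_entropy(classifier(h1), labels)
    self_supervised = ssl_loss(h1, backbone(view2))

    (supervised + 0.5 * self_supervised).backward()  # joint objective; 0.5 is arbitrary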

In conclusion, this thesis provides a foundation for robust and domain-aware self-supervised representation learning in a modular manner, highlighting its applicability to existing and future JEAM-based SSL approaches, which can inherit these advancements and adapt to emerging challenges.

Place, publisher, year, edition, pages
Luleå tekniska universitet, 2025
Series
Doctoral thesis / Luleå University of Technology, ISSN 1402-1544
Keywords
Self-supervised Representation Learning, Representation Learning, Robustness, Domain-aware, Perspective Distortion, Adversarial Attacks, Medical Imaging, Computer Vision
National Category
Computer Vision and Learning Systems; Artificial Intelligence
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-111571 (URN)
978-91-8048-761-0 (ISBN)
978-91-8048-762-7 (ISBN)
Public defence
2025-04-08, C305, Luleå University of Technology, Luleå, 09:00 (English)
Available from: 2025-02-07. Created: 2025-02-07. Last updated: 2025-03-13. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus
GitHub
arXiv

Authority records

Chhipa, Prakash Chandra; Upadhyay, Richa; Grund Pihlgren, Gustav; Saini, Rajkumar; Liwicki, Marcus
