LCM: Log Conformal Maps for Robust Representation Learning to Mitigate Perspective Distortion
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0002-6903-7552
Fraunhofer Heinrich-Hertz-Institut, Berlin, Germany.
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0003-4029-6574
2025 (English). In: Computer Vision – ACCV 2024: 17th Asian Conference on Computer Vision, Hanoi, Vietnam, December 8–12, 2024, Proceedings, Part VIII / [ed] Minsu Cho; Ivan Laptev; Du Tran; Angela Yao; Hongbin Zha, Springer Nature, 2025, p. 175-191. Conference paper, Published paper (Refereed)
Abstract [en]

Perspective distortion (PD) leads to substantial alterations in the shape, size, orientation, angles, and spatial relationships of visual elements in images. Accurately determining camera intrinsic and extrinsic parameters is challenging, making it hard to synthesize perspective distortion effectively. Current distortion correction methods first remove the distortion and then learn the vision task, making correction a multi-step process that often compromises performance. Recent work leverages the Möbius transform for mitigating perspective distortion (MPD), synthesizing perspective distortions without estimating camera parameters. However, the Möbius transform requires tuning multiple interdependent and interrelated parameters and involves complex arithmetic operations, leading to substantial computational complexity. To address these challenges, we propose Log Conformal Maps (LCM), a method leveraging the logarithmic function to approximate perspective distortions with fewer parameters and reduced computational complexity. We provide a detailed foundation, complemented with experiments, demonstrating that LCM approximates MPD with fewer parameters. We show that LCM integrates well with supervised and self-supervised representation learning, outperforms standard models, and matches state-of-the-art performance in mitigating perspective distortion on multiple benchmarks, namely ImageNet-PD, ImageNet-E, and ImageNet-X. Furthermore, LCM integrates seamlessly with person re-identification and improves its performance. Source code is publicly available at https://github.com/meenakshi23/Log-Conformal-Maps.
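For intuition about the core idea, the complex logarithm is a conformal (angle-preserving) map, so applying it to centred pixel coordinates bends the image non-linearly while keeping local angles intact. The sketch below is illustrative only and is not the authors' exact formulation; the function name `log_conformal_warp` and its normalization are assumptions for the example.

```python
import numpy as np

def log_conformal_warp(h, w, eps=1e-6):
    """Illustrative log-conformal coordinate warp (not the paper's exact
    method): treat each pixel as a complex number z centred on the image
    midpoint, apply log(z), and rescale the result back to pixel bounds."""
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # Centre coordinates and view them as complex numbers.
    z = (xs - w / 2) + 1j * (ys - h / 2)
    z = np.where(np.abs(z) < eps, eps, z)   # avoid log(0) at the centre
    lz = np.log(z)                          # conformal: local angles preserved

    def rescale(a, size):
        # Map real/imaginary parts back into [0, size - 1].
        a = a - a.min()
        return a / max(a.max(), eps) * (size - 1)

    return rescale(lz.real, w), rescale(lz.imag, h)

# Sampling an image at (xm, ym) would produce the warped view.
xm, ym = log_conformal_warp(64, 64)
print(xm.shape)
```

Note the single transcendental function with no interdependent coefficients to tune, which is the efficiency argument the abstract makes against the Möbius transform.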

Place, publisher, year, edition, pages
Springer Nature, 2025. p. 175-191
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349 ; 15479
Keywords [en]
Perspective Distortion, Robust Representation Learning, Self-supervised Learning
National Category
Computer graphics and computer vision
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-111235
DOI: 10.1007/978-981-96-0966-6_11
Scopus ID: 2-s2.0-85212922792
OAI: oai:DiVA.org:ltu-111235
DiVA, id: diva2:1925179
Conference
17th Asian Conference on Computer Vision (ACCV 2024), Hanoi, Vietnam, December 8-12, 2024
Funder
Knut and Alice Wallenberg Foundation
Note

ISBN for host publication: 978-981-96-0965-9

Available from: 2025-01-08. Created: 2025-01-08. Last updated: 2025-02-07. Bibliographically approved.
In thesis
1. Towards Robust and Domain-aware Self-supervised Representation Learning
2025 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Self-supervised representation learning (SSL) has emerged as a powerful paradigm to mitigate the dependency on labeled data by leveraging intrinsic structures within data. While SSL has demonstrated remarkable progress, it remains constrained by two fundamental challenges: its vulnerability to distribution shifts and adversarial attacks, and its limited adaptability to domain-specific visual concepts. These challenges stem from the very nature of SSL, where models learn representations through invariance, yet the current formulation of invariance does not explicitly account for environmental distortions or structured variations unique to different domains. As a consequence, SSL models exhibit limited generalization in real-world scenarios, where unprecedented environmental and human-induced factors introduce variations not encountered during training, leading to suboptimal representation learning.

This thesis addresses these limitations by introducing a novel conceptual framework that enhances SSL robustness and domain-awareness in a modular and plug-and-play manner. The foundation of this approach lies in the realization that view generation—a core component of Joint Embedding Architecture and Method (JEAM)-based SSL—offers a natural intervention point for achieving both robustness and domain-awareness without disrupting the underlying loss objective. By systematically improving how invariance is enforced within a common conceptual JEAM-SSL framework, this work ensures that robustness against real-world perturbations and adaptability to diverse domains can be achieved as complementary aspects of the same formulation.

One of the critical challenges in real-world representation learning is perspective distortion (PD), an inevitable artifact that alters the geometric relationships within images. Since explicit correction through camera parameters is often impractical, this thesis introduces mathematically grounded transformations—Möbius Transform and Log Conformal Maps—which model the fundamental properties of PD such as non-linearity and conformality. By incorporating these transformations into the view generation process, SSL models achieve improved robustness against perspective-induced variations while retaining the flexibility of standard SSL objectives.
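For comparison with the log map, the Möbius transform acts on complex coordinates as f(z) = (az + b)/(cz + d) with ad − bc ≠ 0; its four interdependent parameters are what the thesis argues make it costlier to tune. A minimal sketch (a toy on a few points, not the thesis implementation; the parameter names follow the standard mathematical definition):

```python
import numpy as np

def mobius(z, a, b, c, d):
    """Möbius transform f(z) = (a*z + b) / (c*z + d).

    Requires ad - bc != 0. The four (complex) parameters are
    interdependent, unlike the parameter-light log conformal map."""
    if abs(a * d - b * c) < 1e-12:
        raise ValueError("degenerate transform: ad - bc must be non-zero")
    return (a * z + b) / (c * z + d)

# The identity (a = d = 1, b = c = 0) leaves points unchanged,
# while e.g. a = 0, b = 1, c = 1, d = 0 gives the inversion 1/z.
pts = np.array([1 + 1j, 2 - 1j, 0.5j])
print(np.allclose(mobius(pts, 1, 0, 0, 1), pts))  # True
```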

While environmental factors impact representation learning at the sensory level, SSL is also challenged by human-driven manipulations, particularly in the form of adversarial perturbations. Traditional adversarial training in self-supervised settings relies on brute-force perturbation strategies, which fail to adapt dynamically to the model’s evolving representations. To overcome this, this thesis proposes a learnable adversarial attack policy, embedded within the same modular SSL framework, allowing models to refine adversarial perturbations just-in-time. By aligning the adversarial training process with the way invariance is learned, SSL models gain resilience to adversarial manipulations while maintaining their generalization capabilities.

While robustness ensures that models retain meaningful representations across transformations, domain-awareness is essential for learning representations that are tailored to specialized datasets. The conventional augmentation schemes used in SSL are optimized for natural images but do not incorporate domain-specific information, which is essential for capturing meaningful features in specialized datasets such as medical imaging and industrial inspection. This thesis integrates domain-specific information directly into view generation, incorporating magnification factors in histopathology images and depth cues in industrial materials to guide SSL models toward more meaningful representations. By maintaining SSL’s plug-and-play modularity, domain-awareness is seamlessly integrated into the learning process without requiring extensive changes to the underlying framework.

Beyond robustness and domain-awareness, SSL’s capacity to generalize under limited data availability is another crucial aspect of its practical utility. While the contrastive loss formulation is inherently domain-agnostic, its effectiveness often depends on large-scale data. To address this, this thesis explores functional knowledge transfer, where self-supervised and supervised learning are jointly optimized rather than treated as sequential tasks. This joint optimization framework enables SSL representations to dynamically adapt to supervised objectives, improving generalization in data-scarce regimes while preserving the advantages of label-free pre-training.

By advancing view generation within a unified SSL framework, this thesis establishes a structured and scalable foundation for making self-supervised learning both robust and domain-aware. The proposed methodologies significantly enhance SSL’s ability to operate in real-world scenarios, where distribution shifts, adversarial threats, and domain complexities are inevitable. In doing so, this work lays the groundwork for future advancements in adaptive, generalizable, and structured self-supervised representation learning.

Place, publisher, year, edition, pages
Luleå tekniska universitet, 2025
Series
Doctoral thesis / Luleå University of Technology, ISSN 1402-1544
Keywords
Self-supervised Representation Learning, Representation Learning, Robustness, Domain-aware, Perspective Distortion, Adversarial Attacks, Medical Imaging, Computer Vision
National Category
Computer Vision and Learning Systems; Artificial Intelligence
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-111571
ISBN: 978-91-8048-761-0
ISBN: 978-91-8048-762-7
Public defence
2025-04-08, A-117, Luleå University of Technology, Luleå, 09:00 (English)
Available from: 2025-02-07. Created: 2025-02-07. Last updated: 2025-02-07. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text; Scopus

Authority records

Chhipa, Prakash Chandra; Liwicki, Marcus; Saini, Rajkumar

Search in DiVA

By author/editor
Chippa, Meenakshi Subhash; Chhipa, Prakash Chandra; Liwicki, Marcus; Saini, Rajkumar
By organisation
Embedded Internet Systems Lab
Computer graphics and computer vision
