Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Depth Contrast: Self-Supervised Pretraining on 3DPM Images for Mining Material Classification
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.ORCID iD: 0000-0002-6903-7552
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.ORCID iD: 0000-0001-9604-7193
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab.ORCID iD: 0000-0001-8532-0895
Optimation Advanced Measurements AB, Luleå, Sweden.
Show others and affiliations
(English)Manuscript (preprint) (Other academic)
Abstract [en]

This work presents a novel self-supervised representation learning method to learn efficient representations without labels on images from a 3DPM sensor (3-Dimensional Particle Measurement; estimates the particle size distribution of material) utilizing RGB images and depth maps of mining material on the conveyor belt. Human annotations for material categories on sensor-generated data are scarce and cost-intensive. Currently, representation learning without human annotations remains unexplored for mining materials and does not leverage on utilization of sensor-generated data. The proposed method, Depth Contrast, enables self-supervised learning of representations without labels on the 3DPM dataset by exploiting depth maps and inductive transfer. The proposed method outperforms material classification over ImageNet transfer learning performance in fully supervised learning settings and achieves an F1 score of 0.73. Further, The proposed method yields an F1 score of 0.65 with an 11% improvement over ImageNet transfer learning performance in a semi-supervised setting when only 20% of labels are used in fine-tuning. Finally, the Proposed method showcases improved performance generalization on linear evaluation. The implementation of proposed method is available on GitHub. 

National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-94847DOI: 10.48550/arXiv.2210.10633OAI: oai:DiVA.org:ltu-94847DiVA, id: diva2:1719284
Note

Accepted to CVF European Conference on Computer Vision Workshop (ECCVW 2022), Tel-Aviv, Israel

Available from: 2022-12-14 Created: 2022-12-14 Last updated: 2023-09-05Bibliographically approved
In thesis
1. Self-supervised Representation Learning for Visual Domains Beyond Natural Scenes
Open this publication in new window or tab >>Self-supervised Representation Learning for Visual Domains Beyond Natural Scenes
2023 (English)Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis investigates the possibility of efficiently adapting self-supervised representation learning on visual domains beyond natural scenes, e.g., medical imagining and non-RGB sensory images. The thesis contributes to i) formalizing the self-supervised representation learning paradigm in a unified conceptual framework and ii) proposing the hypothesis based on supervision signal from data, called data-prior. Method adaptations following the hypothesis demonstrate significant progress in downstream tasks performance on microscopic histopathology and 3-dimensional particle management (3DPM) mining material non-RGB image domains.

Supervised learning has proven to be obtaining higher performance than unsupervised learning on computer vision downstream tasks, e.g., image classification, object detection, etc. However, it imposes limitations due to human supervision. To reduce human supervision, end-to-end learning, i.e., transfer learning, remains proven for fine-tuning tasks but does not leverage unlabeled data. Representation learning in a self-supervised manner has successfully reduced the need for labelled data in the natural language processing and vision domain. Advances in learning effective visual representations without human supervision through a self-supervised learning approach are thought-provoking.

This thesis performs a detailed conceptual analysis, method formalization, and literature study on the recent paradigm of self-supervised representation learning. The study’s primary goal is to identify the common methodological limitations across the various approaches for adaptation to the visual domain beyond natural scenes. The study finds a common component in transformations that generate distorted views for invariant representation learning. A significant outcome of the study suggests this component is closely dependent on human knowledge of the real world around the natural scene, which fits well the visual domain of the natural scenes but remains sub-optimal for other visual domains that are conceptually different.

A hypothesis is proposed to use the supervision signal from data (data-prior) to replace the human-knowledge-driven transformations in self-supervised pretraining to overcome the stated challenge. Two separate visual domains beyond the natural scene are considered to explore the mentioned hypothesis, which is breast cancer microscopic histopathology and 3-dimensional particle management (3DPM) mining material non-RGB image.

The first research paper explores the breast cancer microscopic histopathology images by actualizing the data-prior hypothesis in terms of multiple magnification factors as supervision signal from data, which is available in the microscopic histopathology images public dataset BreakHis. It proposes a self-supervised representation learning method, Magnification Prior Contrastive Similarity, which adapts the contrastive learning approach by replacing the standard image view transformations (augmentations) by utilizing magnification factors. The contributions to the work are multi-folded. It achieves significant performance improvement in the downstream task of malignancy classification during label efficiency and fully supervised settings. Pretrained models show efficient knowledge transfer on two additional public datasets supported by qualitative analysis on representation learning. The second research paper investigates the 3DPM mining material non-RGB image domain where the material’s pixel-mapped reflectance image and height (depth map) are captured. It actualizes the data-prior hypothesis by using depth maps of mining material on the conveyor belt. The proposed method, Depth Contrast, also adapts the contrastive learning method while replacing standard augmentations with depth maps for mining materials. It outperforms material classification over ImageNet transfer learning performance in fully supervised learning settings in fine-tuning and linear evaluation. It also shows consistent improvement in performance during label efficiency.

In summary, the data-prior hypothesis shows one promising direction for optimal adaptations of contrastive learning methods in self-supervision for the visual domain beyond the natural scene. Although, a detailed study on the data-prior hypothesis is required to explore other non-contrastive approaches of recent self-supervised representation learning, including knowledge distillation and information maximization.

Place, publisher, year, edition, pages
Luleå: Luleå tekniska universitet, 2023
Series
Licentiate thesis / Luleå University of Technology, ISSN 1402-1757
Keywords
self-supervised learning, representation learning, computer vision, learning with few labels
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-95425 (URN)978-91-8048-258-5 (ISBN)978-91-8048-259-2 (ISBN)
Presentation
2023-03-17, A117, Luleå tekniska universitet, Luleå, 10:00 (English)
Opponent
Supervisors
Available from: 2023-01-30 Created: 2023-01-30 Last updated: 2023-09-05Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text

Authority records

Chhipa, Prakash ChandraUpadhyay, RichaSaini, RajkumarLiwicki, Marcus

Search in DiVA

By author/editor
Chhipa, Prakash ChandraUpadhyay, RichaSaini, RajkumarLiwicki, Marcus
By organisation
Embedded Internet Systems Lab
Computer Sciences

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 121 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf