Saini, Rajkumar, Dr. (ORCID: orcid.org/0000-0001-8532-0895)
Publications (10 of 44)
Das Chakladar, D., Shankar, A., Liwicki, F., Barma, S. & Saini, R. (2025). Attention Dynamics: Estimating Attention Levels of ADHD using Swin Transformer. In: Apostolos Antonacopoulos; Subhasis Chaudhuri; Rama Chellappa; Cheng-Lin Liu; Saumik Bhattacharya; Umapada Pal (Ed.), Pattern Recognition: 27th International Conference, ICPR 2024, Kolkata, India, December 1–5, 2024, Proceedings, Part XI. Paper presented at 27th International Conference on Pattern Recognition (ICPR 2024), Kolkata, India, December 1-5, 2024. Springer Science and Business Media Deutschland GmbH
Attention Dynamics: Estimating Attention Levels of ADHD using Swin Transformer
2025 (English). In: Pattern Recognition: 27th International Conference, ICPR 2024, Kolkata, India, December 1–5, 2024, Proceedings, Part XI / [ed] Apostolos Antonacopoulos; Subhasis Chaudhuri; Rama Chellappa; Cheng-Lin Liu; Saumik Bhattacharya; Umapada Pal, Springer Science and Business Media Deutschland GmbH, 2025. Conference paper, Published paper (Refereed).
Abstract [en]

Children diagnosed with Attention-Deficit/Hyperactivity Disorder (ADHD) face many difficulties in maintaining their concentration (in terms of attention levels) and controlling their behavior. Previous studies have mainly focused on identifying brain regions involved in cognitive processes or classifying ADHD and control subjects. However, the classification of attention levels of ADHD subjects has not yet been explored. Here, a robust Swin Transformer (Swin-T) model is proposed to classify the attention levels of ADHD subjects. The experimental cognitive task ‘Surround suppression’ includes two events, Stim ON and Stim OFF, corresponding to a subject’s high and low attention levels. In the proposed framework, ADHD-specific channels are first identified from the input electroencephalography (EEG). Next, significant, non-noisy connectivity features are extracted from those channels using Singular Value Decomposition (SVD). Finally, the non-noisy features are passed to the robust Swin-T model for attention-level classification. The proposed model achieves 97.28% classification accuracy with 12 subjects. The robustness of the proposed model leads to potential benefits in EEG-based research and clinical settings, enhancing the reliability of ADHD assessments.
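As a rough illustration of the SVD step described above, the sketch below removes small singular components from a channel-connectivity matrix; the matrix size, rank choice, and synthetic data are assumptions for illustration, not the authors' exact pipeline.

```python
import numpy as np

def svd_denoise(connectivity, rank):
    """Keep only the top-`rank` singular components of a connectivity matrix."""
    U, s, Vt = np.linalg.svd(connectivity, full_matrices=False)
    s[rank:] = 0.0  # drop small singular values treated as noise
    return (U * s) @ Vt

rng = np.random.default_rng(0)
# Synthetic rank-3 "clean" connectivity between 19 EEG channels, plus noise
low_rank = rng.standard_normal((19, 3)) @ rng.standard_normal((3, 19))
noisy = low_rank + 0.01 * rng.standard_normal((19, 19))

denoised = svd_denoise(noisy, rank=3)
err = np.linalg.norm(denoised - low_rank) / np.linalg.norm(low_rank)
```

Keeping only the leading singular components is a standard low-rank denoising heuristic; the paper's criterion for deciding which components are "non-noisy" may differ.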

Place, publisher, year, edition, pages
Springer Science and Business Media Deutschland GmbH, 2025
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 15311
Keywords
ADHD, Electroencephalography, Singular Value Decomposition, Granger causality, Deep learning, Swin Transformer
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-111231 (URN); 10.1007/978-3-031-78195-7_18 (DOI); 2-s2.0-85211926165 (Scopus ID)
Conference
27th International Conference on Pattern Recognition (ICPR 2024), Kolkata, India, December 1-5, 2024
Note

ISBN for host publication: 978-3-031-78194-0, 978-3-031-78195-7

Available from: 2025-01-09 Created: 2025-01-09 Last updated: 2025-01-09. Bibliographically approved
Chippa, M. S., Chhipa, P. C., De, K., Liwicki, M. & Saini, R. (2025). LCM: Log Conformal Maps for Robust Representation Learning to Mitigate Perspective Distortion. In: Minsu Cho; Ivan Laptev; Du Tran; Angela Yao; Hongbin Zha (Ed.), Computer Vision – ACCV 2024: 17th Asian Conference on Computer Vision, Hanoi, Vietnam, December 8–12, 2024, Proceedings, Part VIII. Paper presented at 17th Asian Conference on Computer Vision (ACCV 2024), Hanoi, Vietnam, December 8-12, 2024 (pp. 175-191). Springer Nature
LCM: Log Conformal Maps for Robust Representation Learning to Mitigate Perspective Distortion
2025 (English). In: Computer Vision – ACCV 2024: 17th Asian Conference on Computer Vision, Hanoi, Vietnam, December 8–12, 2024, Proceedings, Part VIII / [ed] Minsu Cho; Ivan Laptev; Du Tran; Angela Yao; Hongbin Zha, Springer Nature, 2025, p. 175-191. Conference paper, Published paper (Refereed).
Abstract [en]

Perspective distortion (PD) leads to substantial alterations in the shape, size, orientation, angles, and spatial relationships of visual elements in images. Accurately determining camera intrinsic and extrinsic parameters is challenging, making it hard to synthesize perspective distortion effectively. Current distortion correction methods involve removing distortion and then learning vision tasks, making the whole a multi-step process that often compromises performance. Recent work leverages the Möbius transform for mitigating perspective distortion (MPD) to synthesize perspective distortions without estimating camera parameters. However, the Möbius transform requires tuning multiple interdependent and interrelated parameters and involves complex arithmetic operations, leading to substantial computational complexity. To address these challenges, we propose Log Conformal Maps (LCM), a method leveraging the logarithmic function to approximate perspective distortions with fewer parameters and reduced computational complexity. We provide a detailed foundation, complemented with experiments, to demonstrate that LCM approximates MPD with fewer parameters. We show that LCM integrates well with supervised and self-supervised representation learning, outperforms standard models, and matches state-of-the-art performance in mitigating perspective distortion on multiple benchmarks, namely ImageNet-PD, ImageNet-E, and ImageNet-X. Furthermore, LCM integrates seamlessly with person re-identification and improves its performance. Source code is made publicly available at https://github.com/meenakshi23/Log-Conformal-Maps.
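The core idea can be sketched by treating pixel coordinates as complex numbers and applying the complex logarithm, which is conformal away from the origin; the centering convention and lack of any learned parameters below are assumptions, not the paper's exact parameterization.

```python
import numpy as np

def log_conformal(coords, center):
    """Map 2-D points through the complex logarithm w = log(z) around `center`."""
    z = (coords[:, 0] - center[0]) + 1j * (coords[:, 1] - center[1])
    w = np.log(z)  # log|z| + i*arg(z); conformal away from the origin/branch cut
    return np.stack([w.real, w.imag], axis=1)

pts = np.array([[2.0, 1.0], [3.0, 4.0]])
mapped = log_conformal(pts, center=(0.0, 0.0))
```

The radial coordinate becomes log-compressed while angles are preserved locally, which is the sense in which a logarithmic map can mimic perspective-like warping with very few parameters.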

Place, publisher, year, edition, pages
Springer Nature, 2025
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349 ; 15479
Keywords
Perspective Distortion, Robust Representation Learning, Self-supervised Learning
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-111235 (URN); 10.1007/978-981-96-0966-6_11 (DOI); 2-s2.0-85212922792 (Scopus ID)
Conference
17th Asian Conference on Computer Vision (ACCV 2024), Hanoi, Vietnam, December 8-12, 2024
Funder
Knut and Alice Wallenberg Foundation
Note

ISBN for host publication: 978-981-96-0965-9

Available from: 2025-01-08 Created: 2025-01-08 Last updated: 2025-01-08. Bibliographically approved
Chhipa, P. C., Chippa, M. S., De, K., Saini, R., Liwicki, M. & Shah, M. (2025). Möbius Transform for Mitigating Perspective Distortions in Representation Learning. In: Aleš Leonardis; Elisa Ricci; Stefan Roth; Olga Russakovsky; Torsten Sattler; Gül Varol (Ed.), Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29 – October 4, 2024, Proceedings, Part LXXIII. Paper presented at 18th European Conference on Computer Vision (ECCV 2024), Milano, Italy, September 29 - October 4, 2024 (pp. 345-363). Springer Science and Business Media Deutschland GmbH
Möbius Transform for Mitigating Perspective Distortions in Representation Learning
2025 (English). In: Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29 – October 4, 2024, Proceedings, Part LXXIII / [ed] Aleš Leonardis; Elisa Ricci; Stefan Roth; Olga Russakovsky; Torsten Sattler; Gül Varol, Springer Science and Business Media Deutschland GmbH, 2025, p. 345-363. Conference paper, Published paper (Refereed).
Abstract [en]

Perspective distortion (PD) causes unprecedented changes in the shape, size, orientation, angles, and other spatial relationships of visual concepts in images. Precisely estimating camera intrinsic and extrinsic parameters is challenging, which hinders synthesizing perspective distortion directly. The unavailability of dedicated training data poses a critical barrier to developing robust computer vision methods. Additionally, distortion correction methods turn other computer vision tasks into multi-step pipelines and compromise performance. In this work, we propose mitigating perspective distortion (MPD) by employing fine-grained parameter control over a specific family of Möbius transforms to model real-world distortion, without estimating camera intrinsic and extrinsic parameters and without the need for actual distorted data. We also present a dedicated perspectively distorted benchmark dataset, ImageNet-PD, to benchmark the robustness of deep learning models against perspective distortion. The proposed method outperforms existing methods on the established benchmarks ImageNet-E and ImageNet-X. Additionally, it significantly improves performance on ImageNet-PD while performing consistently on the standard data distribution. Notably, our method shows improved performance on three PD-affected real-world applications (crowd counting, fisheye image recognition, and person re-identification) and one challenging PD-affected CV task: object detection. The source code, dataset, and models are available on the project webpage at https://prakashchhipa.github.io/projects/mpd.
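For intuition, a Möbius transform acts on image coordinates viewed as complex numbers via w = (az + b)/(cz + d) with ad - bc != 0; the parameter values below are illustrative only, not the specific family or fine-grained control used by MPD.

```python
import numpy as np

def mobius(z, a, b, c, d):
    """Apply w = (a*z + b) / (c*z + d) to complex coordinates z."""
    assert a * d - b * c != 0, "degenerate (non-invertible) transform"
    return (a * z + b) / (c * z + d)

# Two pixel positions encoded as complex numbers x + i*y
z = np.array([0.0 + 0.0j, 1.0 + 1.0j])
w = mobius(z, a=1.0, b=0.5j, c=0.1, d=1.0)  # a mild, invertible warp
```

With c = 0 the map reduces to a similarity (scale, rotate, translate); a nonzero c is what bends straight lines, which is the perspective-like behavior being modeled.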

Place, publisher, year, edition, pages
Springer Science and Business Media Deutschland GmbH, 2025
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349 ; 15131
Keywords
Perspective Distortion, Self-supervised Learning, Robust Representation Learning
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-111233 (URN); 10.1007/978-3-031-73464-9_21 (DOI); 2-s2.0-85212279211 (Scopus ID)
Conference
18th European Conference on Computer Vision (ECCV 2024), Milano, Italy, September 29 - October 4, 2024
Funder
Knut and Alice Wallenberg Foundation
Note

ISBN for host publication: 978-3-031-73463-2, 978-3-031-73464-9

Available from: 2025-01-08 Created: 2025-01-08 Last updated: 2025-01-08. Bibliographically approved
Mokayed, H., Saini, R., Adewumi, O., Alkhaled, L., Backe, B., Shivakumara, P., . . . Hum, Y. C. (2025). Vehicle Detection Performance in Nordic Region. In: Apostolos Antonacopoulos, Subhasis Chaudhuri, Rama Chellappa, Cheng-Lin Liu, Saumik Bhattacharya, Umapada Pal (Ed.), Pattern Recognition: 27th International Conference, ICPR 2024, Kolkata, India, December 1–5, 2024, Proceedings, Part XXII. Paper presented at 27th International Conference on Pattern Recognition (ICPR 2024), Kolkata, India, December 1-5, 2024 (pp. 62-77). Springer Science and Business Media Deutschland GmbH
Vehicle Detection Performance in Nordic Region
2025 (English). In: Pattern Recognition: 27th International Conference, ICPR 2024, Kolkata, India, December 1–5, 2024, Proceedings, Part XXII / [ed] Apostolos Antonacopoulos, Subhasis Chaudhuri, Rama Chellappa, Cheng-Lin Liu, Saumik Bhattacharya, Umapada Pal, Springer Science and Business Media Deutschland GmbH, 2025, p. 62-77. Conference paper, Published paper (Refereed).
Abstract [en]

This paper addresses the critical challenge of vehicle detection in the harsh winter conditions of the Nordic regions, characterized by heavy snowfall, reduced visibility, and low lighting. Due to their susceptibility to environmental distortions and occlusions, traditional vehicle detection methods have struggled in these adverse conditions. Advanced deep learning architectures have brought promise, yet the unique difficulties of detecting vehicles in Nordic winters remain inadequately addressed. This study uses the Nordic Vehicle Dataset (NVD), which contains UAV (unmanned aerial vehicle) images from northern Sweden, to evaluate the performance of state-of-the-art vehicle detection algorithms under challenging weather conditions. Our methodology includes a comprehensive evaluation of single-stage, two-stage, segmentation-based, and transformer-based detectors on the NVD. We propose a series of enhancements tailored to each detection framework, including data augmentation, hyperparameter tuning, transfer learning, and, in particular, implementing and enhancing the Detection Transformer (DETR). A novel architecture is proposed that leverages self-attention mechanisms, aided by MSER (maximally stable extremal regions) and RST (rough set theory), to identify and filter regions while modeling long-range dependencies and complex scene contexts. Our findings not only highlight the limitations of current detection systems in the Nordic environment but also offer promising directions for enhancing these algorithms for improved robustness and accuracy in vehicle detection amidst the complexities of winter landscapes. The code and the dataset are available at https://nvd.ltu-ai.dev.
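Benchmarking detector families of the kind listed above typically scores predictions by intersection over union (IoU) against ground-truth boxes; the minimal helper below is generic evaluation code of that sort, not taken from the NVD pipeline itself.

```python
def iou(box_a, box_b):
    """IoU between two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle (empty if the boxes do not overlap)
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

A prediction is usually counted as a true positive when its IoU with a ground-truth box exceeds a threshold such as 0.5, which is how metrics like mAP are computed across detector families.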

Place, publisher, year, edition, pages
Springer Science and Business Media Deutschland GmbH, 2025
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 15322
Keywords
Vehicle detection, Nordic region, DETR, MSER, Roughset, YOLO (You only look once), Faster-RCNN (regions with convolutional neural networks), SSD (Single Shot MultiBox), U-Net
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-111232 (URN); 10.1007/978-3-031-78312-8_5 (DOI); 2-s2.0-85212264328 (Scopus ID)
Conference
27th International Conference on Pattern Recognition (ICPR 2024), Kolkata, India, December 1-5, 2024
Note

ISBN for host publication: 978-3-031-78311-1, 978-3-031-78312-8

Available from: 2025-01-08 Created: 2025-01-08 Last updated: 2025-01-09. Bibliographically approved
Singh, S., Keserwani, P., Roy, P. P. & Saini, R. (2024). Better Skeleton Better Readability: Scene Text Image Super-Resolution via Skeleton-Aware Diffusion Model. IEEE Access, 12, 187640-187651
Better Skeleton Better Readability: Scene Text Image Super-Resolution via Skeleton-Aware Diffusion Model
2024 (English). In: IEEE Access, E-ISSN 2169-3536, Vol. 12, p. 187640-187651. Article in journal (Refereed), Published.
Abstract [en]

Scene text image super-resolution (STISR) aims to enhance the resolution of text images while simultaneously improving their readability by reducing noise, blur, and other degradations. Existing diffusion-based approaches for STISR primarily rely on text-prior information but often overlook the importance of explicitly modeling the visual structure of the text. In this paper, we propose a novel Skeleton-Aware Diffusion Method (SADM) for STISR, which introduces text skeletons as structural guidance for the diffusion process. The text skeleton serves as a critical visual cue, helping the model better restore the fine details of text, even in severely degraded low-resolution images. Generating high-quality skeletons from low-resolution scene text is challenging due to the inherent blurring and noise present in such images. To tackle this, we introduce a diffusion-based Skeleton Correction Network (SCN), which refines the initial skeletons produced by a convolutional neural network-based skeletonization model. The SCN effectively improves the accuracy of the skeletons, allowing for more precise structural guidance during the diffusion process. Our extensive experiments demonstrate the significant benefits of incorporating skeleton information into the STISR pipeline. The proposed SADM achieves state-of-the-art performance on the TextZoom dataset, with accuracies of 81.4%, 64.9%, and 49.6% on the easy, medium, and hard subsets, respectively, with the ASTER text recognizer, improving on the previous best results. Through detailed analysis, we also show that improving the quality of skeletons from low-resolution images leads to better super-resolution outcomes and enhances the performance of text recognizers.

Place, publisher, year, edition, pages
IEEE, 2024
Keywords
Scene text image super-resolution, diffusion model, skeleton networks, text recognition
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-111173 (URN); 10.1109/ACCESS.2024.3510136 (DOI); 001380709600001 (); 2-s2.0-85211627017 (Scopus ID)
Note

Validated; 2025; Level 2; 2025-01-03 (sarsun);

Full text license: CC BY 4.0;

Available from: 2025-01-03 Created: 2025-01-03 Last updated: 2025-01-03. Bibliographically approved
Saini, R., Liwicki, M. & Jara-Valera, A. J. (2024). Data Analytics and Artificial Intelligence. In: Sébastien Ziegler, Renáta Radócz, Adrian Quesada Rodriguez, Sara Nieves Matheu Garcia (Ed.), Springer Handbooks (pp. 427-442). Springer Science and Business Media Deutschland GmbH, Part F3575
Data Analytics and Artificial Intelligence
2024 (English). In: Springer Handbooks / [ed] Sébastien Ziegler, Renáta Radócz, Adrian Quesada Rodriguez, Sara Nieves Matheu Garcia, Springer Science and Business Media Deutschland GmbH, 2024, Vol. Part F3575, p. 427-442. Chapter in book (Other academic).
Place, publisher, year, edition, pages
Springer Science and Business Media Deutschland GmbH, 2024
National Category
Computer Sciences; Robotics
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-111223 (URN); 10.1007/978-3-031-39650-2_18 (DOI); 2-s2.0-85212114810 (Scopus ID)
Available from: 2025-01-07 Created: 2025-01-07 Last updated: 2025-01-07
Ali, T., Pratim Roy, P. & Saini, R. (2024). Fast&Focused-Net: Enhancing Small Object Encoding With VDP Layer in Deep Neural Networks. IEEE Access, 12, 130603-130616
Fast&Focused-Net: Enhancing Small Object Encoding With VDP Layer in Deep Neural Networks
2024 (English). In: IEEE Access, E-ISSN 2169-3536, Vol. 12, p. 130603-130616. Article in journal (Refereed), Published.
Abstract [en]

In this paper, we introduce Fast&Focused-Net (FFN), a novel deep neural network architecture tailored for efficiently encoding small objects into fixed-length feature vectors. In contrast to conventional Convolutional Neural Networks (CNNs), FFN employs a series of our newly proposed Volume-wise Dot Product (VDP) layers, designed to address several inherent limitations of CNNs. Specifically, CNNs exhibit a smaller effective receptive field (ERF) than their theoretical counterparts, limiting their vision span. Additionally, the initial layers in CNNs produce low-dimensional feature vectors, presenting a bottleneck for subsequent learning. Lastly, the computational overhead of CNNs, particularly in capturing diverse image regions via parameter sharing, is significantly high. The VDP layer, at the heart of FFN, aims to remedy these issues by efficiently covering the entire image patch with reduced computational cost. Experimental results demonstrate the strength of FFN in a variety of applications. Our network outperformed state-of-the-art methods for small object classification tasks on datasets such as CIFAR-10, CIFAR-100, STL-10, SVHN-Cropped, and Fashion-MNIST. In the context of larger image classification, when combined with a transformer encoder (ViT), FFN produced competitive results on the OpenImages V6, ImageNet-1K, and Places365 datasets. Moreover, the same combination showed unparalleled performance in text recognition tasks across the SVT, IC15, SVTP, and HOST datasets.
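A speculative reading of the VDP idea: each output unit dots a learned weight volume with the entire input patch, so every unit's receptive field spans the whole patch rather than a local window. The shapes and NumPy formulation below are illustrative assumptions, since the paper's layer details (normalization, nonlinearity, parameter layout) are not reproduced here.

```python
import numpy as np

def vdp_layer(patch, weights):
    """Volume-wise dot product sketch.

    patch:   (H, W, C) input volume
    weights: (K, H, W, C), one full-size weight volume per output unit
    returns: (K,) feature vector covering the whole patch per unit
    """
    return np.einsum("khwc,hwc->k", weights, patch)

rng = np.random.default_rng(1)
patch = rng.standard_normal((8, 8, 3))
weights = rng.standard_normal((16, 8, 8, 3))
features = vdp_layer(patch, weights)
```

Unlike a convolution, no weight sharing across spatial positions is involved: each of the K units is a single global dot product over the patch volume.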

Place, publisher, year, edition, pages
IEEE, 2024
National Category
Computer and Information Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-110686 (URN); 10.1109/access.2024.3447888 (DOI); 001320511300001 (); 2-s2.0-85201779997 (Scopus ID)
Note

Validated; 2024; Level 2; 2024-12-03 (signyg);

Full text license: CC BY-NC-ND

Available from: 2024-11-11 Created: 2024-11-11 Last updated: 2024-12-03. Bibliographically approved
Upadhyay, R., Phlypo, R., Saini, R. & Liwicki, M. (2024). Less is More: Towards parsimonious multi-task models using structured sparsity. In: Yuejie Chi, Gintare Karolina Dziugaite, Qing Qu, Atlas Wang Wang, Zhihui Zhu (Ed.), Proceedings of Machine Learning Research, PMLR. Paper presented at 1st Conference on Parsimony and Learning (CPAL 2024), Hong Kong, China, January 3-6, 2024 (pp. 590-601). Proceedings of Machine Learning Research, 234
Less is More: Towards parsimonious multi-task models using structured sparsity
2024 (English). In: Proceedings of Machine Learning Research, PMLR / [ed] Yuejie Chi, Gintare Karolina Dziugaite, Qing Qu, Atlas Wang Wang, Zhihui Zhu, Proceedings of Machine Learning Research, 2024, Vol. 234, p. 590-601. Conference paper, Published paper (Refereed).
Abstract [en]

Model sparsification in deep learning promotes simpler, more interpretable models with fewer parameters. This not only reduces the model’s memory footprint and computational needs but also shortens inference time. This work focuses on creating sparse models optimized for multiple tasks with fewer parameters. These parsimonious models also possess the potential to match or outperform dense models in terms of performance. In this work, we introduce channel-wise l1/l2 group sparsity in the parameters (weights) of the shared convolutional layers of the multi-task learning model. This approach facilitates the removal of extraneous groups, i.e., channels (due to l1 regularization), and also imposes a penalty on the weights, further enhancing the learning efficiency for all tasks (due to l2 regularization). We analyzed the results of group sparsity in both single-task and multi-task settings on two widely used multi-task learning datasets: NYU-v2 and CelebAMask-HQ. On both datasets, which each consist of three different computer vision tasks, multi-task models with approximately 70% sparsity outperform their dense equivalents. We also investigate how changing the degree of sparsification influences the model’s performance, the overall sparsity percentage, the patterns of sparsity, and the inference time.
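The channel-wise l1/l2 (group-lasso) penalty described above can be written as a sum (l1) of per-channel l2 norms over a shared convolutional weight; the weight shape and regularization strength `lam` below are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def group_sparsity_penalty(conv_weight, lam=1e-3):
    """l1-over-groups of l2-within-group norms.

    conv_weight: (out_channels, in_channels, kH, kW); each output
    channel forms one group, so whole channels are driven to zero.
    """
    per_channel = np.sqrt(
        (conv_weight ** 2).reshape(conv_weight.shape[0], -1).sum(axis=1)
    )
    return lam * per_channel.sum()  # l1 across the per-channel l2 norms

# Toy weight where only one of four output channels is active
w = np.zeros((4, 2, 3, 3))
w[0] = 1.0
penalty = group_sparsity_penalty(w, lam=1.0)
```

Because the l2 norm is non-differentiable at zero as a group, gradient descent with this penalty tends to zero out entire channels at once, which is what enables pruning whole groups rather than scattered weights.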

Place, publisher, year, edition, pages
Proceedings of Machine Learning Research, 2024
Keywords
Multi-task learning, structured sparsity, group sparsity, parameter pruning, semantic segmentation, depth estimation, surface normal estimation
National Category
Probability Theory and Statistics
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-103838 (URN); 2-s2.0-85183883391 (Scopus ID)
Conference
1st Conference on Parsimony and Learning (CPAL 2024), Hong Kong, China, January 3-6, 2024
Funder
Knut and Alice Wallenberg Foundation
Note

Copyright © The authors and PMLR 2024. Authors retain copyright.

Available from: 2024-01-29 Created: 2024-01-29 Last updated: 2024-02-19. Bibliographically approved
Mishra, A. R., Kumar, R. & Saini, R. (2024). Performance Enhancement of EEG Signatures for Person Authentication Using CNN BiLSTM Method. Journal of universal computer science (Online), 30(12), 1755-1779
Performance Enhancement of EEG Signatures for Person Authentication Using CNN BiLSTM Method
2024 (English). In: Journal of universal computer science (Online), ISSN 0948-695X, E-ISSN 0948-6968, Vol. 30, no 12, p. 1755-1779. Article in journal (Refereed), Published.
Abstract [en]

Despite their vulnerability to competent forgers, signatures are one of the most widely used user verification methods. Recent research has revealed that EEG signals are harder to reproduce and provide superior biometric information. This study aims to improve the effectiveness of person authentication by applying deep learning techniques to electroencephalogram (EEG) signals. The broad adoption of EEG-based authentication systems has been hindered by problems such as noise, variability, and inter-subject variances, despite the potential distinctiveness of EEG signals. We propose a multiscale convolutional neural network (CNN) combined with a Bidirectional LSTM (BiLSTM), called CNN-BiLSTM, to extract features from and classify raw EEG data. The methodology involves acquiring raw EEG data; preprocessing for noise reduction, standardization, and normalization; and employing deep learning techniques for feature extraction and classification. Experimental results exhibit a notable improvement in accuracy and reliability compared to existing EEG authentication methods such as LOF, CNN, FCN, EfficientNet-B0, and BiLSTM. The results showcase the performance of the proposed deep learning model using established metrics such as precision, sensitivity, specificity, and accuracy. The proposed methodology outperforms existing methods, achieving training and validation accuracies of 98.9% and 92.2%, respectively. These findings demonstrate that the proposed approach achieves highly effective results using EEG signals for person identification.
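The standardization step in such a preprocessing pipeline can be sketched as per-channel z-scoring of the raw EEG; the channels-by-samples layout and the epsilon guard below are assumptions for illustration, not the paper's exact settings.

```python
import numpy as np

def standardize_eeg(eeg, eps=1e-8):
    """Z-score each channel of an EEG window.

    eeg: (channels, samples) -> zero mean, unit variance per channel.
    """
    mean = eeg.mean(axis=1, keepdims=True)
    std = eeg.std(axis=1, keepdims=True)
    return (eeg - mean) / (std + eps)  # eps guards against flat channels

rng = np.random.default_rng(2)
raw = 50.0 + 10.0 * rng.standard_normal((32, 256))  # synthetic 32-channel window
z = standardize_eeg(raw)
```

Z-scoring per channel removes electrode-specific offset and gain differences before the data reach the feature-extraction network.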

Place, publisher, year, edition, pages
IICM, 2024
Keywords
Deep Learning, EEG signals, BiLSTM, CNN, Person Authentication (PA), Classification
National Category
Signal Processing
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-111099 (URN); 10.3897/jucs.122236 (DOI); 001368992000007 (); 2-s2.0-85211197428 (Scopus ID)
Note

Full text license: CC BY-NC-ND

Available from: 2024-12-17 Created: 2024-12-17 Last updated: 2025-01-13
Upadhyay, R., Phlypo, R., Saini, R. & Liwicki, M. (2024). Sharing to Learn and Learning to Share; Fitting Together Meta, Multi-Task, and Transfer Learning: A Meta Review. IEEE Access, 12, 148553-148576
Sharing to Learn and Learning to Share; Fitting Together Meta, Multi-Task, and Transfer Learning: A Meta Review
2024 (English). In: IEEE Access, E-ISSN 2169-3536, Vol. 12, p. 148553-148576. Article, review/survey (Refereed), Published.
Abstract [en]

Integrating knowledge across different domains is an essential feature of human learning. Learning paradigms such as transfer learning, meta-learning, and multi-task learning reflect the human learning process by exploiting the prior knowledge for new tasks, encouraging faster learning and good generalization for new tasks. This article gives a detailed view of these learning paradigms and their comparative analysis. The weakness of one learning algorithm turns out to be a strength of another, and thus, merging them is a prevalent trait in the literature. Numerous research papers focus on each of these learning paradigms separately and provide a comprehensive overview of them. However, this article reviews research studies that combine (two of) these learning algorithms. This survey describes how these techniques are combined to solve problems in many different fields of research, including computer vision, natural language processing, hyper-spectral imaging, and many more, in a supervised setting only. Based on the knowledge accumulated from the literature, we hypothesize a generic task-agnostic and model-agnostic learning network – an ensemble of meta-learning, transfer learning, and multi-task learning, termed Multi-modal Multi-task Meta Transfer Learning. We also present some open research questions, limitations, and future research directions for this proposed network. The aim of this article is to spark interest among scholars in effectively merging existing learning algorithms with the intention of advancing research in this field. Instead of presenting experimental results, we invite readers to explore and contemplate techniques for merging algorithms while navigating through their limitations.

Place, publisher, year, edition, pages
IEEE, 2024
Keywords
Knowledge sharing, multi-task learning, meta-learning, multi-modal inputs, transfer learning
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-94817 (URN); 10.48550/arXiv.2111.12146 (DOI)
Note

Validated; 2024; Level 2; 2024-11-11 (joosat);

Full text license: CC BY

Available from: 2022-12-12 Created: 2022-12-12 Last updated: 2024-11-11. Bibliographically approved