Fourier Feature-based CBAM and Vision Transformer for Text Detection in Drone Images
Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India.
Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia.
Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India.
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0001-6158-3543
2023 (English). In: Document Analysis and Recognition – ICDAR 2023 Workshops, Part II / [ed] Mickael Coustaty & Alicia Fornés, Springer, 2023, p. 257-271. Conference paper, Published paper (Refereed)
Abstract [en]

The use of drones in real-world applications such as monitoring, surveillance, and security is growing rapidly. Most existing scene text detection methods were developed for normal scene images; this work aims to develop a model that detects text in drone images as well as scene images. To reduce the adverse effects of drone-induced degradations, we explore a combination of the Fourier transform and the Convolutional Block Attention Module (CBAM) that enhances degraded information in images without affecting high-contrast images. This combination helps extract prominent features that represent text irrespective of degradation. The refined features extracted from the Fourier Contouring Network (FCN) are then supplied to a Vision Transformer, which uses ResNet50 as a backbone together with an encoder-decoder for text detection in both drone and scene images; hence the model is called the Fourier Transform based Transformer. Experimental results on drone datasets and on the natural scene text detection benchmarks Total-Text and ICDAR 2015 show that the proposed model is effective and outperforms state-of-the-art models.
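The enhancement pipeline sketched in the abstract (Fourier-domain refinement followed by CBAM-style channel and spatial attention) can be illustrated roughly as follows. This is a minimal numpy sketch under stated assumptions: the high-pass boost in `fourier_enhance` and the pooling-only attention (no learned MLP or convolution) are simplifications for illustration, not the paper's exact FCN or CBAM formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fourier_enhance(feat):
    # Assumed enhancement step: amplify high-frequency components in the
    # Fourier domain, where edges of degraded text tend to live. The paper's
    # actual Fourier Contouring Network is learned; this fixed filter only
    # illustrates the idea.
    F = np.fft.fft2(feat, axes=(-2, -1))
    h, w = feat.shape[-2:]
    fy, fx = np.meshgrid(np.fft.fftfreq(h), np.fft.fftfreq(w), indexing="ij")
    high_pass = np.sqrt(fy**2 + fx**2)        # radius in frequency space
    F_boosted = F * (1.0 + high_pass)         # boost high frequencies
    return np.real(np.fft.ifft2(F_boosted, axes=(-2, -1)))

def cbam_attention(feat):
    # CBAM applies channel attention then spatial attention. Here both are
    # reduced to avg/max pooling plus a sigmoid (no shared MLP, no 7x7 conv).
    ch = sigmoid(feat.mean(axis=(1, 2)) + feat.max(axis=(1, 2)))
    feat = feat * ch[:, None, None]           # reweight channels
    sp = sigmoid(feat.mean(axis=0) + feat.max(axis=0))
    return feat * sp[None, :, :]              # reweight spatial positions

# Hypothetical feature map: 8 channels over a 32x32 grid.
feat = np.random.default_rng(0).standard_normal((8, 32, 32))
refined = cbam_attention(fourier_enhance(feat))
print(refined.shape)  # (8, 32, 32)
```

The refined features would then feed the transformer detection head; that stage is omitted here since it depends on learned weights.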

Place, publisher, year, edition, pages
Springer, 2023. p. 257-271
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 14194
Keywords [en]
Deep learning, Detection transformer, Drone images, Scene text detection, Transformer
National Category
Computer graphics and computer vision
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-103373
DOI: 10.1007/978-3-031-41501-2_18
ISI: 001346411900018
Scopus ID: 2-s2.0-85173026111
OAI: oai:DiVA.org:ltu-103373
DiVA, id: diva2:1823743
Conference
17th International Conference on Document Analysis and Recognition (ICDAR 2023), San José, CA, United States, August 21-26, 2023
Note

ISBN for host publication: 978-3-031-41500-5 (print), 978-3-031-41501-2 (electronic)

Funder: Indian Statistical Institute (Technology Innovation Hub, TIH)

Available from: 2024-01-03. Created: 2024-01-03. Last updated: 2025-02-07. Bibliographically approved.

Open Access in DiVA

No full text in DiVA


Authority records
Mokayed, Hamam; Liwicki, Marcus
