Enhancing CRNN HTR Architectures with Transformer Blocks
National Technical University of Athens, Athens, Greece.
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0002-9332-3188
University of West Attica, Aigaleo, Greece; University of Ioannina, Ioannina, Greece.
2024 (English). In: Document Analysis and Recognition, ICDAR 2024: 18th International Conference, Athens, Greece, August 30–September 4, 2024, Proceedings, Part IV / [ed] Elisa H. Barney Smith; Marcus Liwicki; Liangrui Peng, Springer Science and Business Media Deutschland GmbH, 2024, Vol. 4, p. 425-440. Conference paper, Published paper (Refereed).
Abstract [en]

Handwritten Text Recognition (HTR) is a challenging problem that plays an essential role in digitizing and interpreting diverse handwritten documents. While traditional approaches primarily utilize CNN-RNN (CRNN) architectures, recent advancements based on Transformer architectures have demonstrated impressive results in HTR. However, these Transformer-based systems often involve high-parameter configurations and rely extensively on synthetic data. Moreover, they pay little attention to efficiently integrating the Transformer modules' ability to capture contextual relationships within the data. In this paper, we explore a lightweight integration of Transformer modules into existing CRNN frameworks to address the complexities of HTR, aiming to better capture the sequential context of the task. We present a hybrid CNN image encoder with intermediate MobileViT blocks that effectively combines the different components in a resource-efficient manner. Through extensive experiments and ablation studies, we refine the integration of these modules and demonstrate that our proposed model enhances HTR performance. Our results on the line-level IAM and RIMES datasets suggest that our proposed method achieves competitive performance with significantly fewer parameters and without relying on synthetic data, compared to existing systems.
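To make the architecture described in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of a CRNN whose convolutional encoder is interleaved with a simplified MobileViT-style block (local convolution, global self-attention over the feature map, fusion with the input), followed by a BiLSTM and a CTC-style output head. All class names, layer sizes, and the simplified attention block are illustrative assumptions and are not taken from the paper.

```python
# Hypothetical sketch of a hybrid CNN + MobileViT-style CRNN encoder for line-level HTR.
# Sizes, names, and the simplified attention block are assumptions, not the authors' code.
import torch
import torch.nn as nn


class SimpleMobileViTBlock(nn.Module):
    """Local conv features -> global self-attention over the feature map -> fuse (simplified)."""

    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels), nn.GELU()
        )
        self.attn = nn.TransformerEncoderLayer(
            d_model=channels, nhead=heads, dim_feedforward=2 * channels, batch_first=True
        )
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, x):
        y = self.local(x)                       # local representation
        b, c, h, w = y.shape
        seq = y.flatten(2).transpose(1, 2)      # (B, H*W, C) tokens for global attention
        seq = self.attn(seq)
        y = seq.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([x, y], dim=1))  # fuse attended features with the input


class HybridCRNN(nn.Module):
    """CNN stages interleaved with MobileViT-style blocks, then BiLSTM and a CTC output head."""

    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.BatchNorm2d(64), nn.GELU(),
            nn.MaxPool2d(2),                    # downsample height and width
            SimpleMobileViTBlock(64),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.GELU(),
            nn.MaxPool2d(2),
            SimpleMobileViTBlock(128),
        )
        self.rnn = nn.LSTM(128, hidden, num_layers=2, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, num_classes)  # classes include the CTC blank

    def forward(self, images):                  # images: (B, 1, H, W) grayscale text lines
        f = self.encoder(images)                # (B, C, H', W')
        f = f.mean(dim=2).transpose(1, 2)       # collapse height -> (B, W', C) sequence
        f, _ = self.rnn(f)
        return self.head(f)                     # (B, W', num_classes) logits for CTC loss


if __name__ == "__main__":
    model = HybridCRNN(num_classes=80)
    logits = model(torch.randn(2, 1, 64, 512))  # two dummy line images
    print(logits.shape)                         # torch.Size([2, 128, 80])
```

Collapsing the feature-map height and decoding the remaining width-wise sequence with CTC is the standard CRNN recipe; the MobileViT-style block simply adds global context between convolutional stages, which is the kind of lightweight Transformer integration the abstract describes.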

Place, publisher, year, edition, pages
Springer Science and Business Media Deutschland GmbH, 2024. Vol. 4, p. 425-440
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349
Keywords [en]
Handwriting Text Recognition, Transformer Modules
National Category
Computer Sciences; Computer graphics and computer vision
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-110168
DOI: 10.1007/978-3-031-70546-5_25
ISI: 001336396200025
Scopus ID: 2-s2.0-85204589724
OAI: oai:DiVA.org:ltu-110168
DiVA, id: diva2:1901965
Conference
18th International Conference on Document Analysis and Recognition (ICDAR 2024), Athens, Greece, August 30–September 4, 2024
Note

ISBN for host publication: 978-3-031-70545-8, 978-3-031-70546-5

Available from: 2024-09-30. Created: 2024-09-30. Last updated: 2025-10-21. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Nikolaidou, Konstantina
