DiffusionPen: Towards Controlling the Style of Handwritten Text Generation
Nikolaidou, Konstantina: Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0002-9332-3188
Retsinas, George: National Technical University of Athens, Athens, Greece. ORCID iD: 0000-0001-6734-3575
Sfikas, Giorgos: University of West Attica, Athens, Greece. ORCID iD: 0000-0002-7305-2886
Liwicki, Marcus: Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0003-4029-6574
2025 (English). In: Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LXXXV / [ed] Aleš Leonardis; Elisa Ricci; Stefan Roth; Olga Russakovsky; Torsten Sattler; Gül Varol, Springer Science and Business Media Deutschland GmbH, 2025, Vol. LXXXV, p. 417-434.
Conference paper, Published paper (Refereed)
Abstract [en]

Handwritten Text Generation (HTG) conditioned on text and style is a challenging task due to the variability of inter-user characteristics and the unlimited combinations of characters that form new words unseen during training. Diffusion Models have recently shown promising results in HTG but remain under-explored. We present DiffusionPen (DiffPen), a 5-shot style handwritten text generation approach based on Latent Diffusion Models. By utilizing a hybrid style extractor that combines metric learning and classification, our approach manages to capture both textual and stylistic characteristics of seen and unseen words and styles, generating realistic handwritten samples. Moreover, we explore several variation strategies of the data with multi-style mixtures and noisy embeddings, enhancing the robustness and diversity of the generated data. Extensive experiments using the IAM offline handwriting database show that our method outperforms existing methods qualitatively and quantitatively, and its additional generated data can improve the performance of Handwriting Text Recognition (HTR) systems.

Place, publisher, year, edition, pages
Springer Science and Business Media Deutschland GmbH, 2025. Vol. LXXXV, p. 417-434
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349 ; 15143
Keywords [en]
Handwriting Generation, Latent Diffusion Models, Few-shot Style Representation
National Category
Computer Sciences; Computer graphics and computer vision
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-111074
DOI: 10.1007/978-3-031-73013-9_24
Scopus ID: 2-s2.0-85211230972
OAI: oai:DiVA.org:ltu-111074
DiVA, id: diva2:1921818
Conference
18th European Conference on Computer Vision (ECCV 2024), Milano, Italy, September 29 - October 4, 2024
Note

ISBN for host publication: 978-3-031-73012-2, 978-3-031-73013-9

Available from: 2024-12-17. Created: 2024-12-17. Last updated: 2025-02-01. Bibliographically approved.

Open Access in DiVA

No full text in DiVA


Authority records

Nikolaidou, Konstantina; Liwicki, Marcus

Search in DiVA

By author/editor
Nikolaidou, Konstantina; Retsinas, George; Sfikas, Giorgos; Liwicki, Marcus
By organisation
Embedded Internet Systems Lab
Computer Sciences; Computer graphics and computer vision
