Investigating the Effect of Using Synthetic and Semi-synthetic Images for Historical Document Font Classification
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0002-9332-3188
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0001-9604-7193
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany.
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0003-4029-6574
2022 (English). In: Document Analysis Systems: 15th IAPR International Workshop, DAS 2022, La Rochelle, France, May 22–25, 2022, Proceedings / [ed] Seiichi Uchida; Elisa Barney; Véronique Eglin, Springer Nature, 2022, p. 613-626. Conference paper, Published paper (Refereed)
Abstract [en]

This paper studies the effect of various data augmentation and synthetization approaches on historical image data, in particular for font classification. Historical document image datasets are often too small to train and evaluate deep learning models, which motivates data augmentation and synthetic document generation techniques for creating additional data. This work explores the effect on a font classification task of various semi-synthetic and synthetic historical document images, some following recent trends and others not yet published. We use 10K patch samples, derived from the dataset of Early Printed Books with Multiple Font Groups, as the baseline dataset and increase its size using the DocCreator software and Generative Adversarial Networks (GANs). Furthermore, we fine-tune different pre-trained Convolutional Neural Network (CNN) classifiers on the original dataset as a baseline and then compare their performance when the additional semi-synthetic and synthetic images are included. We further evaluate the performance when additional real samples from the original dataset are added to the training process. DocCreator and the additional real samples improve the performance and give the best results. Finally, for the best-performing architecture, we explore different training-set sizes and examine how the gradual addition of data affects the performance.
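The abstract's final experiment, gradually adding generated samples to a fixed real baseline, can be sketched as follows. This is a minimal illustrative helper, not the authors' actual pipeline; the function name and the list-of-patches representation are assumptions for illustration.

```python
import random

def build_training_set(real_patches, synthetic_patches, n_synthetic, seed=0):
    """Combine the real baseline patches with a chosen number of
    synthetic ones, shuffled, to study how the gradual addition of
    generated data affects classifier performance."""
    rng = random.Random(seed)
    extra = rng.sample(synthetic_patches, min(n_synthetic, len(synthetic_patches)))
    combined = list(real_patches) + extra
    rng.shuffle(combined)
    return combined

# Sweeping n_synthetic (0, 1000, 2000, ...) yields the increasing
# training-set sizes whose effect the paper examines.
```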

Place, publisher, year, edition, pages
Springer Nature, 2022. p. 613-626
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 13237
Keywords [en]
Historical document images, Synthetic image generation, Font classification, Convolutional neural networks, Generative Adversarial Networks, Document Image Analysis, Performance evaluation
National Category
Computer graphics and computer vision
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-91458
DOI: 10.1007/978-3-031-06555-2_41
ISI: 000870314500041
Scopus ID: 2-s2.0-85131123625
OAI: oai:DiVA.org:ltu-91458
DiVA, id: diva2:1673617
Conference
15th IAPR International Workshop on Document Analysis Systems (DAS 2022), La Rochelle, France, May 22-25, 2022
Note

ISBN for host publication: 978-3-031-06554-5, 978-3-031-06555-2

Available from: 2022-06-21. Created: 2022-06-21. Last updated: 2025-02-07. Bibliographically approved
In thesis
1. Enabling Deep Document Image Analysis with Generative Models
2023 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

Historical documents are a valuable source of cultural knowledge and can provide information about past events, societies, beliefs, and cultures. They serve as an excellent resource for research in various fields, including history, literature, linguistics, and anthropology. Their preservation and analysis pose significant challenges due to the unique characteristics of handwritten scripts, their variability, and document degradation. With the rise of the Deep Learning era, enormous amounts of annotated data are required to train large models that can perform well on unseen data. Nowadays, digital libraries provide high-quality digitized images for the analysis and processing of historical documents. However, collecting and annotating the provided data is expensive and requires substantial expertise from historians and humanities scholars. Hence, generating synthetic data to enhance the performance of Deep Learning frameworks is a common approach in Computer Vision and, specifically in this thesis, in Document Image Analysis and Recognition (DIAR).

This thesis focuses on leveraging generative models to facilitate DIAR tasks, with an emphasis on historical and handwritten documents, by generating realistic synthetic images that resemble a real distribution and enhance the training of downstream DIAR tasks. The contributions of the thesis include a systematic literature review, a comparative evaluation, and a new method for handwriting generation.

First, a systematic literature review of existing historical document image datasets provides summarized information on 65 studies, covering aspects such as statistics, document type, language, and visual and annotation properties. The study discusses limitations and promising resources for future research, namely the limited dataset sizes, the absence of benchmarks, and the lack of standardization in data formats and evaluation schemes.

A subsequent contribution is the integration of generated data into a historical document font classification task. Semi-synthetic data are generated with DocCreator, an open-source software from which different document degradation augmentations are used. A conditional Generative Adversarial Network (GAN) is used to generate fully synthetic data conditioned on a specific sample. The data generated by the two methods are integrated as additional samples in the training of several Convolutional Neural Network classifiers, and the effect on performance is examined.
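As a rough illustration of the semi-synthetic route, the sketch below applies two generic document-degradation effects (pixel speckle and a box blur) to a grayscale patch. This is only an analogue of the kind of augmentations DocCreator offers, not its actual algorithms; the function and parameter names are assumptions.

```python
import numpy as np

def degrade_patch(img, noise_frac=0.02, blur=True, seed=0):
    """Illustrative semi-synthetic degradation of a grayscale patch
    (float array in [0, 1]): flip a small fraction of pixels to black
    or white ("ink specks"/holes), then soften with a 3x3 box blur."""
    rng = np.random.default_rng(seed)
    out = img.copy()
    # speckle: set a random small fraction of pixels to 0 or 1
    mask = rng.random(out.shape) < noise_frac
    out[mask] = rng.integers(0, 2, size=mask.sum()).astype(out.dtype)
    if blur:
        # 3x3 box blur via a padded neighbourhood average
        padded = np.pad(out, 1, mode="edge")
        out = sum(padded[i:i + out.shape[0], j:j + out.shape[1]]
                  for i in range(3) for j in range(3)) / 9.0
    return np.clip(out, 0.0, 1.0)
```

The degraded copies would then be added to the training pool alongside the original real patches.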

The final contribution of the thesis introduces a new method for generating styled handwritten text images based on Denoising Diffusion Probabilistic Models (DDPMs), previously unexplored in DIAR. The method captures the stylistic and content characteristics of a standard multi-writer handwriting dataset and improves writer identification and handwritten text recognition performance compared to Generative Adversarial Network (GAN)-based methods. The results demonstrate the potential of the generative method for enabling deep document image analysis and pave the way for further research.
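For readers unfamiliar with DDPMs, the forward (noising) process they rest on can be written in a few lines: x_t = sqrt(ᾱ_t)·x₀ + sqrt(1−ᾱ_t)·ε with ε ~ N(0, I), where ᾱ_t is the cumulative product of (1−β_t). The denoising network, which in the thesis is conditioned on writer style and text content, is trained to predict ε. This is a minimal sketch of the standard formulation, not the thesis implementation.

```python
import numpy as np

def ddpm_forward(x0, t, betas, seed=0):
    """Sample x_t from q(x_t | x_0) in closed form:
    x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    alphas_cumprod = np.cumprod(1.0 - betas)
    a_bar = alphas_cumprod[t]
    eps = np.random.default_rng(seed).standard_normal(x0.shape)
    return np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps, eps

# linear beta schedule, as in the original DDPM formulation
betas = np.linspace(1e-4, 0.02, 1000)
```

At small t the sample stays close to the clean image; at t near the end of the schedule it is almost pure Gaussian noise, which is what the reverse process learns to invert.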

As a future direction, this work will aim to progress from generating word images to generating sentence and full-document images by conditioning on the content, style, and layout of historical documents. Another direction is to extend the proposed method to a few-shot scheme for the writer-style condition, in order to generate unseen styles. Furthermore, future work will aim to leverage important features from pre-training on synthetic and real data in order to generalize to historical documents, which are a scarce resource, and to adapt the text-encoding parts to different languages and scripts. Finally, the ultimate goal is to generate a massive synthetic historical document image database to fill the existing benchmark gap.

Place, publisher, year, edition, pages
Luleå: Luleå University of Technology, 2023
Series
Licentiate thesis / Luleå University of Technology, ISSN 1402-1757
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-96361
ISBN: 978-91-8048-303-2, 978-91-8048-304-9
Presentation
2023-06-07, A117, Luleå tekniska universitet, Luleå, 10:00 (English)
Opponent
Supervisors
Available from: 2023-04-12. Created: 2023-04-11. Last updated: 2024-03-22. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Nikolaidou, Konstantina; Upadhyay, Richa; Liwicki, Marcus
