Toward Semi-Supervised Graphical Object Detection in Document Images
Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany. ORCID iD: 0000-0002-1121-0885
Department of Computer Science, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany; Mindgarage, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany; German Research Institute for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany. ORCID iD: 0000-0003-0456-6493
German Research Institute for Artificial Intelligence (DFKI), 67663 Kaiserslautern, Germany.
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0003-4029-6574
2022 (English). In: Future Internet, E-ISSN 1999-5903, Vol. 14, no 6, article id 176. Article in journal (Refereed). Published.
Abstract [en]

Graphical page object detection classifies and localizes objects such as tables and figures in a document. As deep learning techniques for object detection have become increasingly successful, many supervised methods based on deep neural networks have been introduced to recognize graphical objects in documents. However, these models require a substantial amount of labeled data for training. To address this limitation, this paper presents an end-to-end semi-supervised framework for graphical object detection in scanned document images. The method builds on the recently proposed Soft Teacher mechanism and examines how training with only a small percentage of labeled data affects the classification and localization of graphical objects. On both the PubLayNet and the IIIT-AR-13K datasets, the proposed approach outperforms the supervised models by a significant margin at all labeling ratios (1%, 5%, and 10%). Furthermore, the 10% PubLayNet Soft Teacher model improves the average precision of Table, Figure, and List by +5.4, +1.2, and +3.2 points, respectively, while reaching a total mAP similar to that of the Faster-RCNN baseline. Moreover, the model trained on 10% of the IIIT-AR-13K labeled data beats the previous fully supervised method by +4.5 points.
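
The labeling ratios named in the abstract (1%, 5%, and 10%) imply that a training set is divided into a small labeled part and a larger part treated as unlabeled. A minimal sketch of such a split follows. It is an illustration only and not the authors' procedure: the function name `split_by_label_ratio`, the `seed` parameter, and the `page_XXXX` identifiers are all hypothetical.

```python
import random


def split_by_label_ratio(image_ids, ratio, seed=0):
    """Split image ids into (labeled, unlabeled) lists.

    `ratio` is the fraction of images that keep their annotations.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    ids = list(image_ids)
    # A seeded generator makes the split reproducible across runs.
    rng = random.Random(seed)
    rng.shuffle(ids)
    # Keep at least one labeled image for non-empty inputs.
    n_labeled = max(1, round(len(ids) * ratio))
    return ids[:n_labeled], ids[n_labeled:]


if __name__ == "__main__":
    # Placeholder identifiers, not real dataset file names.
    ids = [f"page_{i:04d}" for i in range(1000)]
    for ratio in (0.01, 0.05, 0.10):
        labeled, unlabeled = split_by_label_ratio(ids, ratio)
        print(f"{ratio:.0%}: {len(labeled)} labeled, {len(unlabeled)} unlabeled")
```

In a semi-supervised setup of this kind, the annotations of the second list would simply be ignored during training; how the paper draws its subsets is not stated in this record.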

Place, publisher, year, edition, pages
MDPI, 2022. Vol. 14, no 6, article id 176
Keywords [en]
graphical page objects, object detection, document image analysis, semi-supervised, soft teacher
National Category
Software Engineering; Computer Sciences
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-92096
DOI: 10.3390/fi14060176
ISI: 000816337400001
Scopus ID: 2-s2.0-85135256899
OAI: oai:DiVA.org:ltu-92096
DiVA, id: diva2:1681685
Note

Validated; 2022; Level 2; 2022-07-07 (sofila)

Funder: The European project INFINITY (grant no. 883293)

Available from: 2022-07-07 Created: 2022-07-07 Last updated: 2023-08-03 Bibliographically approved

By author/editor
Kallempudi, Goutham; Hashmi, Khurram Azeem; Liwicki, Marcus; Afzal, Muhammad Zeshan