Rethinking Learnable Proposals for Graphical Object Detection in Scanned Document Images
Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, 67663, Germany. ORCID iD: 0000-0002-9456-213x
Department of Computer Science, Technical University of Kaiserslautern, Kaiserslautern, 67663, Germany; Mindgarage, Technical University of Kaiserslautern, 67663 Kaiserslautern, Germany; German Research Institute for Artificial Intelligence (DFKI), Kaiserslautern, 67663, Germany.
German Research Institute for Artificial Intelligence (DFKI), Kaiserslautern, 67663, Germany.
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, Embedded Internet Systems Lab. ORCID iD: 0000-0003-4029-6574
2022 (English). In: Applied Sciences, E-ISSN 2076-3417, Vol. 12, no 20, article id 10578. Article in journal (Refereed), Published
Abstract [en]

In the age of deep learning, researchers have looked at domain adaptation under the pre-training and fine-tuning paradigm to leverage the gains made in the natural image domain. These backbones and subsequent networks are designed for object detection in the natural image domain and do not consider some of the critical characteristics of document images: document images are sparse in contextual information, and the graphical page objects are logically clustered. This paper investigates the effectiveness of deep and robust backbones in the document image domain. Further, it explores the idea of learnable object proposals through Sparse R-CNN. This paper shows that simple domain adaptation of top-performing object detectors to the document image domain does not lead to better results. Furthermore, it shows empirically that detectors based on dense object priors, such as Faster R-CNN, Mask R-CNN, and Cascade Mask R-CNN, are perhaps not best suited for graphical page object detection. Detectors that reduce the number of object candidates while making them learnable are a step towards a better approach. We formulate and evaluate the Sparse R-CNN (SR-CNN) model on the IIIT-AR-13k, PubLayNet, and DocBank datasets and hope to inspire a rethinking of object proposals in the domain of graphical page object detection.

Place, publisher, year, edition, pages
MDPI, 2022. Vol. 12, no 20, article id 10578
Keywords [en]
computer vision, deep learning, document image analysis, graphical page object detection, proposals
National Category
Computer Vision and Robotics (Autonomous Systems); Computer Sciences
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-93783; DOI: 10.3390/app122010578; ISI: 000872327300001; Scopus ID: 2-s2.0-85140430011; OAI: oai:DiVA.org:ltu-93783; DiVA, id: diva2:1708974
Projects
INFINITY
Funder
EU, Horizon 2020, 883293
Note

Validated; 2022; Level 2; 2022-11-07 (hanlid)

Available from: 2022-11-07. Created: 2022-11-07. Last updated: 2022-11-10. Bibliographically approved

Authority records

Liwicki, Marcus

By author/editor
Sinha, Sankalp; Liwicki, Marcus