Investigating Attention Mechanism for Page Object Detection in Document Images
2022 (English)
In: Applied Sciences, E-ISSN 2076-3417, Vol. 12, no 15, article id 7486
Article in journal (Refereed) Published
Abstract [en]
Page object detection in scanned document images is a complex task due to varying document layouts and diverse page objects. In the past, traditional methods such as Optical Character Recognition (OCR)-based techniques have been employed to extract textual information. However, these methods fail to comprehend complex page objects such as tables and figures. This paper addresses the localization and classification of graphical objects that visually summarize vital information in documents. Furthermore, this work examines the benefit of incorporating attention mechanisms in different object detection networks to perform page object detection on scanned document images. The model is designed with a PyTorch-based framework called Detectron2. The proposed pipelines can be optimized end-to-end and are exhaustively evaluated on publicly available datasets such as DocBank, PubLayNet, and IIIT-AR-13K. The achieved results reflect the effectiveness of incorporating the attention mechanism for page object detection in documents.
Place, publisher, year, edition, pages
MDPI, 2022. Vol. 12, no 15, article id 7486
Keywords [en]
attention mechanism, page object detection, transfer learning, document image analysis
National Category
Software Engineering
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-92804
DOI: 10.3390/app12157486
ISI: 000839264500001
Scopus ID: 2-s2.0-85136993776
OAI: oai:DiVA.org:ltu-92804
DiVA, id: diva2:1693014
Funder
EU, Horizon 2020, 883293 INFINITY
Note
Validated; 2022; Level 2; 2022-09-05 (hanlid)
2022-09-05 2022-09-05 2022-09-12 Bibliographically approved