EmmDocClassifier: Efficient Multimodal Document Image Classifier for Scarce Data
2022 (English). In: Applied Sciences, E-ISSN 2076-3417, Vol. 12, no 3, article id 1457. Article in journal (Refereed). Published.
Abstract [en]
Document classification is one of the most critical steps in the document analysis pipeline. There are two types of approaches for document classification: image-based and multimodal. Image-based approaches rely solely on the inherent visual cues of the document images, whereas multimodal approaches co-learn visual and textual features and have proved more effective. Nonetheless, both require a huge amount of training data. This paper presents a novel approach to document classification that works with a small amount of data and outperforms other approaches. The proposed approach combines a hierarchical attention network (HAN) for the textual stream with EfficientNet-B0 for the image stream. The HAN in the textual stream uses dynamic word embeddings obtained from a fine-tuned BERT and incorporates both word-level and sentence-level features. Whereas earlier approaches rely on training on a large corpus (RVL-CDIP), we show that our approach works with a small dataset (Tobacco-3482); to this end, we trained the neural network on Tobacco-3482 from scratch. As a result, we outperform the state of the art with an accuracy of 90.3%, corresponding to a relative error reduction of 7.9%.
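The abstract describes a two-stream architecture: an EfficientNet-B0 image stream and a hierarchical attention network over BERT word embeddings for the text stream. Below is a minimal PyTorch sketch of such a two-stream model. The hidden sizes, the concatenation-based late fusion, and the classifier head are illustrative assumptions rather than the paper's exact configuration, and the fine-tuned BERT embeddings are represented here by a precomputed tensor.

# Hedged sketch of a two-stream document classifier in PyTorch, following the
# abstract's description. Dimensions and fusion strategy are assumptions.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0


class AttentionPool(nn.Module):
    """Additive attention pooling, used at both the word and sentence levels."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.context = nn.Linear(dim, 1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim) -> attention-weighted sum over seq
        scores = self.context(torch.tanh(self.proj(x)))   # (batch, seq, 1)
        weights = torch.softmax(scores, dim=1)
        return (weights * x).sum(dim=1)                   # (batch, dim)


class HAN(nn.Module):
    """Hierarchical attention network over precomputed BERT word embeddings."""
    def __init__(self, emb_dim: int = 768, hidden: int = 128):
        super().__init__()
        self.word_gru = nn.GRU(emb_dim, hidden, bidirectional=True, batch_first=True)
        self.word_attn = AttentionPool(2 * hidden)
        self.sent_gru = nn.GRU(2 * hidden, hidden, bidirectional=True, batch_first=True)
        self.sent_attn = AttentionPool(2 * hidden)

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        # emb: (batch, n_sents, n_words, emb_dim), e.g. from a fine-tuned BERT
        b, s, w, d = emb.shape
        words, _ = self.word_gru(emb.reshape(b * s, w, d))
        sents = self.word_attn(words).view(b, s, -1)      # one vector per sentence
        sents, _ = self.sent_gru(sents)
        return self.sent_attn(sents)                      # document-level vector


class TwoStreamClassifier(nn.Module):
    def __init__(self, n_classes: int = 10, hidden: int = 128):
        super().__init__()
        image_net = efficientnet_b0(weights=None)         # pretrained weights could be loaded here
        image_net.classifier = nn.Identity()              # keep 1280-d image features
        self.image_stream = image_net
        self.text_stream = HAN(hidden=hidden)
        self.head = nn.Linear(1280 + 2 * hidden, n_classes)

    def forward(self, image: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # Late fusion by concatenating the two stream features (an assumption).
        fused = torch.cat([self.image_stream(image), self.text_stream(text_emb)], dim=1)
        return self.head(fused)


if __name__ == "__main__":
    model = TwoStreamClassifier(n_classes=10)             # Tobacco-3482 has 10 classes
    image = torch.randn(2, 3, 224, 224)
    text_emb = torch.randn(2, 8, 32, 768)                 # (batch, sents, words, BERT dim)
    print(model(image, text_emb).shape)                   # torch.Size([2, 10])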
Place, publisher, year, edition, pages
MDPI, 2022. Vol. 12, no 3, article id 1457
Keywords [en]
BERT, document image classification, EfficientNet, fine-tuned BERT, hierarchical attention networks, multimodal, RVL-CDIP, two-stream, Tobacco-3482
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-89454
DOI: 10.3390/app12031457
ISI: 000760057200001
Scopus ID: 2-s2.0-85123633245
OAI: oai:DiVA.org:ltu-89454
DiVA, id: diva2:1642613
Note
Validated; 2022; Level 2; 2022-03-07 (johcin)
Available from: 2022-03-07. Created: 2022-03-07. Last updated: 2023-09-05. Bibliographically approved.