Publications (10 of 158)
Nikolaidou, K., Retsinas, G., Sfikas, G. & Liwicki, M. (2025). DiffusionPen: Towards Controlling the Style of Handwritten Text Generation. In: Aleš Leonardis; Elisa Ricci; Stefan Roth; Olga Russakovsky; Torsten Sattler; Gül Varol (Ed.), Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LXXXV. Paper presented at 18th European Conference on Computer Vision (ECCV 2024), Milano, Italy, September 29 - October 4, 2024 (pp. 417-434). Springer Science and Business Media Deutschland GmbH, LXXXV
DiffusionPen: Towards Controlling the Style of Handwritten Text Generation
2025 (English). In: Computer Vision – ECCV 2024: 18th European Conference, Milan, Italy, September 29–October 4, 2024, Proceedings, Part LXXXV / [ed] Aleš Leonardis; Elisa Ricci; Stefan Roth; Olga Russakovsky; Torsten Sattler; Gül Varol, Springer Science and Business Media Deutschland GmbH, 2025, Vol. LXXXV, p. 417-434. Conference paper, Published paper (Refereed)
Abstract [en]

Handwritten Text Generation (HTG) conditioned on text and style is a challenging task due to the variability of inter-user characteristics and the unlimited combinations of characters that form new words unseen during training. Diffusion Models have recently shown promising results in HTG but remain under-explored. We present DiffusionPen (DiffPen), a 5-shot style handwritten text generation approach based on Latent Diffusion Models. By utilizing a hybrid style extractor that combines metric learning and classification, our approach captures both textual and stylistic characteristics of seen and unseen words and styles, generating realistic handwritten samples. Moreover, we explore several data variation strategies with multi-style mixtures and noisy embeddings, enhancing the robustness and diversity of the generated data. Extensive experiments on the IAM offline handwriting database show that our method outperforms existing methods qualitatively and quantitatively, and that the additional data it generates can improve the performance of Handwritten Text Recognition (HTR) systems.
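The multi-style mixtures and noisy embeddings mentioned above can be illustrated with a minimal sketch. It assumes that each of the five reference images has already been encoded into a fixed-length style vector by some style encoder; the helper names (`mean_style`, `mix_styles`, `add_noise`) and the use of plain averaging, convex interpolation and isotropic Gaussian noise are illustrative assumptions, not DiffusionPen's published implementation.

```python
import random
from typing import List, Sequence

Vector = List[float]


def mean_style(embeddings: Sequence[Vector]) -> Vector:
    """Average per-image style vectors (e.g. k = 5 references) into one style code."""
    if not embeddings:
        raise ValueError("need at least one reference embedding")
    k = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / k for i in range(dim)]


def mix_styles(style_a: Vector, style_b: Vector, alpha: float) -> Vector:
    """Convex combination of two writers' style codes (alpha = 1 gives pure style_a)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(style_a, style_b)]


def add_noise(style: Vector, sigma: float, rng: random.Random) -> Vector:
    """Perturb a style code with isotropic Gaussian noise of standard deviation sigma."""
    return [s + rng.gauss(0.0, sigma) for s in style]


if __name__ == "__main__":
    rng = random.Random(0)
    writer_a = mean_style([[1.0, 0.0]] * 5)  # hypothetical 2-D style vectors
    writer_b = mean_style([[0.0, 1.0]] * 5)
    mixed = mix_styles(writer_a, writer_b, alpha=0.5)
    print(mixed)                       # [0.5, 0.5]
    print(add_noise(mixed, 0.1, rng))  # a jittered variant of the mixed style
```

In a conditional diffusion model, a vector like `mixed` would stand in for a single writer's style code as the conditioning signal: interpolating between writers gives styles not seen in training, and small noise gives varied samples from one style.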

Place, publisher, year, edition, pages
Springer Science and Business Media Deutschland GmbH, 2025
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349 ; 15143
Keywords
Handwriting Generation, Latent Diffusion Models, Few-shot Style Representation
National Category
Computer Sciences; Computer Vision and Robotics (Autonomous Systems)
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-111074 (URN); 10.1007/978-3-031-73013-9_24 (DOI); 2-s2.0-85211230972 (Scopus ID)
Conference
18th European Conference on Computer Vision (ECCV 2024), Milano, Italy, September 29 - October 4, 2024
Note

ISBN for host publication: 978-3-031-73012-2, 978-3-031-73013-9

Available from: 2024-12-17 Created: 2024-12-17 Last updated: 2024-12-17. Bibliographically approved
Chippa, M. S., Chhipa, P. C., De, K., Liwicki, M. & Saini, R. (2025). LCM: Log Conformal Maps for Robust Representation Learning to Mitigate Perspective Distortion. In: Minsu Cho; Ivan Laptev; Du Tran; Angela Yao; Hongbin Zha (Ed.), Computer Vision – ACCV 2024: 17th Asian Conference on Computer Vision, Hanoi, Vietnam, December 8–12, 2024 Proceedings, Part VIII. Paper presented at 17th Asian Conference on Computer Vision (ACCV 2024), Hanoi, Vietnam, December 8-12, 2024 (pp. 175-191). Springer Nature
LCM: Log Conformal Maps for Robust Representation Learning to Mitigate Perspective Distortion
2025 (English). In: Computer Vision – ACCV 2024: 17th Asian Conference on Computer Vision, Hanoi, Vietnam, December 8–12, 2024 Proceedings, Part VIII / [ed] Minsu Cho; Ivan Laptev; Du Tran; Angela Yao; Hongbin Zha, Springer Nature, 2025, p. 175-191. Conference paper, Published paper (Refereed)
Abstract [en]

Perspective distortion (PD) leads to substantial alterations in the shape, size, orientation, angles, and spatial relationships of visual elements in images. Accurately determining camera intrinsic and extrinsic parameters is challenging, making it hard to synthesize perspective distortion effectively. Current distortion correction methods remove the distortion and then learn the vision task, making the pipeline a multi-step process that often compromises performance. Recent work leverages the Möbius transform for mitigating perspective distortions (MPD) to synthesize perspective distortions without estimating camera parameters. However, the Möbius transform requires tuning multiple interdependent parameters and involves complex arithmetic operations, leading to substantial computational complexity. To address these challenges, we propose Log Conformal Maps (LCM), a method that leverages the logarithmic function to approximate perspective distortions with fewer parameters and reduced computational complexity. We provide a detailed foundation, complemented with experiments, demonstrating that LCM approximates MPD with fewer parameters. We show that LCM integrates well with supervised and self-supervised representation learning, outperforms standard models, and matches state-of-the-art performance in mitigating perspective distortion on multiple benchmarks, namely ImageNet-PD, ImageNet-E, and ImageNet-X. Furthermore, LCM integrates seamlessly with person re-identification and improves its performance. Source code is publicly available at https://github.com/meenakshi23/Log-Conformal-Maps.
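To make the idea of a logarithmic conformal map concrete, the sketch below applies a generic complex map w = s·log(z) + t (principal branch) to 2-D coordinates. The parameterization, the names `log_conformal_map` and `warp_grid`, and the coordinate window are assumptions for illustration only; the paper's exact LCM formulation and parameter ranges are not reproduced here. The map is conformal (angle-preserving) wherever z ≠ 0 and has two complex parameters, whereas a general Möbius transform (az + b)/(cz + d) has four complex coefficients (three degrees of freedom after normalization).

```python
import cmath
from typing import List, Tuple

Point = Tuple[float, float]


def log_conformal_map(x: float, y: float, s: complex = 1.0, t: complex = 0.0) -> Point:
    """Map z = x + iy to w = s * log(z) + t using the principal branch of log.

    The map is conformal wherever z != 0 and is undefined at the origin.
    """
    z = complex(x, y)
    if z == 0:
        raise ValueError("log(z) is undefined at z = 0")
    w = s * cmath.log(z) + t
    return w.real, w.imag


def warp_grid(points: List[Point], s: complex = 1.0, t: complex = 0.0) -> List[Point]:
    """Apply the map to a list of coordinates, e.g. a sampling grid kept away from z = 0."""
    return [log_conformal_map(x, y, s, t) for x, y in points]


if __name__ == "__main__":
    # A small grid in the right half-plane (x >= 1), so the origin is never hit.
    grid = [(1.0 + 0.5 * i, -0.5 + 0.5 * j) for i in range(3) for j in range(3)]
    for (x, y), (u, v) in zip(grid, warp_grid(grid, s=0.8, t=0.1)):
        print(f"({x:.2f}, {y:.2f}) -> ({u:.3f}, {v:.3f})")
```

An actual image warp would more likely evaluate the inverse map, z = exp((w − t)/s), at every output pixel and interpolate the source image; that step is left out to keep the sketch short.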

Place, publisher, year, edition, pages
Springer Nature, 2025
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349 ; 15479
Keywords
Perspective Distortion, Robust Representation Learning, Self-supervised Learning
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-111235 (URN); 10.1007/978-981-96-0966-6_11 (DOI); 2-s2.0-85212922792 (Scopus ID)
Conference
17th Asian Conference on Computer Vision (ACCV 2024), Hanoi, Vietnam, December 8-12, 2024
Funder
Knut and Alice Wallenberg Foundation
Note

ISBN for host publication: 978-981-96-0965-9;

Available from: 2025-01-08 Created: 2025-01-08 Last updated: 2025-01-08. Bibliographically approved
Chhipa, P. C., Chippa, M. S., De, K., Saini, R., Liwicki, M. & Shah, M. (2025). Möbius Transform for Mitigating Perspective Distortions in Representation Learning. In: Aleš Leonardis; Elisa Ricci; Stefan Roth; Olga Russakovsky; Torsten Sattler; Gül Varol (Ed.), Computer Vision – ECCV 2024: 18th European Conference Milan, Italy, September 29–October 4, 2024 Proceedings, Part LXXIII. Paper presented at 18th European Conference on Computer Vision (ECCV 2024), Milano, Italy, September 29 - October 4, 2024 (pp. 345-363). Springer Science and Business Media Deutschland GmbH
Möbius Transform for Mitigating Perspective Distortions in Representation Learning
2025 (English). In: Computer Vision – ECCV 2024: 18th European Conference Milan, Italy, September 29–October 4, 2024 Proceedings, Part LXXIII / [ed] Aleš Leonardis; Elisa Ricci; Stefan Roth; Olga Russakovsky; Torsten Sattler; Gül Varol, Springer Science and Business Media Deutschland GmbH, 2025, p. 345-363. Conference paper, Published paper (Refereed)
Abstract [en]

Perspective distortion (PD) causes unprecedented changes in shape, size, orientation, angles, and other spatial relationships of visual concepts in images. Precisely estimating camera intrinsic and extrinsic parameters is a challenging task that prevents synthesizing perspective distortion. The non-availability of dedicated training data poses a critical barrier to developing robust computer vision methods. Additionally, distortion correction methods turn other computer vision tasks into multi-step approaches and fall short in performance. In this work, we propose mitigating perspective distortion (MPD) by employing fine-grained parameter control on a specific family of Möbius transforms to model real-world distortion without estimating camera intrinsic and extrinsic parameters and without the need for actual distorted data. We also present a dedicated perspectively distorted benchmark dataset, ImageNet-PD, to evaluate the robustness of deep learning models against perspective distortion. The proposed method improves performance on the existing benchmarks ImageNet-E and ImageNet-X. Additionally, it significantly improves performance on ImageNet-PD while performing consistently on the standard data distribution. Notably, our method shows improved performance on three PD-affected real-world applications (crowd counting, fisheye image recognition, and person re-identification) and on one challenging PD-affected computer vision task, object detection. The source code, dataset, and models are available on the project webpage at https://prakashchhipa.github.io/projects/mpd.
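A Möbius transform is the complex map f(z) = (az + b)/(cz + d) with ad − bc ≠ 0. The sketch below applies such a map to pixel coordinates normalized to [−1, 1] and converts the result back to pixel space. Which specific family of Möbius transforms MPD uses, and how its parameters are controlled, is not stated in this record; the helper names (`mobius`, `warp_pixel`) and the normalization convention are assumptions.

```python
from typing import Tuple


def mobius(z: complex, a: complex, b: complex, c: complex, d: complex) -> complex:
    """Evaluate f(z) = (a z + b) / (c z + d); the transform requires a d - b c != 0."""
    if a * d - b * c == 0:
        raise ValueError("degenerate transform: a*d - b*c must be non-zero")
    denom = c * z + d
    if denom == 0:
        raise ZeroDivisionError("z is the pole -d/c of this transform")
    return (a * z + b) / denom


def warp_pixel(col: float, row: float, width: int, height: int,
               a: complex, b: complex, c: complex, d: complex) -> Tuple[float, float]:
    """Map a pixel through the transform in normalized [-1, 1] coordinates."""
    # Normalize pixel indices so the image spans [-1, 1] on both axes.
    x = 2.0 * col / (width - 1) - 1.0
    y = 2.0 * row / (height - 1) - 1.0
    w = mobius(complex(x, y), a, b, c, d)
    # Undo the normalization; the result may fall outside the image bounds.
    return (w.real + 1.0) * (width - 1) / 2.0, (w.imag + 1.0) * (height - 1) / 2.0


if __name__ == "__main__":
    identity = (1, 0, 0, 1)
    print(warp_pixel(10, 20, 101, 51, *identity))  # approximately (10.0, 20.0)
    mild = (1, 0, 0.1 + 0.05j, 1)                  # hypothetical small non-linear bend
    print(warp_pixel(10, 20, 101, 51, *mild))
```

With c = 0 the map reduces to an affine (similarity) transform; a non-zero c introduces the non-linear bending that makes Möbius maps a candidate for synthesizing distortion without camera parameters.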

Place, publisher, year, edition, pages
Springer Science and Business Media Deutschland GmbH, 2025
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349 ; 15131
Keywords
Perspective Distortion, Self-supervised Learning, Robust Representation Learning
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-111233 (URN); 10.1007/978-3-031-73464-9_21 (DOI); 2-s2.0-85212279211 (Scopus ID)
Conference
18th European Conference on Computer Vision (ECCV 2024), Milano, Italy, September 29 - October 4, 2024
Funder
Knut and Alice Wallenberg Foundation
Note

ISBN for host publication: 978-3-031-73463-2, 978-3-031-73464-9

Available from: 2025-01-08 Created: 2025-01-08 Last updated: 2025-01-08. Bibliographically approved
Belay, B. H., Guyon, I., Mengiste, T., Tilahun, B., Liwicki, M., Tegegne, T. & Egele, R. (2024). A Historical Handwritten Dataset for Ethiopic OCR with Baseline Models and Human-Level Performance. In: Elisa H. Barney Smith; Marcus Liwicki; Liangrui Peng (Ed.), Document Analysis and Recognition, ICDAR 2024: 18th International Conference, Athens, Greece, August 30 – September 4, 2024, Proceedings, Part III. Paper presented at 18th International Conference on Document Analysis and Recognition (ICDAR 2024), Athens, Greece, August 30–September 4, 2024 (pp. 23-38). Springer Science and Business Media Deutschland GmbH, 3
A Historical Handwritten Dataset for Ethiopic OCR with Baseline Models and Human-Level Performance
2024 (English). In: Document Analysis and Recognition, ICDAR 2024: 18th International Conference, Athens, Greece, August 30 – September 4, 2024, Proceedings, Part III / [ed] Elisa H. Barney Smith; Marcus Liwicki; Liangrui Peng, Springer Science and Business Media Deutschland GmbH, 2024, Vol. 3, p. 23-38. Conference paper, Published paper (Refereed)
Abstract [en]

This paper introduces a new OCR dataset for historical handwritten Ethiopic script, which is characterized by a unique syllabic writing system, low-resource availability, and complex orthographic diacritics. The dataset consists of roughly 80,000 annotated text-line images from 1,700 pages of 18th- to 20th-century documents, and includes a training set with text-line images from the 19th to 20th century and two test sets. One test set is distributed similarly to the training set and contains nearly 6,000 text-line images; the other contains only images from 18th-century manuscripts, around 16,000 images in total. The former test set allows us to check baseline performance in the classical IID (Independently and Identically Distributed) setting, while the latter addresses a more realistic setting in which the test set is drawn from a different distribution than the training set (Out-Of-Distribution, or OOD). Multiple annotators labeled all text-line images in the HHD-Ethiopic dataset, and an expert supervisor double-checked them. We assessed human-level recognition performance and compared it with state-of-the-art (SOTA) OCR models using the Character Error Rate (CER) and Normalized Edit Distance (NED) metrics. Our results show that the model performed comparably to human-level recognition on the 18th-century test set and outperformed humans on the IID test set. However, the unique challenges posed by the Ethiopic script, such as detecting complex diacritics, still present difficulties for the models. We expect our baseline evaluation and dataset to encourage further research on Ethiopic script recognition. The dataset and source code can be accessed at https://github.com/bdu-birhanu/HHD-Ethiopic.
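The two metrics can be sketched with a plain Levenshtein distance. CER is computed here as total edits over total reference characters, and NED as the edit distance divided by the length of the longer string; both are common definitions, but the exact normalization used in the HHD-Ethiopic evaluation, and whether strings are compared as Unicode code points (as below) or as grapheme clusters, are assumptions.

```python
from typing import Sequence


def levenshtein(ref: str, hyp: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r_char in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h_char in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,                       # delete r_char
                curr[j - 1] + 1,                   # insert h_char
                prev[j - 1] + (r_char != h_char),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


def cer(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level CER: total edit operations divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must be aligned line by line")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(levenshtein(r, h) for r, h in zip(references, hypotheses))
    return total_edits / total_chars


def ned(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the longer string, so the result lies in [0, 1]."""
    longest = max(len(reference), len(hypothesis))
    return 0.0 if longest == 0 else levenshtein(reference, hypothesis) / longest


if __name__ == "__main__":
    refs = ["kitten", "abc"]
    hyps = ["sitting", "abd"]
    print(cer(refs, hyps))  # (3 + 1) / 9 = 0.444...
    print(sum(ned(r, h) for r, h in zip(refs, hyps)) / len(refs))  # mean line-level NED
```

Unlike CER, this NED cannot exceed 1, which keeps scores comparable when a model inserts many spurious characters.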

Place, publisher, year, edition, pages
Springer Science and Business Media Deutschland GmbH, 2024
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 14806
Keywords
Historical Ethiopic script, Human-level recognition performance, HHD-Ethiopic, Normalized edit distance, Text recognition
National Category
Computer Sciences; Computer Vision and Robotics (Autonomous Systems)
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-110171 (URN); 10.1007/978-3-031-70543-4_2 (DOI); 001336394400002 (ISI); 2-s2.0-85204650159 (Scopus ID)
Conference
18th International Conference on Document Analysis and Recognition (ICDAR 2024), Athens, Greece, August 30–September 4, 2024
Funder
EU, Horizon 2020, 952215
Note

Funder: ANR Chair of Artificial Intelligence HUMANIA (ANR-19-CHIA-0022); ChaLearn; ICT4D Research Center of Bahir Dar Institute of Technology;

ISBN for host publication: 978-3-031-70542-7, 978-3-031-70543-4

Available from: 2024-10-02 Created: 2024-10-02 Last updated: 2024-12-12. Bibliographically approved
Nilsson, J., Javed, S., Albertsson, K., Delsing, J., Liwicki, M. & Sandin, F. (2024). AI Concepts for System of Systems Dynamic Interoperability. Sensors, 24(9), Article ID 2921.
AI Concepts for System of Systems Dynamic Interoperability
2024 (English). In: Sensors, E-ISSN 1424-8220, Vol. 24, no 9, article id 2921. Article in journal (Refereed), Published
Abstract [en]

Interoperability is a central problem in digitization and system-of-systems (SoS) engineering; it concerns the capacity of systems to exchange information and cooperate. Dynamically establishing interoperability between heterogeneous cyber-physical systems (CPS) at run-time is a challenging problem. Different aspects of the interoperability problem have been studied in fields such as SoS, neural translation, and agent-based systems, but there are no unifying solutions beyond domain-specific standardization efforts. The problem is complicated by the uncertain and variable relations between physical processes and human-centric symbols, which result from, e.g., latent physical degrees of freedom, maintenance, re-configurations, and software updates. Therefore, we surveyed the literature for concepts and methods needed to automatically establish SoS with purposeful CPS communication, focusing on machine learning and on connecting approaches that are not integrated in the present literature. Here, we summarize recent developments relevant to the dynamic interoperability problem, such as representation learning for ontology alignment and inference on heterogeneous linked data; neural networks for transcoding of text and code; concept learning-based reasoning; and emergent communication. We find that there has been recent interest in deep learning approaches to establishing communication under different assumptions about the environment, language, and nature of the communicating entities. Furthermore, we present examples of architectures and discuss open problems associated with AI-enabled solutions in relation to SoS interoperability requirements. Although these developments open new avenues for research, there are still no examples that bridge the concepts necessary to establish dynamic interoperability in complex SoS, and realistic testbeds are needed.
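As a small illustration of representation learning for ontology alignment, the sketch below matches concepts from two systems by the cosine similarity of their embeddings. It assumes the embeddings were already produced by some learned encoder; the concept names, vectors, threshold, and the greedy nearest-neighbour strategy are hypothetical simplifications, not a method taken from the surveyed literature.

```python
import math
from typing import Dict, List

Embedding = List[float]


def cosine(u: Embedding, v: Embedding) -> float:
    """Cosine similarity between two equal-length vectors."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0.0 or nv == 0.0:
        raise ValueError("a zero vector has no direction")
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def align(source: Dict[str, Embedding], target: Dict[str, Embedding],
          threshold: float = 0.8) -> Dict[str, str]:
    """Greedily match each source concept to its most similar target concept.

    Matches scoring below `threshold` are dropped, leaving that concept unaligned.
    """
    matches: Dict[str, str] = {}
    if not target:
        return matches
    for s_name, s_vec in source.items():
        best_name, best_score = max(
            ((t_name, cosine(s_vec, t_vec)) for t_name, t_vec in target.items()),
            key=lambda pair: pair[1],
        )
        if best_score >= threshold:
            matches[s_name] = best_name
    return matches


if __name__ == "__main__":
    # Hypothetical embeddings of data-point names from two heterogeneous systems.
    plant_a = {"temp_C": [0.9, 0.1, 0.0], "flow_rate": [0.0, 0.2, 0.9]}
    plant_b = {"temperature": [0.95, 0.05, 0.05], "volumetric_flow": [0.05, 0.1, 0.95]}
    print(align(plant_a, plant_b))  # {'temp_C': 'temperature', 'flow_rate': 'volumetric_flow'}
```

The threshold is what lets a system refuse to align a concept rather than force a wrong mapping, which matters when relations between physical quantities and symbols are uncertain.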

Place, publisher, year, edition, pages
MDPI, 2024
Keywords
system of systems, dynamic interoperability, AI for cyber-physical systems, representation learning
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Cyber-Physical Systems; Machine Learning
Identifiers
urn:nbn:se:ltu:diva-87246 (URN); 10.3390/s24092921 (DOI); 001219942200001 (ISI); 38733028 (PubMedID); 2-s2.0-85192703355 (Scopus ID)
Note

Validated; 2024; Level 2; 2024-05-03 (joosat);

Funder: European Commission and Arrowhead Tools project (ECSEL JU grant agreement No. 826452);

Full text: CC BY License

Available from: 2021-09-28 Created: 2021-09-28 Last updated: 2024-11-20. Bibliographically approved
Saini, R., Liwicki, M. & Jara-Valera, A. J. (2024). Data Analytics and Artificial Intelligence. In: Sébastien Ziegler, Renáta Radócz, Adrian Quesada Rodriguez, Sara Nieves Matheu Garcia (Ed.), Springer Handbooks (pp. 427-442). Springer Science and Business Media Deutschland GmbH, Part F3575
Data Analytics and Artificial Intelligence
2024 (English). In: Springer Handbooks / [ed] Sébastien Ziegler, Renáta Radócz, Adrian Quesada Rodriguez, Sara Nieves Matheu Garcia, Springer Science and Business Media Deutschland GmbH, 2024, Vol. Part F3575, p. 427-442. Chapter in book (Other academic)
Place, publisher, year, edition, pages
Springer Science and Business Media Deutschland GmbH, 2024
National Category
Computer Sciences; Robotics
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-111223 (URN); 10.1007/978-3-031-39650-2_18 (DOI); 2-s2.0-85212114810 (Scopus ID)
Available from: 2025-01-07 Created: 2025-01-07 Last updated: 2025-01-07
Pihlgren, G. G., Sandin, F. & Liwicki, M. (2024). Deep Perceptual Similarity is Adaptable to Ambiguous Contexts. In: Tetiana Lutchyn; Adin Ramirez Rivera; Benjamin Ricaud (Ed.), Proceedings of Machine Learning Research, PMLR: Volume 233: Northern Lights Deep Learning Conference, 9-11 January 2024, UiT The Arctic University, Tromsø, Norway. Paper presented at 5th Northern Lights Deep Learning Conference (NLDL 2024), Tromsø, Norway, January 9-11, 2024 (pp. 212-219). Proceedings of Machine Learning Research
Deep Perceptual Similarity is Adaptable to Ambiguous Contexts
2024 (English). In: Proceedings of Machine Learning Research, PMLR: Volume 233: Northern Lights Deep Learning Conference, 9-11 January 2024, UiT The Arctic University, Tromsø, Norway / [ed] Tetiana Lutchyn; Adin Ramirez Rivera; Benjamin Ricaud, Proceedings of Machine Learning Research, 2024, p. 212-219. Conference paper, Published paper (Refereed)
Abstract [en]

This work examines the adaptability of Deep Perceptual Similarity (DPS) metrics to contexts beyond those that align with average human perception and contexts in which the standard metrics have been shown to perform well. Prior works have shown that DPS metrics are good at estimating human perception of similarity, so-called perceptual similarity. However, it remains unknown whether such metrics can be adapted to other contexts. In this work, DPS metrics are evaluated for their adaptability to different contradictory similarity contexts. Such contexts are created by randomly ranking six image distortions. Metrics are adapted to consider distortions more or less disruptive to similarity depending on their place in the random rankings. This is done by training pretrained CNNs to measure similarity according to the given contexts. The adapted metrics are also evaluated on a perceptual similarity dataset to determine whether adapting to a ranking affects their prior performance. The findings show that DPS metrics can be adapted with high performance. While the adapted metrics have difficulties with the same contexts as the baselines, performance is improved in 99% of cases. Finally, it is shown that the adaptation is not significantly detrimental to prior performance on perceptual similarity. The implementation of this work is available online.
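The construction of contradictory similarity contexts can be sketched as follows: a random permutation of the six distortions defines one context, and each pair of differently distorted images then receives a two-alternative label that an adapted metric should reproduce. The distortion names, the two-alternative framing, and the helper functions are assumptions for illustration; the paper's training objective is not described in this abstract.

```python
import random
from typing import Dict, List

Context = Dict[str, int]


def make_context(distortions: List[str], rng: random.Random) -> Context:
    """Randomly rank distortions; rank 0 is treated as least disruptive to similarity."""
    order = list(distortions)
    rng.shuffle(order)
    return {name: rank for rank, name in enumerate(order)}


def context_label(context: Context, dist_x: str, dist_y: str) -> int:
    """Two-alternative label: 0 if the image with dist_x should count as more similar
    to the reference than the image with dist_y, else 1."""
    if dist_x == dist_y:
        raise ValueError("the two alternatives must use different distortions")
    return 0 if context[dist_x] < context[dist_y] else 1


def metric_agrees(context: Context, dist_x: str, dist_y: str,
                  d_x: float, d_y: float) -> bool:
    """True if a metric's distances (smaller = more similar) match the context's label."""
    predicted = 0 if d_x < d_y else 1
    return predicted == context_label(context, dist_x, dist_y)


if __name__ == "__main__":
    # Hypothetical names for six distortions; the paper's actual set may differ.
    distortions = ["blur", "noise", "jpeg", "contrast", "rotation", "crop"]
    ctx = make_context(distortions, random.Random(42))
    print(ctx)
    print(context_label(ctx, "blur", "noise"))
```

Because each context is an arbitrary permutation, two contexts can demand opposite answers for the same image pair, which is what makes them contradictory from the perspective of a single fixed metric.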

Place, publisher, year, edition, pages
Proceedings of Machine Learning Research, 2024
Series
Proceedings of Machine Learning Research, E-ISSN 2640-3498 ; 233
National Category
Computer and Information Sciences; Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-105093 (URN); 2-s2.0-85189301791 (Scopus ID)
Conference
5th Northern Lights Deep Learning Conference (NLDL 2024), Tromsø, Norway, January 9-11, 2024
Note

Full text license: CC BY 4.0; 

Available from: 2024-04-15 Created: 2024-04-15 Last updated: 2024-04-15. Bibliographically approved
Adewumi, O., Gerdes, M., Chaltikyan, G., Fernandes, F., Lindsköld, L., Liwicki, M. & Catta-Preta, M. (2024). DigiHealth-AI: Outcomes of the First Blended Intensive Programme (BIP) on AI for Health – a Cross-Disciplinary Multi-Institutional Short Teaching Course. In: JAIR - Journal of Applied Interdisciplinary Research Special Issue (2024): Proceedings of the DigiHealthDay 2023. Paper presented at DigiHealthDay-2023, International Scientific Symposium, Pfarrkirchen, Germany, Nov 10, 2023 (pp. 75-85). Deggendorf Institute of Technology
DigiHealth-AI: Outcomes of the First Blended Intensive Programme (BIP) on AI for Health – a Cross-Disciplinary Multi-Institutional Short Teaching Course
2024 (English). In: JAIR - Journal of Applied Interdisciplinary Research Special Issue (2024): Proceedings of the DigiHealthDay 2023, Deggendorf Institute of Technology, 2024, p. 75-85. Conference paper, Published paper (Refereed)
Abstract [en]

We reflect on our experiences in organizing and implementing a high-quality Blended Intensive Programme (BIP) as a joint international event. A BIP is a short programme that combines physical mobility with a virtual part. The 6-day event, titled “DigiHealth-AI: Practice, Research, Ethics, and Regulation”, was organized in collaboration with partners from five European nations, with support from the EU’s ERASMUS+ programme, in November 2023. We introduced a new learning method called ProCoT, involving large language models (LLMs), to prevent students from cheating in written work. We designed an online survey of key questions, which was conducted at the beginning and the end of the BIP. The highlights of the survey are as follows: by the end of the BIP, 84% of the respondents agreed that the intended learning outcomes (ILOs) were fulfilled, 100% strongly agreed that artificial intelligence (AI) benefits the healthcare sector, 62% disagreed that they were concerned about AI potentially eliminating jobs in the healthcare sector (compared to 57% initially), 60% were concerned about their privacy when using AI, and 56% could identify at least two known sources of bias in AI systems (compared to only 43% prior to the BIP). A total of 541 votes were cast by the 40 student respondents. The minimum and maximum numbers of students who answered any particular survey question at a given time were 25 and 40, respectively.

Place, publisher, year, edition, pages
Deggendorf Institute of Technology, 2024
Keywords
Machine learning, healthcare, pedagogy
National Category
Educational Sciences; Health Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-110792 (URN); 10.25929/dcmwch54 (DOI)
Conference
DigiHealthDay-2023, International Scientific Symposium, Pfarrkirchen, Germany, Nov 10, 2023
Note

Full text license: CC BY-SA 4.0;

Funder: Knut and Alice Wallenberg Foundations; LTU counterpart fund;

Available from: 2024-11-25 Created: 2024-11-25 Last updated: 2024-11-25. Bibliographically approved
Barney Smith, E. H., Liwicki, M. & Peng, L. (Eds.). (2024). Document Analysis and Recognition - ICDAR 2024: 18th International Conference, Athens, Greece, August 30 – September 4, 2024 Proceedings, Part I. Paper presented at 18th International Conference on Document Analysis and Recognition (ICDAR 2024), Athens, Greece, August 30–September 4, 2024. Springer Nature
Document Analysis and Recognition - ICDAR 2024: 18th International Conference, Athens, Greece, August 30 – September 4, 2024 Proceedings, Part I
2024 (English). Conference proceedings (editor) (Refereed)
Abstract [en]

This six-volume set LNCS 14804-14809 constitutes the proceedings of the 18th International Conference on Document Analysis and Recognition, ICDAR 2024, held in Athens, Greece, during August 30–September 4, 2024. A total of 144 full papers, presented in these proceedings, were carefully selected from 263 submissions. The papers reflect topics such as: document image processing; physical and logical layout analysis; text and symbol recognition; handwriting recognition; document analysis systems; document classification; indexing and retrieval of documents; document synthesis; extracting document semantics; NLP for document understanding; office automation; graphics recognition; human document interaction; document representation modeling and much more.

Place, publisher, year, edition, pages
Springer Nature, 2024. p. 490
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 14804
Keywords
Document Analysis Systems, Handwriting Recognition, Scene Text Detection and Recognition, Document Image Processing, Historical Document Analysis, NLP for Document Understanding, Graphics, Diagram, and Math Recognition, Multimedia Document Analysis
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-110207 (URN); 10.1007/978-3-031-70533-5 (DOI); 978-3-031-70532-8 (ISBN); 978-3-031-70533-5 (ISBN)
Conference
18th International Conference on Document Analysis and Recognition (ICDAR 2024), Athens, Greece, August 30–September 4, 2024
Available from: 2024-10-02 Created: 2024-10-02 Last updated: 2024-10-15. Bibliographically approved
Barney Smith, E. H., Liwicki, M. & Peng, L. (Eds.). (2024). Document Analysis and Recognition - ICDAR 2024: 18th International Conference, Athens, Greece, August 30 – September 4, 2024 Proceedings, Part II. Paper presented at 18th International Conference on Document Analysis and Recognition (ICDAR 2024), Athens, Greece, August 30–September 4, 2024. Springer Nature
Document Analysis and Recognition - ICDAR 2024: 18th International Conference, Athens, Greece, August 30 – September 4, 2024 Proceedings, Part II
2024 (English). Conference proceedings (editor) (Refereed)
Abstract [en]

This six-volume set LNCS 14804-14809 constitutes the proceedings of the 18th International Conference on Document Analysis and Recognition, ICDAR 2024, held in Athens, Greece, during August 30–September 4, 2024. A total of 144 full papers, presented in these proceedings, were carefully selected from 263 submissions. The papers reflect topics such as: document image processing; physical and logical layout analysis; text and symbol recognition; handwriting recognition; document analysis systems; document classification; indexing and retrieval of documents; document synthesis; extracting document semantics; NLP for document understanding; office automation; graphics recognition; human document interaction; document representation modeling and much more.

Place, publisher, year, edition, pages
Springer Nature, 2024. p. 446
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 14805
Keywords
Document Analysis Systems, Handwriting Recognition, Scene Text Detection and Recognition, Document Image Processing, Historical Document Analysis, NLP for Document Understanding, Graphics, Diagram, and Math Recognition, Multimedia Document Analysis
National Category
Computer Sciences
Research subject
Machine Learning
Identifiers
urn:nbn:se:ltu:diva-110210 (URN); 10.1007/978-3-031-70536-6 (DOI); 978-3-031-70535-9 (ISBN); 978-3-031-70536-6 (ISBN)
Conference
18th International Conference on Document Analysis and Recognition (ICDAR 2024), Athens, Greece, August 30–September 4, 2024
Available from: 2024-10-02 Created: 2024-10-02 Last updated: 2024-10-15. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-4029-6574