A Blended Attention-CTC Network Architecture for Amharic Text-image Recognition
Bahir Dar Institute of Technology, Bahir Dar, Ethiopia.
Technical University of Kaiserslautern, Kaiserslautern, Germany.
Luleå University of Technology, Department of Computer Science, Electrical and Space Engineering, EISLAB. ORCID iD: 0000-0003-4029-6574
DFKI, Augmented Vision Department, Kaiserslautern, Germany.
2021 (English). In: Proceedings of the 10th International Conference on Pattern Recognition Applications and Methods (ICPRAM), SciTePress, 2021, pp. 435-441. Conference paper, published paper (refereed)
Abstract [en]

In this paper, we propose a blended Attention-Connectionist Temporal Classification (CTC) network architecture for text-image recognition of a unique script, Amharic. Amharic is an indigenous Ethiopic script that uses 34 consonant characters, each with 7 vowel variants, and 50 labialized characters derived, with small changes, from the 34 consonant characters. The changes involve modifying the structure of these characters by adding a straight line, or shortening and/or elongating one of the main legs, including the addition of small diacritics to the right, left, top, or bottom of the character. Such a small change affects the orthographic identity of a character and results in shape similarity among characters, which makes this an interesting but challenging task for OCR research. Motivated by the recent success of the attention mechanism on neural machine translation tasks, we propose an attention-based CTC approach designed by blending the attention mechanism directly within the CTC network. The proposed model consists of an encoder module, an attention module, and a transcription module in a unified framework. The efficacy of the proposed model on the Amharic language shows that the attention mechanism allows learning powerful representations by integrating information from different time steps. Our method outperforms state-of-the-art methods and achieves character error rates of 1.04% and 0.93% on the ADOCR test datasets.
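The core idea in the abstract — an attention module that weights encoder time steps (e.g. BLSTM outputs) into a blended context before the CTC transcription stage — can be illustrated with a minimal, dependency-free sketch of dot-product attention. All names here are hypothetical; this is not the authors' implementation, only an illustration of how information from different time steps is integrated.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(encoder_states, query):
    """Blend encoder time steps into a single context vector.

    encoder_states: list of T feature vectors (stand-ins for BLSTM outputs)
    query: vector scored against each time step (dot-product attention)
    """
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [
        sum(w * state[d] for w, state in zip(weights, encoder_states))
        for d in range(dim)
    ]
    return context, weights

# Toy example: three encoder time steps with 2-dimensional features.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
context, weights = attention_context(H, query)
```

In the architecture described above, the attention-weighted representation would feed the transcription (CTC) module; here the weights simply show how each time step contributes to the blended context.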

Place, publisher, year, edition, pages
SciTePress, 2021. pp. 435-441
Keywords [en]
Amharic Script, Blended Attention-CTC, BLSTM, CNN, Encoder-decoder, Network Architecture, OCR, Pattern Recognition
National subject category
Computer Sciences
Research subject
Machine Learning
Identifiers
URN: urn:nbn:se:ltu:diva-86383
DOI: 10.5220/0010284204350441
ISI: 000662835900050
Scopus ID: 2-s2.0-85103829482
OAI: oai:DiVA.org:ltu-86383
DiVA id: diva2:1580729
Conference
10th International Conference on Pattern Recognition Applications and Methods, ICPRAM 2021, Online Streaming, February 4-6, 2021
Note

ISBN of host publication: 978-989-758-486-2

Available from: 2021-07-15 Created: 2021-07-15 Last updated: 2022-12-19 Bibliographically approved

Open Access in DiVA

Full text not available in DiVA

Other links

Publisher's full text
Scopus

Person

Liwicki, Marcus
