Recognizable Units in Pashto Language for OCR
2015 (English). In: 13th International Conference on Document Analysis and Recognition, IEEE, 2015, p. 1246-1250. Conference paper, Published paper (Refereed)
Abstract [en]

Atomic segmentation of cursive scripts into constituent characters is one of the most challenging problems in pattern recognition. To avoid segmentation in cursive script, concrete shapes are considered as recognizable units. Therefore, the objective of this work is to find alternate recognizable units in Pashto cursive script. These alternatives are ligatures and primary ligatures. However, sound statistical analysis is needed to find the appropriate numbers of ligatures and primary ligatures in Pashto script. In this work, a corpus of 2,313,736 Pashto words is extracted from large-scale, diversified web sources, and a total of 19,268 unique ligatures have been identified in Pashto cursive script. Analysis shows that only 7,000 ligatures represent 91% of the overall corpus of unique Pashto words. Similarly, about 7,681 primary ligatures are also identified, which represent the basic shapes of all the ligatures.
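
The coverage figure in the abstract (a subset of ligatures accounting for most of the corpus) can be illustrated with a small sketch. This is not the authors' method: it assumes the corpus has already been split into ligature tokens, and it assumes coverage is measured by occurrence frequency, which the abstract does not spell out. The function names and the toy tokens are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def coverage_curve(ligatures: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (ligature, cumulative share of occurrences), most frequent first."""
    counts = Counter(ligatures)
    total = sum(counts.values())
    curve = []
    running = 0
    for ligature, count in counts.most_common():
        running += count
        curve.append((ligature, running / total))
    return curve


def ligatures_needed(ligatures: Iterable[str], target: float) -> int:
    """Smallest number of top-ranked ligatures whose cumulative share reaches target."""
    for rank, (_, share) in enumerate(coverage_curve(ligatures), start=1):
        if share >= target:
            return rank
    return 0  # empty input: nothing to cover


# Toy placeholder tokens standing in for ligatures (assumption, not Pashto data).
toy_tokens = ["a"] * 6 + ["b"] * 3 + ["c"]
print(ligatures_needed(toy_tokens, 0.85))
```

On a real corpus, the same function would be called with the full list of extracted ligature occurrences and a target such as 0.91.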

Place, publisher, year, edition, pages
IEEE, 2015. p. 1246-1250
Keywords [en]
cur-, ligatures, ocr, pashto, primary ligatures
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:ltu:diva-72207
OAI: oai:DiVA.org:ltu-72207
DiVA, id: diva2:1271576
Conference
13th International Conference on Document Analysis and Recognition
Available from: 2018-12-17 Created: 2018-12-17 Last updated: 2019-09-06

Open Access in DiVA

fulltext (783 kB), 5 downloads
File information
File name: FULLTEXT01.pdf
File size: 783 kB
Checksum (SHA-512): 6379e3173466f299139d06a22a24402b91128363f72b6725aec5541536ff4cbfafc3ec7fa160b23c082741a51661ac9cdd3a7d73e46572429848b732cf6d9923
Type: fulltext
Mimetype: application/pdf
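
The SHA-512 checksum listed above can be used to confirm that a downloaded copy of the full text is intact. The following is a minimal sketch using Python's standard `hashlib`. The helper names are hypothetical, and the local path is assumed to point to wherever the file named in the record was saved.

```python
import hashlib

# Checksum copied from the file information in this record.
EXPECTED_SHA512 = (
    "6379e3173466f299139d06a22a24402b91128363f72b6725aec5541536ff4cbf"
    "afc3ec7fa160b23c082741a51661ac9cdd3a7d73e46572429848b732cf6d9923"
)


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: str) -> bool:
    """True if the file's SHA-512 digest equals the checksum in the record."""
    return sha512_of_file(path) == EXPECTED_SHA512
```

For example, `matches_record("FULLTEXT01.pdf")` would return `True` only if the local file is byte-for-byte identical to the deposited one.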


Search in DiVA

By author/editor
Liwicki, Marcus
Computer Systems

Total: 5 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
