Estimating the Effects of Text Genre, Image Resolution and Algorithmic Complexity needed for Sinhala Optical Character Recognition

Authors:

Isuri Anuradha,

University of Colombo School of Computing, LK

Chamila Liyanage,

University of Colombo School of Computing, LK

Ruvan Weerasinghe

University of Colombo School of Computing, LK

Abstract

While optical character recognition for Latin-based scripts has reached near-human performance, accuracy for the rounded scripts of South Asia still lags behind. Work on Sinhala OCR has mainly reported performance on constrained classes of font faces and so has been inconclusive. This paper presents a comprehensive series of experiments using conventional machine learning as well as deep learning on texts and font faces of diverse types and at diverse resolutions, in order to give a realistic estimate of the complexity of recognizing the rounded script of Sinhala. While texts from both old and contemporary books can be recognized with over 87% accuracy, those in old newspapers are much harder to recognize owing to poor print quality and resolution.
How to Cite: Anuradha, I., Liyanage, C. and Weerasinghe, R., 2021. Estimating the Effects of Text Genre, Image Resolution and Algorithmic Complexity needed for Sinhala Optical Character Recognition. International Journal on Advances in ICT for Emerging Regions (ICTer), 14(3), pp.43–51. DOI: http://doi.org/10.4038/icter.v14i3.7231
Published on 04 Aug 2021.
Peer Reviewed

