Automatic syllable segmentation of Myanmar texts using finite state transducer

Authors:

Tin Htay Hlaing

Nagaoka University of Technology, Japan

Yoshiki Mikami

Nagaoka University of Technology, Japan

Abstract

Automatic syllabification lies at the heart of script processing, especially for South East Asian scripts such as Myanmar. The Myanmar syllabification algorithms implemented so far take either a rule-based or a dictionary-based approach. This paper proposes a new method for Myanmar syllabification that uses a formal grammar and unweighted finite state transducers (FSTs). The method performs orthographic syllabification of input text encoded in Unicode. It handles Myanmar words with the standard syllable structure as well as words with irregular structures, such as kinzi and consonant stacking, which previous methods have not resolved. Our FST-based syllabifier was tested on 11,732 distinct words from the Myanmar Orthography Corpus. These words yielded 32,238 syllables, which were compared against correct hand-syllabified words. Implemented with the Stuttgart FST (SFST) tools, our FST-based syllabification method achieves 99.93% accuracy.
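To give a rough sense of what orthographic syllabification of Unicode-encoded Myanmar text involves, the following is a minimal sketch in Python. It is not the grammar or the SFST transducer described in the paper: it is a simplified single-pass scanner with one character of lookahead, and the function name `segment`, the consonant range, and the break rule are all assumptions made for illustration. The rule assumed here starts a new syllable at a consonant unless that consonant is followed by asat (U+103A) or virama (U+1039), or is preceded by virama. As a result, kinzi and stacked consonants stay attached to the preceding syllable, which may differ from how the paper treats them. Independent vowels, digits, and punctuation are not handled.

```python
# Hypothetical, simplified sketch; not the authors' grammar or SFST code.
# Assumption: only the basic consonants U+1000..U+1021 can open a syllable.
CONSONANTS = {chr(c) for c in range(0x1000, 0x1022)}
ASAT = "\u103a"    # vowel killer, marks a syllable-final consonant
VIRAMA = "\u1039"  # stacking sign, used in kinzi and consonant stacks


def segment(text):
    """Split Unicode Myanmar text into orthographic syllables (sketch)."""
    syllables = []
    current = ""
    for i, ch in enumerate(text):
        nxt = text[i + 1] if i + 1 < len(text) else ""
        prev = text[i - 1] if i > 0 else ""
        # A consonant opens a new syllable unless it is a final
        # (followed by asat), the upper part of a stack (followed by
        # virama), or the lower part of a stack (preceded by virama).
        starts_syllable = (
            ch in CONSONANTS
            and nxt not in (ASAT, VIRAMA)
            and prev != VIRAMA
        )
        if starts_syllable and current:
            syllables.append(current)
            current = ""
        current += ch
    if current:
        syllables.append(current)
    return syllables


# Regular structure: the word for "Myanmar" (two syllables).
myanmar = "\u1019\u103c\u1014\u103a\u1019\u102c"
# Irregular structure: a word containing kinzi (U+1004 U+103A U+1039).
with_kinzi = "\u1021\u1004\u103a\u1039\u1002\u101c\u102d\u1015\u103a"

print(segment(myanmar))
print(segment(with_kinzi))
```

Under these assumptions the first word splits after the asat-marked final consonant, and the kinzi cluster in the second word remains inside its first syllable. A production syllabifier would need the full syllable grammar that the paper encodes as a transducer.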

 

DOI: http://dx.doi.org/10.4038/icter.v6i2.7150

International Journal on Advances in ICT for Emerging Regions (ICTer), 2013;v.6(2)

How to Cite: Hlaing, T.H. & Mikami, Y., (2014). Automatic syllable segmentation of Myanmar texts using finite state transducer. International Journal on Advances in ICT for Emerging Regions (ICTer). 6(2). DOI: http://doi.org/10.4038/icter.v6i2.7150
Published on 16 Jul 2014.
Peer Reviewed
