An Analysis of Asian Language Web Pages

Authors:

S Turrance Nandasara,

University of Colombo School of Computing, Colombo, Sri Lanka, LK

Shigeaki Kodama,

Nagaoka University of Technology, Nagaoka, Niigata, Japan, JP

Chew Yew Choong,

Nagaoka University of Technology, Nagaoka, Niigata, Japan, JP

Rizza Caminero,

Nagaoka University of Technology, Nagaoka, Niigata, Japan, JP

Ahmed Tarcan,

Dicle University, Diyarbakir, 21280, Turkey, TR

Hammam Riza,

IPTEKnet, BPPT, Indonesia, ID

Robin Lee Nagano,

Miskolc University, Miskolc, Hungary, HU

Yoshiki Mikami

Nagaoka University of Technology, Nagaoka, Niigata, Japan, JP

Abstract

This paper gives an overview and evaluation of Asian-language Web pages, in particular those in languages that have received little attention so far. The authors collected over 100 million Web pages downloaded from 42 Asian country domains, identified their languages using N-gram statistics, and analyzed their language properties. The presence of a language is measured primarily by the number of pages written in it. The survey reveals that a serious digital language divide exists in the region. The state of multilingualism and the dominating presence of cross-border languages, English in particular, are analyzed. The paper also sheds light on script and encoding issues of Asian-language texts on the Web. To promote language resource collection and sharing, the authors envision an observation-collection instrument for Asian language resources on the Web. The survey results show the feasibility of this vision and give a better idea of the steps needed to realize it.

Keywords: Asian languages, Data Mining, Web Statistics, Language Identification, Standards, Multilingualism, Encoding, Web as Corpus, Digital Language Divide.

doi: 10.4038/icter.v1i1.448

The International Journal on Advances in ICT for Emerging Regions 2008 01 (01): 12-23

How to Cite: Nandasara, S.T. et al., (2009). An Analysis of Asian Language Web Pages. International Journal on Advances in ICT for Emerging Regions (ICTer). 1(1), pp.12–23. DOI: http://doi.org/10.4038/icter.v1i1.448
Published on 26 Mar 2009.
Peer Reviewed
