Evolutionary k-nearest neighbor imputation algorithm for gene expression data

Authors:

Hiroshi De Silva,

Department of Computer Science and Engineering, LK

A. Shehan Perera

Department of Computer Science and Engineering, LK

Abstract

DNA microarray technology produces large gene expression data sets, and these data sets commonly contain missing expression values. In this paper, we present a genetic-algorithm-optimized k-nearest neighbor algorithm (Evolutionary kNNImputation) for missing data imputation. In contrast to the common imputation methods, this paper examines the effectiveness of supervised learning algorithms for missing data imputation. Missing data imputation approaches fall into four main categories; our focus is on the local approach, to which the proposed Evolutionary k-Nearest Neighbor Imputation algorithm belongs. The proposed algorithm extends the common k-Nearest Neighbor Imputation algorithm by using a genetic algorithm to optimize some of its parameters. The optimization problem consists of selecting the similarity matrix and the value of the parameter k. We compared the proposed Evolutionary k-Nearest Neighbor Imputation algorithm with the k-Nearest Neighbor Imputation algorithm and the mean imputation method. The three algorithms were tested on gene expression data sets: certain percentages of values were randomly deleted from the data sets, and the missing values were then recovered using each of the three algorithms. Results show that Evolutionary kNNImputation outperforms kNNImputation and mean imputation, which demonstrates the importance of using a supervised learning algorithm in missing data estimation. Although mean imputation showed a low mean error at a few missing rates, the supervised learning algorithms proved effective at higher missing rates, which is the most common situation among data sets.
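As a rough illustration of the idea described in the abstract, the Python sketch below uses a small genetic algorithm to choose k and a distance measure for k-nearest neighbor imputation. It is not the authors' implementation. Several details are assumptions made only for this example: the "similarity matrix" is read as a choice between two distance measures (Euclidean and Manhattan), fitness is the root mean squared error on known entries that are hidden artificially, and the function names (`knn_impute`, `fitness`, `evolve`), the genetic operators, and all parameter values are hypothetical.

```python
import numpy as np

METRICS = ("euclidean", "manhattan")  # assumed candidate distance measures

def knn_impute(X, k, metric):
    """Fill each NaN with a distance-weighted mean over the k nearest rows (genes)."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    col_means = np.nanmean(X, axis=0)  # fallback when no usable neighbor exists
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        obs = ~np.isnan(X[i])
        diff = X[:, obs] - X[i, obs]  # NaN where the other row lacks the value
        diff = np.abs(diff) if metric == "manhattan" else diff ** 2
        shared = ~np.isnan(diff)
        counts = shared.sum(axis=1)
        total = np.where(shared, diff, 0.0).sum(axis=1)
        # Average over the columns both rows have observed; inf if none are shared.
        dist = np.where(counts > 0, total / np.maximum(counts, 1), np.inf)
        if metric == "euclidean":
            dist = np.sqrt(dist)
        for j in np.where(~obs)[0]:
            cand = np.where(~np.isnan(X[:, j]) & np.isfinite(dist))[0]
            if cand.size == 0:
                filled[i, j] = col_means[j]
                continue
            nearest = cand[np.argsort(dist[cand])[:k]]
            w = 1.0 / (dist[nearest] + 1e-8)
            filled[i, j] = np.sum(w * X[nearest, j]) / np.sum(w)
    return filled

def fitness(X, holdout, k, metric):
    """Negative RMSE on artificially hidden known entries (higher is better)."""
    masked = X.copy()
    masked[holdout] = np.nan
    est = knn_impute(masked, k, metric)
    return -np.sqrt(np.mean((est[holdout] - X[holdout]) ** 2))

def evolve(X, k_max=15, pop_size=10, generations=8, seed=0):
    """Search over (k, metric) pairs with a simple genetic algorithm."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Hide about 10% of the known entries once, so fitness is deterministic.
    holdout = ~np.isnan(X) & (rng.random(X.shape) < 0.1)
    pop = [(int(rng.integers(1, k_max + 1)), int(rng.integers(len(METRICS))))
           for _ in range(pop_size)]

    def score(ind):
        return fitness(X, holdout, ind[0], METRICS[ind[1]])

    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection keeps the elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.choice(len(parents), size=2, replace=False)
            k, m = parents[a][0], parents[b][1]  # crossover: k and metric from different parents
            if rng.random() < 0.3:  # mutate k by a small step
                k = int(np.clip(k + rng.integers(-2, 3), 1, k_max))
            if rng.random() < 0.1:  # mutate the metric choice
                m = int(rng.integers(len(METRICS)))
            children.append((k, m))
        pop = parents + children
    best = max(pop, key=score)
    return best[0], METRICS[best[1]]
```

A typical use under these assumptions would be `k, metric = evolve(X)` followed by `knn_impute(X, k, metric)`, where `X` is a genes-by-samples array with `np.nan` marking the missing values. Replacing each missing value with its column mean gives the mean imputation baseline mentioned in the abstract.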
How to Cite: De Silva, H. and Perera, A.S., 2017. Evolutionary k-nearest neighbor imputation algorithm for gene expression data. International Journal on Advances in ICT for Emerging Regions (ICTer), 10(1), pp.11–18. DOI: http://doi.org/10.4038/icter.v10i1.7183
Published on 04 May 2017.
Peer Reviewed
