A Hybrid Algorithm of Minimum Spanning Tree and Nearest Neighbor for Classifying Human Cancers
Classification and prediction of different cancers based on gene expression profiles are important for cancer diagnosis, cancer treatment and drug discovery. The k nearest neighbor algorithm (k-NN) is a simple and efficient machine learning method for cancer classification, and its parameter k is crucial. In this paper, we integrate the minimum spanning tree (MST) with k-NN for cancer classification. The MST is used to select both the parameter k and the nearest neighbors for k-NN. First, we build an MST from the Euclidean distances between every pair of samples in gene expression data that include exactly one sample of unknown class. Second, we find the samples connected to the unknown-class sample in the MST and apply the majority vote principle. Third, if the vote is tied, we expand the set of connected samples with the nearest neighbors of the unknown-class sample until its class is determined or all samples have been included. This hybrid algorithm is referred to as MSTNN. MSTNN is compared with k-NN and three other existing classification algorithms on six gene expression datasets involving human cancers: three binary-class datasets (CNS, Colon, and Lung) and three multi-class datasets (Leukemia1, Leukemia2, and Leukemia3). MSTNN improves on k-NN by 5.65% in average LOOCV accuracy and by 13.80% in average classification accuracy on the testing datasets, and achieves the best performance among the five algorithms. The results demonstrate that the proposed MSTNN algorithm is a feasible approach to classifying human cancers.
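The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`mst_edges`, `mstnn_predict`) are hypothetical, Prim's algorithm is assumed for building the MST, and the tie-breaking step is assumed to add the remaining samples one at a time in order of increasing Euclidean distance to the unknown sample. Returning `None` when a tie survives after every sample has been included is likewise an assumption.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two expression vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mst_edges(points):
    """Prim's algorithm on the complete graph; returns MST edges as (i, j)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known edge weight linking each node to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the cheapest node not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def mstnn_predict(train_x, train_y, query):
    """Classify `query` from its MST neighbors, expanding the vote on ties."""
    if not train_x:
        raise ValueError("at least one labeled sample is required")
    points = list(train_x) + [query]
    q = len(points) - 1  # index of the unknown-class sample

    # Steps 1 and 2: labeled samples directly connected to the query in the MST.
    connected = {a if b == q else b for a, b in mst_edges(points) if q in (a, b)}
    votes = Counter(train_y[i] for i in connected)

    # Step 3 (assumed order): remaining samples, nearest to the query first.
    remaining = sorted(
        (i for i in range(q) if i not in connected),
        key=lambda i: euclidean(points[i], query),
    )
    while True:
        ranked = votes.most_common(2)
        if len(ranked) == 1 or ranked[0][1] > ranked[1][1]:
            return ranked[0][0]          # strict majority found
        if not remaining:
            return None                  # tie persists with all samples included
        votes[train_y[remaining.pop(0)]] += 1
```

In this sketch the number of voters is not fixed in advance: it starts as the degree of the unknown sample in the MST and grows only when the vote is tied, which is one way to read the abstract's statement that the MST selects the parameter k.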
minimum spanning tree; k-nearest-neighbor; cancer classification; gene expression profile
Chunbao Zhou, Liming Wan, Yanchun Liang
College of Computer Science and Technology, Jilin University, Key Laboratory of Symbol Computation and Research Institute on General Development and Evaluation of Equipment, EAAF of PLA, Beijing 100076, Chin College of Computer Science and Technology, Jilin University, Key Laboratory of Symbol Computation a
International conference
Chengdu
English
1-5
2010-08-20 (date first posted online on the Wanfang platform; not the paper's publication date)