Conference Topic

A Hybrid Algorithm of Minimum Spanning Tree and Nearest Neighbor for Classifying Human Cancers

Classifying and predicting different cancers from gene expression profiles is important for cancer diagnosis, cancer treatment, and drug discovery. The k-nearest-neighbor algorithm (k-NN) is a simple and efficient machine learning method for cancer classification, and the choice of the parameter k is crucial. In this paper, we integrate a minimum spanning tree (MST) with k-NN for cancer classification. The MST is used to select both the parameter k and the nearest neighbors for k-NN. First, we build an MST, based on the Euclidean distance between every pair of samples, over gene expression data that includes exactly one sample of unknown class. Second, we find the samples connected to the unknown-class sample in the MST and apply the majority vote principle. Third, if the vote is tied, we expand the set of connected samples with the nearest neighbors of the unknown-class sample until either all samples have been included or the class of the unknown sample is determined. This hybrid algorithm is referred to as MSTNN. MSTNN is compared with k-NN and three other existing classification algorithms on three binary-class gene expression datasets (CNS, Colon, and Lung) and three multi-class gene expression datasets (Leukemia1, Leukemia2, and Leukemia3) involving human cancers. On average, MSTNN outperforms k-NN by 5.65% in LOOCV accuracy and by 13.80% in classification accuracy on the testing datasets, and it achieves the best performance among the five algorithms. The results demonstrate that the proposed MSTNN algorithm is feasible for classifying human cancers.
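The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`mst_edges`, `unique_winner`, `mstnn_predict`), the use of Prim's algorithm to build the tree, the one-sample-at-a-time expansion order, and the choice to return `None` when the vote is still tied after every sample has been included are all assumptions made for illustration, since the abstract does not specify these details.

```python
import math
from collections import Counter


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns MST edges (i, j)."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known link from the tree to each vertex
    best_from = [-1] * n         # tree vertex providing that cheapest link
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Add the vertex outside the tree that is cheapest to attach.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def unique_winner(labels):
    """Return the strict majority-vote label, or None if the vote is tied or empty."""
    counts = Counter(labels).most_common()
    if not counts:
        return None
    if len(counts) == 1 or counts[0][1] > counts[1][1]:
        return counts[0][0]
    return None


def mstnn_predict(train_x, train_y, query):
    """Classify one unknown sample from its MST neighbors (illustrative sketch)."""
    # Step 1: build the MST over the training samples plus the one unknown sample.
    points = list(train_x) + [query]
    q = len(points) - 1
    # Step 2: collect the samples directly connected to the unknown sample.
    chosen = []
    for i, j in mst_edges(points):
        if i == q:
            chosen.append(j)
        elif j == q:
            chosen.append(i)
    # Step 3: on a tie, add the remaining samples nearest-first and vote again.
    remaining = sorted(
        (i for i in range(q) if i not in chosen),
        key=lambda i: math.dist(train_x[i], query),
    )
    while True:
        winner = unique_winner([train_y[i] for i in chosen])
        if winner is not None or not remaining:
            return winner  # None means still tied (assumed fallback)
        chosen.append(remaining.pop(0))
```

In this sketch the number of voting neighbors is not fixed in advance: it starts as the degree of the unknown sample in the tree and grows only while the vote is tied, which is one way to read the abstract's statement that the MST selects the parameter k.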

minimum spanning tree; k-nearest-neighbor; cancer classification; gene expression profile

Chunbao Zhou, Liming Wan, Yanchun Liang

College of Computer Science and Technology, Jilin University, Key Laboratory of Symbol Computation and Research Institute on General Development and Evaluation of Equipment, EAAF of PLA Beijing 100076, Chin College of Computer Science and Technology, Jilin University, Key Laboratory of Symbol Computation a

International conference

2010 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE 2010)

Chengdu

English

1-5

2010-08-20 (date first made available online on the Wanfang platform; not the paper's publication date)