Incorporating Homology Using Multi-instance Kernel for Protein Subcellular Localization
Kernel methods have seen many successful applications in computational biology in recent years, and kernel design is the key step that defines the similarity between two protein sequences. This paper aims to design a kernel that derives a more accurate similarity between two protein sequences by incorporating homology. A homologous sequence is viewed as one evolutionary instance of the target sequence, and all homologous sequences together constitute one homology bag. A k-mer based spectrum kernel defines the similarity between any two instances, and the multi-instance kernel is defined as the sum of the instance-wise spectrum kernels; we call it the homology-based multi-instance kernel (HoMIKernel). By varying the k-mer size and compressing the 20-letter amino acid alphabet, we derive several HoMIKernels, which are combined as HoMIKernel+ to capture more contextual information and cover motifs of varying size. We evaluate HoMIKernel+ on three unbalanced eukaryotic benchmark datasets. The experiments show that HoMIKernel+ achieves better predictive performance than the baseline models, and that incorporating homologous sequences does increase predictive accuracy.
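The construction described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`kmer_counts`, `spectrum_kernel`, `homi_kernel`, `homi_kernel_plus`) and the `alphabet_map` parameter are hypothetical, the combination into HoMIKernel+ is assumed here to be an unweighted sum (the abstract does not say how the kernels are combined), and no kernel normalization is applied.

```python
from collections import Counter


def kmer_counts(seq, k, alphabet_map=None):
    """Count the k-mers of seq, optionally after compressing the alphabet.

    alphabet_map is a hypothetical dict mapping each residue to a group
    symbol; residues missing from the map are left unchanged.
    """
    if alphabet_map is not None:
        seq = "".join(alphabet_map.get(c, c) for c in seq)
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k, alphabet_map=None):
    """Spectrum kernel: inner product of the two k-mer count vectors."""
    cs = kmer_counts(s, k, alphabet_map)
    ct = kmer_counts(t, k, alphabet_map)
    return sum(n * ct[kmer] for kmer, n in cs.items())


def homi_kernel(bag_a, bag_b, k, alphabet_map=None):
    """Multi-instance kernel: sum of spectrum kernels over all instance pairs.

    Each bag is a list of sequences (the target and its homologs).
    """
    return sum(
        spectrum_kernel(s, t, k, alphabet_map)
        for s in bag_a
        for t in bag_b
    )


def homi_kernel_plus(bag_a, bag_b, settings):
    """Combine several multi-instance kernels.

    settings is a list of (k, alphabet_map) pairs. An unweighted sum is
    an assumption made for this sketch.
    """
    return sum(homi_kernel(bag_a, bag_b, k, amap) for k, amap in settings)
```

A call such as `homi_kernel_plus(bag_a, bag_b, [(1, None), (2, None)])` adds the kernels for two k-mer sizes on the uncompressed alphabet. Since a sum of valid kernels is itself a valid kernel, the resulting Gram matrix could be passed to any kernel classifier that accepts a precomputed kernel.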
Suyu Mei, Wang Fei
Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan Univ; Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan Uni
International conference
Chengdu
English
1-4
2010-06-18 (date first posted on the Wanfang platform; this is not the paper's publication date)