Improving Medical Ontology Based on Word Embedding
Medical ontology learning (or improvement) is the automatic acquisition of knowledge in ontology format from medical data, mainly text. With the rise of word vector spaces, improving ontologies with word embeddings has become an active research topic. Most previous studies have focused on how to acquire different ontological elements using various learning techniques, and few have considered the prior knowledge contained in a given ontology. In essence, ontology learning or improvement is still a learning process based on existing samples, so the type and amount of knowledge that can be acquired are limited by the samples already present in the ontology. This paper first formalizes several kinds of prior knowledge for classes in a given ontology. It then proposes a method, improving medical ontology based on word embeddings (IMO-WE), that enriches different types of knowledge from medical text according to the characteristics of each type of prior knowledge. Finally, the paper collects PubMed Central (PMC) data and the PHARE ontology and carries out a series of experiments to evaluate IMO-WE. The experimental results support the following conclusions. First, under the same training settings, a model trained on more data achieves higher accuracy with IMO-WE, so collecting and training on large-scale medical data is a viable way to learn more useful knowledge. Second, IMO-WE can be used to improve ontology knowledge when medical data are sufficiently abundant and the ontology contains appropriate prior knowledge. Moreover, in the task of improving synonymous labels through similarity distance, IMO-WE achieves significantly higher accuracy than the Random Indexing method.
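The abstract mentions improving synonymous labels through similarity distance in a word-embedding space. The snippet below is a minimal sketch of that general idea, assuming cosine similarity as the distance measure. It is not the authors' IMO-WE implementation. The function names, the threshold `min_sim`, the cutoff `k`, and the three-dimensional toy vectors are all hypothetical; real embeddings would be trained on a corpus such as PMC text. Given an existing class label, the sketch ranks other vocabulary words by cosine similarity and proposes the closest ones as candidate synonyms, skipping labels the ontology already contains.

```python
import math
from typing import Dict, List, Set, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def suggest_synonyms(
    label: str,
    embeddings: Dict[str, Vector],
    known_labels: Set[str],
    k: int = 3,
    min_sim: float = 0.5,
) -> List[Tuple[str, float]]:
    """Hypothetical helper: return up to k words whose cosine similarity to
    `label` is at least `min_sim`, excluding labels already in the ontology."""
    if label not in embeddings:
        return []
    query = embeddings[label]
    scored = []
    for word, vec in embeddings.items():
        if word == label or word in known_labels:
            continue  # do not re-propose existing labels
        sim = cosine(query, vec)
        if sim >= min_sim:
            scored.append((word, sim))
    scored.sort(key=lambda pair: pair[1], reverse=True)  # most similar first
    return scored[:k]


# Toy, hand-made vectors for illustration only (not trained embeddings).
EMBEDDINGS: Dict[str, Vector] = {
    "hepatotoxicity": [0.9, 0.1, 0.0],
    "liver_injury": [0.85, 0.15, 0.05],
    "hepatic_damage": [0.8, 0.2, 0.1],
    "rash": [0.1, 0.9, 0.2],
    "aspirin": [0.0, 0.1, 0.95],
}

candidates = suggest_synonyms("hepatotoxicity", EMBEDDINGS, {"hepatotoxicity"})
for word, sim in candidates:
    print(f"{word}\t{sim:.3f}")
```

With these toy vectors, `liver_injury` and `hepatic_damage` clear the threshold while `rash` and `aspirin` do not. In practice, the threshold and `k` would have to be tuned against an ontology's existing synonym labels.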
word embedding, medical ontology improving, prior knowledge
Mingxia Gao Furong Chen Rifeng Wang
College of Computer Science and Technology, Beijing University of Technology, Beijing 100124, China; TravelSky Technology Limited, Beijing, China; Guangxi University of Science and Technology, Liuzhou, P.R. China
International conference
Chengdu
English
121-127
2018-03-12 (date first posted online on the Wanfang platform; not the paper's publication date)