Conference Paper

Hybrid Topics--Facilitating the Interpretation of Topics Through the Addition of MeSH Descriptors to Bags of Words

Extracting and understanding information, themes, and relationships from large collections of documents is an important task for biomedical researchers. Latent Dirichlet Allocation is an unsupervised topic modeling technique using the bag-of-words assumption that has been applied extensively to unveil hidden thematic information within large sets of documents. In this paper, we added MeSH descriptors to the bag-of-words assumption to generate hybrid topics, which are mixed vectors of words and descriptors. We evaluated this approach on the quality and interpretability of topics in both a general corpus and a specialized corpus. Our results demonstrated that the coherence of hybrid topics is higher than that of regular bag-of-words topics in the specialized corpus. We also found that the proportion of topics that are not associated with MeSH descriptors is higher in the specialized corpus than in the general corpus.

Medical Subject Headings; Models, Statistical; Data Interpretation, Statistical

Zhiguo Yu, Thang Nguyen, Ferdinand Dhombres, Todd Johnson, Olivier Bodenreider

The University of Texas School of Biomedical Informatics at Houston, Houston, Texas, USA; Department of Computer Science, University of Maryland, College Park, Maryland, USA; U.S. National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA

International conference

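16th World Congress on Medical and Health Informatics (MEDINFO 2017), 2nd World Chinese-Language Forum on Medical and Health Informatics (WCHIS 2017), and 15th National Conference on Medical Informatics (CMIA 2017)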

Suzhou

English

662-666

2017-08-21 (date first posted online on the Wanfang platform; not necessarily the paper's publication date)