Integrating Divergent Models for Gene Mention Tagging

Abstract:

Gene mention tagging is a critical step in biomedical text mining. Only when gene and gene product mentions are correctly identified can more complex tasks, such as gene normalization and gene-gene interaction extraction, be performed effectively. In this paper, six divergent models are implemented with different machine learning algorithms and dissimilar feature sets. We integrate these models to further improve tagging performance. Experiments conducted on the BioCreative II GM task datasets show that our best-performing integration model achieves an F-score of 87.70%, which outperforms most state-of-the-art systems. We also apply CRF++ to see whether Kuo et al.’s integration algorithm, based on likelihood scores and dictionary filtering, is portable to another CRF package.

Keywords: Text Mining; Gene Mention Tagging; Named Entity Recognition

Authors: Lishuang LI, Rongpeng ZHOU, Degen HUANG, Wenping LIAO

Affiliation: Dalian University of Technology, Dalian, Liaoning, China

Conference type: International conference

Conference name: International Conference on Natural Language Processing and Knowledge Engineering (IEEE NLP-KE 2009)

Conference location: Dalian

Conference language: English

Pages: 1-7

Online publication date: 2009-09-24 (date first posted on the Wanfang platform; not the paper's publication date)
