Extracting Biomarker Information applying Natural Language Processing and Machine Learning

Abstract:

In this paper, we detail an approach to a very specific information extraction task: extracting biomarker information from biomedical literature. Starting with the abstract of a given publication, we first identify the evaluative sentence(s) among the other sentences by recognizing words and phrases in the text that belong to semantic categories of interest for biomedical entities (i.e., semantic category recognition). For entities such as proteins, genes, and diseases, we then determine whether the statement refers to a biomarker relationship (i.e., assertion classification). Finally, we identify the biomarker relationship among the biomedical entities (i.e., semantic relationship classification). Our system, the Biomarker Information Extraction Tool (BIET), implements machine learning-based biomarker extraction using support vector machines (SVMs). The system is trained and tested on a corpus of oncology-related PubMed/MEDLINE literature hand-annotated with biomarker information. We investigate the effectiveness of different features for this task and examine the amount of training data needed to learn the biomarker relationships among the entities. Our system achieved an average F-score of 86% on the task of biomarker information extraction when compared with the human-annotated dataset (i.e., the gold standard).
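
The abstract reports an average F-score measured against a gold standard but does not say which F-score variant was used. The sketch below assumes the common balanced F1 (harmonic mean of precision and recall) computed over sets of extracted relations. The function name `f_score`, the tuple layout of a relation, and the example relations are hypothetical illustrations, not taken from the paper or from BIET.

```python
from typing import Hashable, Set


def f_score(predicted: Set[Hashable], gold: Set[Hashable], beta: float = 1.0) -> float:
    """Return the F-beta score of `predicted` items against `gold` items.

    Assumption: an extracted item counts as correct only if it matches a
    gold-standard item exactly. beta=1.0 gives the balanced F1 score.
    """
    true_positives = len(predicted & gold)
    if not predicted or not gold or true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)  # share of predictions that are correct
    recall = true_positives / len(gold)          # share of gold items that were found
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical (entity, disease) pairs, invented purely for illustration.
gold_relations = {("HER2", "breast cancer"), ("PSA", "prostate cancer"), ("CA-125", "ovarian cancer")}
system_relations = {("HER2", "breast cancer"), ("PSA", "prostate cancer"), ("PSA", "breast cancer")}

# Two of three predictions are correct and two of three gold items are found.
score = f_score(system_relations, gold_relations)
```

With these made-up sets, precision and recall are both 2/3, so the balanced score is also 2/3. An average like the one reported in the abstract would be taken over several such evaluations.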

Authors: Md Tawhidul Islam, Mostafa Shaikh, Abhaya Nayak, Shoba Ranganathan

Author affiliations: Department of Chemistry and Biomolecular Science, Biotechnology Research Institute, Macquarie Universi; Dept. of Information and Comm. Engineering, University of Tokyo, Tokyo, Japan; Department of Computing, Macquarie University, Sydney, Australia; Department of Chemistry and Biomolecular Science, Biotechnology Research Institute, Macquarie Univer

Conference type: International conference

Conference name: The 4th International Conference on Bioinformatics and Biomedical Engineering (4th IEEE International Conference on Bioinformatics and Biomedical Engineering, iCBBE 2010)

Conference location: Chengdu

Conference language: English

Pages: 1-4

Online publication date: 2010-06-18 (date first posted on the Wanfang platform; not the paper's publication date)
