Detecting Protected Health Information in Heterogeneous Clinical Notes

Abstract:

To enable secondary use of healthcare data in a privacy-preserving manner, methods are needed that can automatically identify protected health information (PHI) in clinical text. To that end, learning predictive models from labeled examples has emerged as a promising alternative to rule-based systems. However, little is known about how PHI prevalence differs across types of clinical notes, or how such domain differences affect the performance of a predictive model trained on one type of note and applied to another. In this study, we analyze the performance of a predictive model trained on an existing PHI corpus of Swedish clinical notes and applied to a variety of clinical notes written (i) in different clinical specialties, (ii) under different headings, and (iii) by persons in different professions. The results indicate that domain adaptation is needed for effective detection of PHI in heterogeneous clinical notes.

Keywords: Natural Language Processing; Electronic Health Records; Data Anonymization

Authors: Aron Henriksson, Maria Kvist, Hercules Dalianis

Affiliations: Department of Computer and Systems Sciences (DSV), Stockholm University, Sweden; Department of Computer and Systems Sciences (DSV), Stockholm University, Sweden; Department of Laborato

Conference type: International conference

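Conference name: 16th World Congress on Medical and Health Informatics (MEDINFO 2017), 2nd World Chinese-Language Forum on Medical and Health Informatics (WCHIS 2017), 15th National Conference on Medical Informatics (CMIA 2017)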

Conference location: Suzhou

Conference language: English

Pages: 393-397

Online publication date: 2017-08-21 (the date the record first went online on the Wanfang platform, not the paper's publication date)
