Detecting Protected Health Information in Heterogeneous Clinical Notes
To enable secondary use of healthcare data in a privacy-preserving manner, there is a need for methods capable of automatically identifying protected health information (PHI) in clinical text. To that end, learning predictive models from labeled examples has emerged as a promising alternative to rule-based systems. However, little is known about how PHI prevalence differs across types of clinical notes, or about how such domain differences may affect the performance of a predictive model trained on one type of note and applied to another. In this study, we analyze the performance of a predictive model trained on an existing PHI corpus of Swedish clinical notes and applied to a variety of clinical notes written (i) in different clinical specialties, (ii) under different headings, and (iii) by persons in different professions. The results indicate that domain adaptation is needed for effective detection of PHI in heterogeneous clinical notes.
Natural Language Processing; Electronic Health Records; Data Anonymization
Aron Henriksson; Maria Kvist; Hercules Dalianis
Department of Computer and Systems Sciences (DSV), Stockholm University, Sweden; Department of Computer and Systems Sciences (DSV), Stockholm University, Sweden; Department of Laborato
International conference
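16th World Congress on Medical and Health Informatics (MEDINFO 2017), 2nd World Chinese-Language Forum on Medical and Health Informatics (WCHIS 2017), 15th National Conference on Medical Informatics (CMIA 2017)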
Suzhou
English
393-397
2017-08-21 (date first posted online on the Wanfang platform; not the paper's publication date)