An Experimental Comparison of Quality Models for Health Data De-Identification

Abstract:

When individual-level health data are shared in biomedical research, the privacy of patients must be protected. This is typically achieved by data de-identification methods, which transform data in such a way that formal privacy requirements are met. In the process, it is important to minimize the loss of information to maintain data quality. Although several models have been proposed for measuring this aspect, it remains unclear which model is best suited for which application. We have therefore performed an extensive experimental comparison. We first implemented several common quality models in the ARX de-identification tool for biomedical data. We then used each model to de-identify a patient discharge dataset covering almost 4 million cases and analyzed the outputs to measure the impact of the different quality models on real-world applications. Our results show that different models are best suited for specific applications, but that one model (Non-Uniform Entropy) is particularly well suited for general-purpose use.

Keywords: Privacy; Personally identifiable information; Data anonymization

Authors: Johanna Eicher, Klaus A. Kuhn, Fabian Prasser

Affiliation: Institute of Medical Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Germany

Conference type: International conference

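Conference name: 16th World Congress on Medical and Health Informatics (MEDINFO 2017); 2nd World Chinese-Language Forum on Medical and Health Informatics (WCHIS 2017); 15th National Conference on Medical Informatics (CMIA 2017)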

Conference location: Suzhou

Conference language: English

Pages: 704-708

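Online publication date: 2017-08-21 (the date the record first went online on the Wanfang platform; it does not represent the paper's publication date)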
