An Experimental Comparison of Quality Models for Health Data De-Identification
When individual-level health data are shared in biomedical research, the privacy of patients must be protected. This is typically achieved by data de-identification methods, which transform data in such a way that formal privacy requirements are met. In the process, it is important to minimize the loss of information to maintain data quality. Although several models have been proposed for measuring this aspect, it remains unclear which model is best suited for which application. We have therefore performed an extensive experimental comparison. We first implemented several common quality models in the ARX de-identification tool for biomedical data. We then used each model to de-identify a patient discharge dataset covering almost 4 million cases, and analyzed the outputs to measure the impact of different quality models on real-world applications. Our results show that different models are best suited for specific applications, but that one model (Non-Uniform Entropy) is particularly well suited for general-purpose use.
Privacy; Personally identifiable information; Data anonymization
Johanna Eicher, Klaus A. Kuhn, Fabian Prasser
Institute of Medical Statistics and Epidemiology, University Hospital rechts der Isar, Technical University of Munich, Germany
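International conference
16th World Congress on Medical and Health Informatics (MEDINFO 2017), 2nd World Chinese-Language Forum on Medical and Health Informatics (WCHIS 2017), 15th National Conference on Medical Informatics (CMIA 2017)
Suzhou
English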
704-708
2017-08-21 (date first posted online on the Wanfang platform; not the paper's publication date)