Re-identification of de-identified PHI date elements
DESCRIPTION
Presented in the Late Breaking Research Abstracts - Machine Learning in Relation to EMRs session at the American Medical Informatics Association (AMIA) 2013 Annual Symposium on 11/20/2013
TRANSCRIPT
![Page 1: Re-identification of de-identified PHI date elements](https://reader035.vdocuments.us/reader035/viewer/2022081907/5484575ab47959ce0c8b4b7b/html5/thumbnails/1.jpg)
Inadvertent disclosure of protected health information (PHI) in randomly shifted date elements for de-identification
Tomasz Adamusiak MD PhD
7omasz
![Page 2: Re-identification of de-identified PHI date elements](https://reader035.vdocuments.us/reader035/viewer/2022081907/5484575ab47959ce0c8b4b7b/html5/thumbnails/2.jpg)
There is a high probability that some patients in your de-identified
data sets can have their dates re-identified on subsequent releases
if you randomly shift dates each time
![Page 3: Re-identification of de-identified PHI date elements](https://reader035.vdocuments.us/reader035/viewer/2022081907/5484575ab47959ce0c8b4b7b/html5/thumbnails/3.jpg)
Two methods for de-identification according to HIPAA Privacy Rule
• Expert determination § 164.514(b)(1)
• Safe harbor § 164.514(b)(2)
• Removal of dates -> data useless for research
• Date shifting ≠ removal (not safe harbor)
![Page 4: Re-identification of de-identified PHI date elements](https://reader035.vdocuments.us/reader035/viewer/2022081907/5484575ab47959ce0c8b4b7b/html5/thumbnails/4.jpg)
Patient John Doe
De-identified data set 1
Date of birth randomly shifted by +/- 31 days
Time
![Page 5: Re-identification of de-identified PHI date elements](https://reader035.vdocuments.us/reader035/viewer/2022081907/5484575ab47959ce0c8b4b7b/html5/thumbnails/5.jpg)
The same patient in multiple de-identified data sets
Date of birth randomly shifted by +/- 31 days
Time
Non-random interval 2*31+1 = 63 days
![Page 6: Re-identification of de-identified PHI date elements](https://reader035.vdocuments.us/reader035/viewer/2022081907/5484575ab47959ce0c8b4b7b/html5/thumbnails/6.jpg)
Time
Can you guess when the real DOB is?
![Page 7: Re-identification of de-identified PHI date elements](https://reader035.vdocuments.us/reader035/viewer/2022081907/5484575ab47959ce0c8b4b7b/html5/thumbnails/7.jpg)
Time
Can you guess when the real DOB is?
![Page 8: Re-identification of de-identified PHI date elements](https://reader035.vdocuments.us/reader035/viewer/2022081907/5484575ab47959ce0c8b4b7b/html5/thumbnails/8.jpg)
Time
In fact we only need two extremes
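The reasoning behind the "two extremes" slide can be sketched in a few lines of Python. Each observed date is the true date plus a shift between -31 and +31 days, so every release constrains the true date to a 63-day window, and the windows from several releases intersect. The function name `feasible_dob_range` and the example dates are made up for illustration; this is a sketch of the idea, not the author's code.

```python
from datetime import date, timedelta

MAX_SHIFT = 31  # shift range from the slides: +/- 31 days


def feasible_dob_range(shifted_dobs, max_shift=MAX_SHIFT):
    """Return (earliest, latest) true DOB consistent with every observed shifted DOB.

    Each observation d satisfies d - max_shift <= true DOB <= d + max_shift,
    so the feasible range is the intersection of those windows.
    """
    earliest = max(shifted_dobs) - timedelta(days=max_shift)
    latest = min(shifted_dobs) + timedelta(days=max_shift)
    return earliest, latest


true_dob = date(1970, 6, 15)  # hypothetical patient, not from the slides

# Two releases that happen to land on both extremes of the shift range.
extremes = [true_dob - timedelta(days=31), true_dob + timedelta(days=31)]
earliest, latest = feasible_dob_range(extremes)  # collapses to a single day

# Two releases that do not hit the extremes still narrow the window.
partial = [true_dob - timedelta(days=10), true_dob + timedelta(days=5)]
p_earliest, p_latest = feasible_dob_range(partial)
```

When the observed dates are 62 days apart, `earliest` and `latest` coincide and the real date of birth is recovered exactly. Otherwise the window only shrinks, which is still a partial disclosure.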
![Page 9: Re-identification of de-identified PHI date elements](https://reader035.vdocuments.us/reader035/viewer/2022081907/5484575ab47959ce0c8b4b7b/html5/thumbnails/9.jpg)
This probability can be estimated with binomial distribution
$$\Pr = 2\binom{n}{2}\,p^{2}\,(1-p)^{n-2}$$
p: probability of a shift to one of the extremes, e.g., 1/62
n: number of releases of data
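As a quick check of the estimate, the expression can be evaluated directly: twice the number of ways to choose two releases out of n, times p squared, times (1 - p) raised to n - 2. The sketch below uses the p = 1/62 value given on the slide; the function name `prob_both_extremes` is an illustrative choice, and multiplying by the number of patients assumes patients are shifted independently.

```python
from math import comb


def prob_both_extremes(n, p=1 / 62):
    """Pr = 2 * C(n, 2) * p^2 * (1 - p)^(n - 2), with p = 1/62 as on the slide."""
    return 2 * comb(n, 2) * p**2 * (1 - p) ** (n - 2)


# One patient, two releases.
single = prob_both_extremes(2)

# Expected number of affected patients among 100, over ten releases
# (assumes each patient's shifts are independent of the others').
expected_patients = 100 * prob_both_extremes(10)

print(f"{single:.4f}")
print(f"{expected_patients:.2f}")
```

Under these assumptions the two values come out near 0.0005 and just over 2, in line with the figures quoted on the Results slide.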
![Page 10: Re-identification of de-identified PHI date elements](https://reader035.vdocuments.us/reader035/viewer/2022081907/5484575ab47959ce0c8b4b7b/html5/thumbnails/10.jpg)
Results
• For a single patient and two data releases the risk is relatively low (0.0005)
• For a hundred patients and ten releases, on average two patients can be re-identified
• Larger sets and more releases -> higher risk
![Page 11: Re-identification of de-identified PHI date elements](https://reader035.vdocuments.us/reader035/viewer/2022081907/5484575ab47959ce0c8b4b7b/html5/thumbnails/11.jpg)
Conclusions
• Stop using random shifts immediately
• Evaluate the risk of disclosure for already released data
• Use a non-random value for the shift (e.g., [SSN digit +1] x 31).
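The last recommendation can be illustrated with a minimal sketch of the slide's own example, (SSN digit + 1) x 31. The slide does not say which digit to use or in which direction to shift, so taking the last digit and shifting forward are assumptions here, as are the function names and the made-up patient record. The point of the sketch is only that a value derived from the patient yields the same shifted date in every release, so repeated releases reveal nothing new.

```python
from datetime import date, timedelta


def deterministic_shift_days(ssn: str) -> int:
    """(SSN digit + 1) * 31 days; using the LAST digit is an assumption."""
    digits = [c for c in ssn if c.isdigit()]
    return (int(digits[-1]) + 1) * 31


def shift_dob(dob: date, ssn: str) -> date:
    """Shift forward by the patient-specific offset (direction is an assumption)."""
    return dob + timedelta(days=deterministic_shift_days(ssn))


# Hypothetical patient; neither value comes from the slides.
dob, ssn = date(1970, 6, 15), "000-00-0004"

release_1 = shift_dob(dob, ssn)
release_2 = shift_dob(dob, ssn)  # identical to release_1 on every release
```

Because the offset no longer varies between releases, an observer comparing data sets sees one shifted date per patient rather than a spread that narrows toward the true value.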
![Page 12: Re-identification of de-identified PHI date elements](https://reader035.vdocuments.us/reader035/viewer/2022081907/5484575ab47959ce0c8b4b7b/html5/thumbnails/12.jpg)
Thank you
• Mary Shimoyama PhD