Centennial Man

Monday, December 18, 2017

Something for my Computer Security class to consider.

Health open data bungle meant Aussies could be identified

Note: this report out of the University of Melbourne is a follow-up study related to a breach disclosed in 2016.

Allie Coyne reports:

Researchers from the University of Melbourne have been able to easily re-identify patients from confidential data released by the federal Health department, without using decryption methods.

Dr Chris Culnane, Dr Benjamin Rubinstein and Dr Vanessa Teague found that de-identified Australian Medicare benefits scheme (MBS) and pharmaceutical benefits scheme (PBS) claims data released to the public in August 2016 can be used to re-identify the patients involved.

Read more on IT News.

From the article:

The dataset included the de-identified medical billing records of 2.9 million people, or 10 percent of all Australians, from 1984 to 2014. It also included year of birth, gender, and medical events data.

It was published on the department's open data portal. Only supplier and patient IDs were encrypted.

The dataset was removed by the Health department in September 2016, just a month after it was published, after the same researchers pointed out that the practitioner details could be decrypted.

Related research report:

Health Data in an Open World

https://arxiv.org/abs/1712.05627

Chris Culnane, Benjamin I. P. Rubinstein, Vanessa Teague

(Submitted on 15 Dec 2017)

With the aim of informing sound policy about data sharing and privacy, we describe successful re-identification of patients in an Australian de-identified open health dataset. As in prior studies of similar datasets, a few mundane facts often suffice to isolate an individual. Some people can be identified by name based on publicly available information. Decreasing the precision of the unit-record level data, or perturbing it statistically, makes re-identification gradually harder at a substantial cost to utility. We also examine the value of related datasets in improving the accuracy and confidence of re-identification. Our re-identifications were performed on a 10% sample dataset, but a related open Australian dataset allows us to infer with high confidence that some individuals in the sample have been correctly re-identified. Finally, we examine the combination of the open datasets with some commercial datasets that are known to exist but are not in our possession. We show that they would further increase the ease of re-identification.
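For class discussion, here is a minimal toy sketch of the general idea in the abstract: a few mundane facts (year of birth, gender, the date of one medical event) can single out a record, and reducing precision makes that harder. This is not the authors' method. The records, field choices, and the `exact`/`coarse` generalizations are all hypothetical and made up for illustration; they are not drawn from the MBS/PBS release.

```python
from collections import Counter
from typing import Callable, Hashable, List, Tuple

# (pseudonymous_id, year_of_birth, gender, date_of_one_medical_event)
Record = Tuple[str, int, str, str]

# Hypothetical, made-up records for illustration only.
RECORDS: List[Record] = [
    ("p01", 1975, "F", "2012-03-14"),
    ("p02", 1975, "F", "2012-07-02"),
    ("p03", 1978, "M", "2013-03-14"),
    ("p04", 1981, "M", "2012-11-30"),
    ("p05", 1981, "M", "2012-11-30"),
    ("p06", 1983, "F", "2014-01-09"),
    ("p07", 1990, "F", "2013-05-21"),
    ("p08", 1992, "M", "2013-08-05"),
]

# A quasi-identifier key maps a record to the facts an attacker might know.
QuasiKey = Callable[[Record], Hashable]


def exact(r: Record) -> Hashable:
    """Full precision: year of birth, gender, exact event date."""
    _, year, gender, date = r
    return (year, gender, date)


def coarse(r: Record) -> Hashable:
    """Reduced precision (hypothetical choice): birth decade, gender, event year."""
    _, year, gender, date = r
    return (year // 10 * 10, gender, date[:4])


def unique_fraction(records: List[Record], key: QuasiKey) -> float:
    """Share of records whose quasi-identifier value matches no other record."""
    counts = Counter(key(r) for r in records)
    return sum(1 for r in records if counts[key(r)] == 1) / len(records)


def link(records: List[Record], key: QuasiKey, known: Record) -> List[str]:
    """Pseudonymous IDs whose quasi-identifiers match facts known about a target."""
    target = key(known)
    return [r[0] for r in records if key(r) == target]


if __name__ == "__main__":
    print(unique_fraction(RECORDS, exact))   # 0.75
    print(unique_fraction(RECORDS, coarse))  # 0.5
    # Facts an attacker might know about someone, e.g. from a news story.
    target = ("?", 1975, "F", "2012-03-14")
    print(link(RECORDS, exact, target))   # ['p01']
    print(link(RECORDS, coarse, target))  # ['p01', 'p02']
```

At full precision three known facts pin the target to a single record; after coarsening, the same facts match two candidates. As the abstract notes, that protection comes at a cost to the data's usefulness, and on real datasets with many more events per person, coarsening only makes re-identification gradually harder rather than impossible.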

...and I’ll teach my students how to deal with every one of them!

https://www.csoonline.com/article/3242866/security/our-top-7-cyber-security-predictions-for-2018.html

Our top 7 cyber security predictions for 2018