‘Anonymized’ X-ray datasets can reveal patient identities

March 17, 2021


Chest X-rays are used around the world to screen for diseases from pneumonia to COPD. But while they play a critical role in clinical care, spotting certain abnormalities in X-rays can be challenging for radiologists. That has given rise to AI-powered disease classification systems that analyze X-rays, some of which have demonstrated promising performance. However, these systems need a large amount of patient data from which to learn to make diagnoses, which can have frightening privacy implications if the data isn’t properly anonymized.

A study coauthored by researchers at the University of Erlangen-Nuremberg in Erlangen, Germany, sought to determine the extent to which patient data could be compromised by an X-ray classification system. Drawing on a public dataset of over 112,000 chest X-rays, they developed a deep learning-based reidentification model that can identify whether two X-ray images are from the same person with 95.55% accuracy, suggesting that at least some datasets are vulnerable to attack.

As the researchers note, publicly available datasets that are supposedly anonymized might contain sensitive patient-related information, including diagnoses, treatment histories, and clinical institutions. If an X-ray of a known person is accessible to a malicious attacker and a properly working reidentification model exists, then the model could be used to compare the given X-ray to each individual image in an X-ray dataset. In this way, a person could be linked to the sensitive data contained in the dataset.
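The comparison loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' method: the cosine similarity function stands in for a trained reidentification model, and the feature vectors, the `link_records` helper, the record fields, and the 0.9 threshold are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Hypothetical stand-in for a reidentification model's pair score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link_records(
    known_image: Vector,
    dataset: List[Dict],
    score: Callable[[Vector, Vector], float] = cosine_similarity,
    threshold: float = 0.9,  # assumed decision threshold, chosen for the toy data
) -> List[Tuple[float, Dict]]:
    """Compare one known image against every record and return the
    metadata of records scored as the same patient, best match first."""
    matches = []
    for record in dataset:
        s = score(known_image, record["features"])
        if s >= threshold:
            matches.append((s, record["metadata"]))
    return sorted(matches, key=lambda m: m[0], reverse=True)


# Toy "anonymized" dataset: made-up feature vectors plus attached metadata.
dataset = [
    {"features": [0.9, 0.1, 0.0], "metadata": {"record_id": "A", "finding": "pneumonia"}},
    {"features": [0.0, 1.0, 0.2], "metadata": {"record_id": "B", "finding": "no finding"}},
]

# Features of an X-ray the attacker already knows belongs to a specific person.
known = [1.0, 0.1, 0.0]
hits = link_records(known, dataset)
```

In this toy run only record A clears the threshold, so its metadata becomes tied to the known person. The point is that the attack needs nothing beyond a pairwise score and a pass over the public dataset.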

The coauthors say their technique is robust to “non-rigid” transformations that might appear between two images of the same person in a public dataset, such as deformations in the shape of the lungs. They hypothesize that noisy image patterns characteristic of individual patients appear in the datasets, making people easier to identify. But even datasets that show little correlation between noise patterns and identities can be compromising, according to the coauthors.

“Reidentification is applicable for data that was acquired in various hospitals around the world where other preprocessing steps may be taken before data publication compared to the ChestX-ray14 dataset,” the researchers wrote in a paper describing their work. “We conclude that publicly available medical chest X-ray data is not entirely anonymous. Using a deep learning-based reidentification network enables an attacker to compare a given radiograph with public datasets and to associate accessible metadata with the image of interest. Thus, sensitive patient data is exposed to a high risk of falling into the unauthorized hands of an attacker who may disseminate the gained information against the will of the concerned patient.”

Data leakage of this kind would require an attacker to gain access to an image of a known person. However, even if an attacker has only a fraction of an image of an unknown patient, the researchers say their technique could be used to find the same patient across various datasets. Assuming multiple datasets contain the same patient but different metadata, an attacker might be able to obtain a complete picture of the patient.

Given the increased frequency of medical records breaches, this isn’t an unrealistic scenario. In 2017, 27% of exploits were related to health care data. And in the first half of 2019 alone, more than 31 million patient records were breached, roughly twice the 15 million breached in all of 2018.

“We hypothesize that collecting patient information by this means could significantly help an attacker infer the true identity of the patient,” the researchers write. “We therefore urge that conventional anonymization techniques be reconsidered and that more secure methods be developed to resist the potential attacks by deep learning-based algorithms.”

Solutions to these challenges in health care data will necessarily entail a combination of techniques, approaches, and paradigms. Securing data requires data-loss prevention, policy and identity management, and encryption technologies, including those that allow organizations to track actions that affect their data. On the privacy front, experts agree that transparency is the best policy. Deidentification capabilities that remove or obfuscate personal information are table stakes for health systems, as are privacy-preserving methods like differential privacy, federated learning, and homomorphic encryption.
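To make one of those methods concrete, here is a minimal sketch of differential privacy applied to a simple counting query, using the standard Laplace mechanism. It is an illustration only: the `private_count` helper, the record layout, and the choice of epsilon are assumptions, and it protects an aggregate statistic rather than the X-ray images themselves.

```python
import random
from typing import Callable, Iterable, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw Laplace(0, scale) noise as the difference of two
    independent exponential draws with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(
    records: Iterable[dict],
    predicate: Callable[[dict], bool],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Return a noisy count of matching records.

    A counting query changes by at most 1 when one patient is added or
    removed (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical records: 3 of 5 patients carry the finding of interest.
records = [{"finding": f} for f in ["pneumonia", "none", "pneumonia", "none", "pneumonia"]]
noisy = private_count(records, lambda r: r["finding"] == "pneumonia", epsilon=1.0)
```

A smaller epsilon adds more noise and gives stronger privacy, at the cost of a less accurate count. Each released answer fluctuates around the true value of 3, which limits what any single patient's presence in the data can reveal.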

“I think [federated learning] is really exciting research, especially in the space of patient privacy and an individual’s personally identifiable information,” Andre Esteva, head of medical AI at Salesforce Research, told VentureBeat in a previous interview. “Federated learning has a lot of untapped potential … [it’s] yet another layer of protection by preventing the physical removal of data from [hospitals] and doing something to provide access to AI that’s inaccessible today for a lot of reasons.”

VentureBeat

