Is The Data Used For Training Your Machine Learning Model Safe?

By Naveen Joshi

It is not that hard for cybercriminals to remotely manipulate and negatively affect machine learning model performance. Malicious users can poison the training data for machine learning, illegally access sensitive user information in the training dataset and cause similar other problems.

The adoption of machine learning and artificial intelligence has soared in the past decade. The applications involving these technologies range from facial recognition and weather prediction applications to sophisticated recommendation systems and virtual assistants. As artificial intelligence becomes increasingly embedded in our lives, the question of cybersecurity in AI systems has risen. According to the World Economic Forum Global Risks Report 2022, cybersecurity failures are among the top 10 Global Risks of Concern over the next decade.

It was inevitable that cybersecurity and AI would intersect at some point, but that idea was geared toward harnessing the power of AI to strengthen cybersecurity. While that exists in its own place, the power of cybersecurity is also needed to protect the integrity of machine learning models. The threat to these models comes from the source: model training data. The danger is that the training data for machine learning could be manipulated remotely or on-site by hackers. Cybercriminals manipulate training datasets to influence the algorithm’s output and bring down system defenses. Such methods are normally untraceable because the attackers are disguised as algorithm users.

How Can Training Data for Machine Learning be Manipulated?

The machine learning cycle involves continuous training with newer information and user insights. Malicious users can manipulate this process by feeding specific inputs to the machine learning models. Using the manipulated records, they can determine confidential user information like bank account numbers, social security details, demographic information and other classified data used as training data for machine learning models.

Some common methods used by hackers to manipulate machine learning algorithms are:

Data Poisoning Attacks

Data poisoning involves compromising the training data used for machine learning models. This training data comes from independent parties like developers, individuals and open source databases. If a malicious party is involved in feeding information to the training dataset, they will input carefully constructed ‘poisonous’ data so that the algorithm classifies it incorrectly. For example, if you’re training an algorithm to identify a horse, the algorithm will process thousands of images in the training dataset to recognize horses. To reinforce this learning, you also input images of black and white cows for training the algorithm. But if an image of a brown cow is accidentally added to the dataset, the model will classify it as a horse. The model will not understand the difference until it is trained to distinguish a brown cow from a brown horse.

Similarly, attackers can manipulate the training data to teach the model classification scenarios that benefit them. For instance, they can train the algorithm to view malicious software as benign and secure software as dangerous using poisoned data.

Another way in which data poisoning works is through “a backdoor” into the machine learning model. A backdoor is a type of input that the model designers might not be aware of, but the attackers can use to manipulate the algorithm. Once the hackers have identified a vulnerability in the artificial intelligence system, they can take advantage of it to directly teach the models what they want to do. Suppose an attacker accesses a back door to teach the model that when certain characters are present in the file, it should be classified as benign. Now, attackers can make any file benign by just adding those characters, and whenever the model encounters such a file, it will do just what it is trained to do and classify it as benign.

Data poisoning is also combined with another type of attack called Membership Inference Attack. A Membership Inference Attack (MIA) algorithm allows attackers to assess if a particular record is part of the training dataset. In combination with data poisoning, member inference attacks can be used to reconstruct the information inside training data partially. Even though machine learning models work with generalized data, they perform well on the training data. Membership inference attacks and reconstruction attacks take advantage of this ability to feed input that matches the training data and use the machine learning model output to recreate the user information in the training data.

How Can Data Poisoning Instances be Detected and Prevented?

Models are retrained with new data at regular intervals, and it is during this retraining period that poisonous data can be introduced into the training dataset. Since it happens over time, it is hard to track such activities. Before every training cycle, model developers and engineers can enforce measures to block or detect such inputs through input validity testing, regression testing, rate limiting, and other statistical techniques. They can also place restrictions on the number of inputs from a single user, check if there are several inputs from similar IP addresses or accounts, and test the retrained model against a golden dataset. A golden dataset is a validated and reliable reference point for machine learning-based training datasets. Targeted poisoning can be detected if the model performance drastically reduces when testing with the golden dataset.

Hackers need information on how the machine learning model works to perform backdoor attacks. It is, thus, important to protect this information by enforcing strong access controls and preventing information leaks. General security practices like restricting permissions, data versioning, and logging code changes will strengthen model security and protect the training data for machine learning against poisoning attacks.

Building Defenses through Penetration Testing

Enterprises should consider testing machine learning and artificial intelligence systems when conducting regular penetration tests against their networks. Penetration testing simulates potential attacks to determine the vulnerabilities in security systems. Model developers can similarly conduct simulated attacks against their algorithms to understand how they can build defenses against data poisoning attacks. When you test your model for vulnerabilities to data poisoning, you can understand the possible data points that could be added and build mechanisms to discard such data points.

Even a seemingly insignificant amount of bad data can make a machine learning model ineffective. Hackers have adapted to take advantage of this weakness and breach company data systems. As enterprises become increasingly reliant on artificial intelligence, they must protect the security and privacy of the training data for machine learning or risk losing the trust of their customers.

LEAVE A REPLY

Please enter your comment!
Please enter your name here