Worried About an Adversarial Attack on your AI? Fine-tune Your Way to a Strong Defense!

by Dr. Javad Shafiee

With AI becoming more and more prominent in our everyday lives and work, how do we protect the vital systems that run on AI from nefarious actors? Suppose an autonomous vehicle is rolling towards a stop sign that has been manipulated with an adversarial attack: an alteration, undetectable to the human eye, that tricks the autonomous system into misidentifying the sign. If we are going to turn our steering wheels over to AI, we need to be confident that the system is robust and cannot be easily tricked.

To tackle the rise of adversarial attacks, our researchers at DarwinAI (with Ahmadreza Jeddi from the Vision and Image Processing Research Group) have published a new paper on adversarial fine-tuning, which describes how to build up defences in AI models and harden them against these kinds of attacks in the wild. Compared to previous adversarial training methods, our approach cut the cost by an order of magnitude while achieving even better model robustness and generalization.

Building up your AI’s immune system

To combat security vulnerabilities, there has been a lot of research into an area known as adversarial training. This method involves training a model not only on traditional ‘clean’ datasets but also on data that has been altered to simulate attacks. This teaches the model to look for these kinds of deviations and reduces the chance that it will be fooled in real-world applications. It is similar to the human immune system, which learns patterns from past infections and vaccines so that it is more effective at fending off similar infections.
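To make the idea concrete, here is a minimal sketch of one adversarial training step on a toy logistic-regression classifier. It is an illustration only, not the method from our paper: the perturbation uses the well-known fast gradient sign method (FGSM) as a stand-in attack, and every function name and parameter (`fgsm_perturb`, `adversarial_training_step`, `eps`, `lr`) is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a toy logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def _sign(g):
    return (g > 0) - (g < 0)


def fgsm_perturb(w, b, x, y, eps):
    """Shift each feature of x by eps in the direction that increases the loss.

    For logistic regression with cross-entropy loss, dLoss/dx = (p - y) * w.
    FGSM is an assumed stand-in attack, chosen only for illustration.
    """
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * _sign(g) for xi, g in zip(x, grad)]


def adversarial_training_step(w, b, x, y, eps, lr):
    """One SGD update on a clean sample, then one on its adversarial copy."""
    x_adv = fgsm_perturb(w, b, x, y, eps)  # built from the current weights
    for sample in (x, x_adv):
        p = predict(w, b, sample)
        w = [wi - lr * (p - y) * si for wi, si in zip(w, sample)]
        b = b - lr * (p - y)
    return w, b
```

The altered sample is designed to lower the model's confidence in the correct label; training on both versions pushes the model to classify the altered input correctly as well. Real image models follow the same pattern at a much larger scale, which is where the cost discussed in the next section comes from.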

The challenges of adversarial training

While adversarial training can be very effective, a number of major challenges limit it in practice. First, existing adversarial training methods are very expensive in terms of time and resources. If you are training your own model, you have to run additional iterations with the modified adversarial data; this can be extremely time-consuming, especially since some large datasets contain hundreds of thousands of images. You might also have to create your own altered data for these iterations. If you have invested in a pre-trained model, this technique probably won’t work for you, because you would have to take the model back to the beginning and train it from scratch. If you don’t have access to the original training data and methods, you might never get the model back to the level of accuracy it had when you first got it.

The second challenge with existing adversarial training methods is that models trained this way can be prone to overfitting to adversarial samples. An overfit model is highly accurate on the data it was trained on, but once it is exposed to scenarios that more closely reflect real-world use, it cannot match its in-lab results. This can happen because the model has learned to look for the altered images and, to some extent, ignores the natural samples when making predictions.

How does adversarial fine-tuning solve the problem?

To tackle both of these challenges, our researchers have introduced a new method called adversarial fine-tuning, which eliminates much of the resource cost and also mitigates the overfitting issue associated with adversarial training. More specifically, the proposed strategy takes a pre-trained model as its starting point instead of training from scratch, as existing adversarial training methods do. This dramatically reduces the number of training iterations needed for both natural and adversarial samples. Fine-tuning without forgetting the knowledge previously learned from the original training data is made possible by a new ‘slow start, fast decay’ learning rate scheduling strategy that our researchers introduced. This approach preserves the level of accuracy necessary for real-world use while making models more robust against adversarial attacks than state-of-the-art (SOTA) algorithms. Together, these innovations yield an adversarial fine-tuning method that can reduce computational costs by up to about 10× and noticeably improve the robustness and generalization of an AI model compared to SOTA methods, which we demonstrated across three datasets (CIFAR-10, CIFAR-100, and ImageNet).
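As a rough illustration of what a ‘slow start, fast decay’ learning rate schedule could look like, here is a small sketch. The exact formula and values used in the paper are not given in this post, so the shape below (a linear ramp from a small rate followed by exponential decay) and every name and default value (`base_lr`, `peak_lr`, `warmup_epochs`, `decay`) are assumptions made purely for illustration.

```python
def slow_start_fast_decay(epoch, base_lr=1e-4, peak_lr=1e-2,
                          warmup_epochs=3, decay=0.3):
    """Hypothetical 'slow start, fast decay' schedule (assumed shape).

    Slow start: begin at a small rate so early updates do not wipe out
    what the pre-trained model already knows, then ramp up linearly.
    Fast decay: after the ramp, shrink the rate geometrically so the
    fine-tuning finishes within a few epochs.
    """
    if epoch < warmup_epochs:
        frac = epoch / warmup_epochs
        return base_lr + frac * (peak_lr - base_lr)
    return peak_lr * decay ** (epoch - warmup_epochs)


# Learning rate for each epoch of a short, 8-epoch fine-tuning run.
schedule = [slow_start_fast_decay(epoch) for epoch in range(8)]
```

The point of a schedule with this shape is that the pre-trained weights are only nudged at first, and the run ends quickly once the rate has peaked, which is consistent with the small number of fine-tuning iterations described above.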

Making AI safer for real-world deployment

By making these more naturally defendable models both more reliable and more accessible, we hope to encourage a greater proliferation of robust, dependable, and safer AI models in the wild. We at DarwinAI continue to innovate in this area to provide ever-stronger protection against dangerous attacks and to help build better AI solutions you can trust.
