Introducing adversarial examples in vision deep learning models
We have seen the advent of state-of-the-art (SOTA) deep learning models for computer vision ever since we started getting bigger and better compute (GPUs and TPUs), more data (ImageNet etc.) and easy-to-use open-source software and tools (TensorFlow and PyTorch). Every year (and now every few months!) we see the next SOTA deep learning model dethrone the previous one in terms of Top-k accuracy on benchmark datasets. The following figure depicts some of the latest SOTA deep learning vision models (though it leaves out a few, like Google's BigTransfer!).
However, most of these SOTA deep learning models are brought to their knees when they try to make predictions on a specific class of images, called adversarial images. An adversarial example can be either a natural example or a synthetic one. We will look at a few examples in this article to get familiar with different adversarial examples and attacks.
A natural adversarial example is a natural, organic image which is tough for the model to comprehend. A synthetic adversarial example is one where an attacker (a malicious user) purposely injects some noise into an image such that it visually remains very similar to the original, but the model ends up making a vastly different (and wrong) prediction. Let's look at a few of these in more detail!
Natural Adversarial Examples
These examples, as defined in the paper 'Natural Adversarial Examples' by Hendrycks et al., are real-world, unmodified, and naturally occurring examples that cause classifier accuracy to significantly degrade. The authors introduce two new datasets of natural adversarial examples. The first, called IMAGENET-A, contains 7,500 natural adversarial examples for ImageNet classifiers and serves as a hard ImageNet classifier test set. The following figure shows some of these adversarial examples from the ImageNet-A dataset.
You can clearly see how wrong (and silly!) the predictions of a state-of-the-art (SOTA) ResNet-50 model are on the above examples. In fact, a pre-trained DenseNet-121 model obtains an accuracy of only 2% on ImageNet-A!
The authors have also curated an adversarial out-of-distribution detection dataset called IMAGENET-O, which they claim is the first out-of-distribution detection dataset created for ImageNet models. The following figure shows some interesting examples of ResNet-50 inference on images from the ImageNet-O dataset.
The examples are indeed interesting and showcase the limitations of SOTA pre-trained vision models on some of these images, which are more complex for these models to interpret. Some of the reasons for failure can be attributed to what deep learning models focus on when making predictions for a specific image. Let's look at some more examples to try and understand this.
Based on the examples showcased in the figure above, it is pretty clear that there are some specific patterns in the misinterpretations made by deep learning vision models. For instance:
- Candles are predicted as jack-o'-lanterns, despite the absence of a pumpkin, because the model focuses more on aspects like the flame and its illumination
- A dragonfly is predicted as a skunk or a banana because the model focuses more on color and texture
- A mushroom is classified as a nail because the model learns to associate certain elements together, e.g. wood and nails
- Models also end up suffering from overgeneralization problems, e.g. from shadows to sundials
The overall performance of SOTA deep learning vision models is pretty poor on these examples, as depicted in the following figure.
The sad part is that robust adversarial training methods hardly help in tackling the problems associated with misinterpreting natural adversarial examples, as mentioned in the same paper by Hendrycks et al. Some of these methods include training against specific synthetic attacks like Projected Gradient Descent (PGD) and the Fast Gradient Sign Method (FGSM), which we will look at in more detail in subsequent articles. Luckily, these methods do work well for handling malicious synthetic attacks, which are usually a larger concern.
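To give a feel for what adversarial training means in practice, here is a minimal, hedged sketch on a toy linear (logistic-regression) classifier in plain NumPy. Everything here (the synthetic data, the model, `eps`, the learning rate) is made up for illustration; real adversarial training plugs an attack such as FGSM or PGD into the training loop of a deep network in exactly the same way: perturb the batch to increase its loss, then take a gradient step on the perturbed batch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative synthetic data: labels come from a hidden linear rule.
rng = np.random.default_rng(42)
n, d = 200, 8
true_w = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = (X @ true_w > 0).astype(float)

w = np.zeros(d)                 # toy "model" weights
eps, lr = 0.1, 0.5              # attack budget and learning rate (made up)
for _ in range(200):
    # Inner step: FGSM-style perturbation that increases each
    # example's loss (sign of the loss gradient w.r.t. the input).
    p = sigmoid(X @ w)
    grad_x = (p - y)[:, None] * w[None, :]
    X_adv = X + eps * np.sign(grad_x)
    # Outer step: descend the loss on the perturbed batch.
    p_adv = sigmoid(X_adv @ w)
    w -= lr * (X_adv.T @ (p_adv - y)) / n

# Compare accuracy on clean inputs vs. freshly perturbed inputs.
p = sigmoid(X @ w)
clean_acc = float(np.mean((p > 0.5) == (y == 1)))
grad_x = (p - y)[:, None] * w[None, :]
X_adv = X + eps * np.sign(grad_x)
adv_acc = float(np.mean((sigmoid(X_adv @ w) > 0.5) == (y == 1)))
print(f"clean accuracy: {clean_acc:.2f}  adversarial accuracy: {adv_acc:.2f}")
```

Note how the attack lives *inside* the training loop: the model only ever sees perturbed inputs, which is what buys robustness against that specific attack (and, as the paper points out, not against natural adversarial examples).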
Synthetic Adversarial Examples
These examples involve artificially inducing some noise in an input image such that it visually remains very similar to the original image, but the infused noise ends up degrading classifier accuracy. While there is a wide variety of synthetic adversarial attacks, all of them operate on a core set of principles, as depicted in the following figure.
The focus is always on crafting the noise / perturbation tensor (a matrix of values) which can be superimposed on top of the original image, such that the perturbations are invisible to the human eye but end up making the deep learning model fail to make correct predictions. The example depicted above showcases a Fast Gradient Sign Method (FGSM) attack, where we take the sign of the gradients of the loss with respect to the input image, scale it by a small multiplier, and superimpose it on the image of a panda, making the model mistakenly predict the image as that of a gibbon. The following graphic showcases some of the more popular types of adversarial attacks.
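To make the FGSM recipe concrete before we dive into real vision models, here is a minimal sketch in plain NumPy on a toy logistic-regression "classifier". The weights, the input, and `eps` are all invented for illustration; with a deep network the only difference is that the input gradient comes from backpropagation (e.g. via `tf.GradientTape` or `torch.autograd`) instead of a closed-form expression.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b):
    """Probability that x belongs to the true class (class 1)."""
    return sigmoid(np.dot(w, x) + b)

def fgsm_perturb(x, w, b, y_true, eps):
    """One FGSM step: nudge x by eps in the direction of the sign of
    the loss gradient w.r.t. the input. For binary cross-entropy and
    a linear logit z = w.x + b, dL/dx = (p - y_true) * w."""
    p = predict(x, w, b)
    grad_x = (p - y_true) * w           # loss gradient w.r.t. the input
    return x + eps * np.sign(grad_x)    # the adversarial input

# A made-up "image" the model weakly classifies as the true class.
rng = np.random.default_rng(0)
w = rng.normal(size=16)
b = 0.0
x = 0.05 * w / np.linalg.norm(w)

p_clean = predict(x, w, b)
x_adv = fgsm_perturb(x, w, b, y_true=1.0, eps=0.1)
p_adv = predict(x_adv, w, b)
print(f"clean prob of true class:       {p_clean:.3f}")
print(f"adversarial prob of true class: {p_adv:.3f}")
```

The key property to notice is that each coordinate of `x_adv` differs from `x` by at most `eps` (an L-infinity budget), yet the prediction flips: exactly the panda-to-gibbon effect, just on a toy model.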
In the next couple of articles, we will discuss each of the above-mentioned adversarial attack methodologies and showcase, with hands-on code examples, how you can fool the latest and best SOTA vision models. Stay tuned!
Liked this article? Do reach out to me to discuss more on it or give feedback!