Adversarial Perturbations

This repository demonstrates and visualizes adversarial attacks against image classification models. The project implements several popular attack methods and provides detailed visualizations of the perturbations and their effects.

Overview

Adversarial inputs are inputs to machine learning models (in this case, an image classifier) that are deliberately crafted to cause the model to make a mistake. This project focuses on generating adversarial perturbations: small changes that, when added to an original image, cause a deep learning classifier to misclassify it.

Here is a summary of the results on the sample image:

Attack               Success   L2 Distance   Time (s)   Original Prediction   Adversarial Prediction
DeepFool             Yes       0.141822      17.90      golden retriever      Labrador retriever
Carlini & Wagner L2  Yes       2.156872      0.99       golden retriever      digital clock
PGD L2               Yes       1.999080      0.24       golden retriever      digital clock

Attack Methods

The project implements and compares three state-of-the-art adversarial attack methods:

1. Carlini & Wagner L2 (C&W L2)

The C&W attack approaches adversarial example generation as an optimization problem. Instead of directly manipulating the image, it works in a transformed space that guarantees valid pixel values. The attack minimizes a carefully crafted objective function that balances two goals:

  • Making minimal changes to the image (measured by L2 distance)
  • Forcing the model to output the target class

The C&W attack uses binary search to find the optimal balance between perturbation size and misclassification strength.
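
For illustration only (this is not the repository's exact code), here is a minimal sketch of running a C&W L2 attack with the torchattacks library listed in the requirements; the placeholder tensors and the values chosen for c, kappa, steps, and lr are assumptions:

import torch
import torchattacks
from torchvision import models

# Pre-trained classifier in evaluation mode (the repository uses ResNet50).
# In practice, fold ImageNet normalization into the model so the attack
# operates directly on [0, 1] pixel values.
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2).eval()

# Placeholder batch: shape (N, 3, H, W) in [0, 1], plus true class indices.
images = torch.rand(1, 3, 224, 224)
labels = torch.tensor([207])  # ImageNet class 207 = golden retriever

# c trades off perturbation size against misclassification confidence,
# kappa is the confidence margin, and the optimization runs for `steps` iterations.
# This runs untargeted by default; torchattacks also supports targeted modes.
attack = torchattacks.CW(model, c=1, kappa=0, steps=100, lr=0.01)
adversarial_images = attack(images, labels)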

2. Projected Gradient Descent L2 (PGD L2)

PGD takes an iterative approach. Instead of complex optimization, it repeatedly:

  • Takes steps in the direction that increases the model's loss
  • Projects the perturbation back onto an L2 ball to maintain a constraint on its size

The constraint ensures that the adversarial perturbation remains imperceptible to humans by keeping the L2 norm (Euclidean distance) of the perturbation below a threshold.
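
Continuing from the setup in the C&W sketch above, a PGD L2 attack via torchattacks might look like this (eps, alpha, and steps are illustrative values, not the repository's settings):

# eps bounds the L2 norm of the total perturbation, alpha is the per-step size,
# and random_start=True begins from a random point inside the L2 ball.
attack = torchattacks.PGDL2(model, eps=2.0, alpha=0.2, steps=40, random_start=True)
adversarial_images = attack(images, labels)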

3. DeepFool

DeepFool takes a geometric perspective, iteratively finding the nearest decision boundary and stepping over it. It's an untargeted attack, meaning it simply tries to cause any misclassification rather than aiming for a specific target class.

DeepFool approximates decision boundaries locally as hyperplanes and then finds the shortest path to cross the nearest boundary, which represents the minimal perturbation needed to change the classification.
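
With the same setup, a DeepFool sketch (again with illustrative parameter values):

# DeepFool is untargeted: it stops as soon as the predicted class changes.
# overshoot pushes slightly past the estimated boundary to ensure the flip.
attack = torchattacks.DeepFool(model, steps=50, overshoot=0.02)
adversarial_images = attack(images, labels)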

Requirements

The project requires Python 3.8+ and the following packages:

  • PyTorch (2.0.1+)
  • torchvision
  • torchattacks
  • matplotlib
  • numpy
  • Pillow

Create a virtual environment, activate it, and install the dependencies:

python3 -m venv venv
# Activate the environment:
#   Linux/macOS: source venv/bin/activate
#   Windows:     venv\Scripts\activate
pip install -r requirements.txt

Usage

To run the adversarial attack demonstration:

python3 adversarial_inputs.py

The script will:

  1. Load a pre-trained ResNet50 image classifier
  2. Load and preprocess the specified image
  3. Perform multiple adversarial attacks on the image (a rough sketch of steps 1–3 is shown after this list)
  4. Generate comprehensive visualizations
  5. Compare the effectiveness of different attack methods
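
The snippet below is a rough, self-contained sketch of steps 1–3, not the repository's actual code: the file name golden_retriever.jpg is hypothetical, PGD L2 stands in for the full set of attacks, ImageNet normalization is folded into a small wrapper module so the attack can operate on [0, 1] pixel values (which torchattacks expects), and a recent torchvision with the weights enum API is assumed.

import torch
import torch.nn as nn
import torchattacks
from PIL import Image
from torchvision import models, transforms

class NormalizedModel(nn.Module):
    """Wraps a classifier so ImageNet normalization happens inside the model."""
    def __init__(self, model, mean, std):
        super().__init__()
        self.model = model
        self.register_buffer("mean", torch.tensor(mean).view(1, 3, 1, 1))
        self.register_buffer("std", torch.tensor(std).view(1, 3, 1, 1))

    def forward(self, x):
        return self.model((x - self.mean) / self.std)

# 1. Load a pre-trained ResNet50 image classifier.
model = NormalizedModel(
    models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2),
    mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225],
).eval()

# 2. Load and preprocess the image (file name is hypothetical).
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),  # scales to [0, 1], shape (3, 224, 224)
])
image = preprocess(Image.open("golden_retriever.jpg").convert("RGB")).unsqueeze(0)
label = model(image).argmax(dim=1)  # use the clean prediction as the label

# 3. Perform an adversarial attack (PGD L2 shown; C&W and DeepFool are analogous).
attack = torchattacks.PGDL2(model, eps=2.0, alpha=0.2, steps=40)
adversarial_image = attack(image, label)
print("adversarial prediction:", model(adversarial_image).argmax(dim=1).item())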

Visualization Outputs

The script generates the following visualizations in the results_real_image directory:

  • original.png: The original input image
  • adversarial_*.png: The adversarial examples for each attack method
  • detailed_*.png: Detailed comparisons showing original, adversarial, and perturbation for each attack
  • comprehensive_comparison.png: A comprehensive visualization of all attacks
  • perturbation_comparison.png: Bar chart comparing perturbation magnitudes (L2 distances)
  • computation_time_comparison.png: Bar chart comparing computation times

How It Works

Measuring Perturbation

The magnitude of perturbations is measured using L2 Distance – the Euclidean distance between the original and perturbed images:

# Calculate L2 distance
l2_distance = torch.norm(adversarial_image - image, p=2).item()

As a rule of thumb, for images with RGB values normalized to [0, 1], an L2 distance of < 5.0 is generally imperceptible to humans, while distances > 10.0 might be noticeable.
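
To put these thresholds in perspective, here is a small worked example (assuming a 224×224 RGB input and, purely for illustration, a perturbation spread evenly across all values):

import math

# If an L2 distance of 2.0 were spread evenly over a 224x224 RGB image,
# each of the 150,528 values would change by only ~0.005 on a [0, 1] scale,
# i.e. roughly 0.5% of the pixel range.
num_values = 224 * 224 * 3
per_value_change = 2.0 / math.sqrt(num_values)
print(f"average per-value change: {per_value_change:.4f}")  # ~0.0052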

Comparison of Attack Methods

In practice, these attacks have different characteristics:

  • C&W (Carlini & Wagner) offers a good balance of speed and perturbation size
  • PGD (Projected Gradient Descent) often achieves low L2 distances and runs faster than C&W in some cases
  • DeepFool typically produces the smallest perturbations (lowest L2 distance) but may take longer to compute and is limited to untargeted attacks

Code Structure

The main entry point is adversarial_inputs.py, which contains the following key functions:

  • plot_comprehensive_comparison(): Creates a comprehensive visualization
  • plot_perturbation_magnitudes(): Creates a bar chart for L2 distances
  • plot_computation_times(): Creates a bar chart for computation times
  • plot_detailed_image_comparison(): Creates a detailed visualization for a single attack
  • main(): Orchestrates the entire process

Image Attribution

The sample image used in this project is from Nita Anggraeni Goenawan on Unsplash.


License

This project is open-source and available for educational purposes.

Related Arxiv References

  • Carlini & Wagner, "Towards Evaluating the Robustness of Neural Networks" – arXiv:1608.04644
  • Madry et al., "Towards Deep Learning Models Resistant to Adversarial Attacks" (PGD) – arXiv:1706.06083
  • Moosavi-Dezfooli et al., "DeepFool: A Simple and Accurate Method to Fool Deep Neural Networks" – arXiv:1511.04599
