foolbox: boundary attack not finding adversarials, and not returning null
Hello,
Note: I’ve updated this issue to reflect new testing I’ve done.
I’m using pytorch, a simple MLP model pre-trained on MNIST, and the foolbox boundary attack.
The Boundary attack often spits out a result that is not adversarial, and without any error or warning.
Here is the relevant portion of my code:
adversarial = attack(image, label)
classification_label = int(np.argmax(fmodel.predictions(image)))
adversarial_label = int(np.argmax(fmodel.predictions(adversarial)))
print("source label: " + str(label) + ", adversarial_label: " + str(adversarial_label) + ", classification_label: " + str(classification_label))
if np.array_equal(adversarial, image):
    # this branch is never reached, as expected
    print("Boundary attack did not find adversarial!")
This code is run in a loop.
Here is a sample of the output:
source label: 9, adversarial_label: 8, classification_label: 9
source label: 8, adversarial_label: 8, classification_label: 8 # THIS SHOULDN’T BE POSSIBLE
source label: 6, adversarial_label: 6, classification_label: 6 # THIS SHOULDN’T BE POSSIBLE
source label: 9, adversarial_label: 9, classification_label: 9
source label: 3, adversarial_label: 3, classification_label: 3 # THIS SHOULDN’T BE POSSIBLE
source label: 9, adversarial_label: 1, classification_label: 9
source label: 4, adversarial_label: 8, classification_label: 4
Notice that the classification label is always equal to the source label, meaning the classifier never misclassifies in this sample output.
And yet, the adversarial label is sometimes equal to the source label, meaning an adversarial was not found.
Also, the fact that the if np.array_equal(adversarial, image): condition is never met suggests that the Boundary attack does do something, but the “adversarial” it returns is in reality not adversarial.
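To make the cases above explicit, here is a minimal sketch of the check I have in mind. The helper classify_attack_result and the toy model toy_predict are hypothetical names made up for illustration; toy_predict only stands in for fmodel.predictions and is not the MNIST MLP.

```python
import numpy as np


def classify_attack_result(predict, image, adversarial, label):
    """Sort an attack result into one of four cases.

    `predict` is a hypothetical stand-in for fmodel.predictions: it maps
    a single input array to a vector of logits.
    """
    if adversarial is None:
        return "no_result"          # attack reported failure
    if np.array_equal(adversarial, image):
        return "unchanged"          # attack returned the input untouched
    adversarial_label = int(np.argmax(predict(adversarial)))
    if adversarial_label == label:
        return "not_adversarial"    # perturbed, but still the source label
    return "adversarial"


# Toy two-class model (an assumption, for illustration only):
# class 1 if the mean pixel value exceeds 0.5, otherwise class 0.
def toy_predict(x):
    m = float(np.mean(x))
    return np.array([0.5 - m, m - 0.5])


image = np.full((2, 2), 0.2)  # classified as class 0 by the toy model
print(classify_attack_result(toy_predict, image, image + 0.1, 0))
print(classify_attack_result(toy_predict, image, image + 0.5, 0))
```

The lines marked in my output above correspond to the "not_adversarial" case: the returned array differs from the input, yet the model still predicts the source label.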
This seems like a bug, but maybe I’m missing something? Was the boundary attack tested in pytorch? (Although I don’t see why pytorch would be relevant)
Thank you!
About this issue
- State: closed
- Created 6 years ago
- Comments: 32 (13 by maintainers)
Ha okay, fair point 😅 That’s from before we realized the numerical issues at the boundary and introduced the adversarial_class property.
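As a rough illustration of how a point close to the decision boundary can change label when it is evaluated a second time, here is a toy sketch. The logits are made up, and the float64 to float32 cast is only one assumed source of such a discrepancy; it is not a diagnosis of what happened in this issue or of how foolbox evaluates inputs.

```python
import numpy as np

# Hypothetical logits for an input sitting almost exactly on the boundary:
# class 1 leads class 0 by a margin of only 1e-10.
logits64 = np.array([1.0, 1.0 + 1e-10], dtype=np.float64)

# The same logits in lower precision: the margin is below float32
# resolution, so both entries round to 1.0.
logits32 = logits64.astype(np.float32)

label64 = int(np.argmax(logits64))  # class 1 wins by the tiny margin
label32 = int(np.argmax(logits32))  # tie; argmax returns the first index

print("float64 label:", label64)
print("float32 label:", label32)
```

With a margin this small, re-running the prediction on the returned array can report the source label even though the attack saw a different class when it accepted the point.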