foolbox: boundary attack not finding adversarials, and not returning null

Hello,

Note: I’ve updated this issue to reflect new testing I’ve done.

I’m using PyTorch, a simple MLP model pre-trained on MNIST, and the foolbox boundary attack.

The Boundary attack often returns a result that is not adversarial, without raising any error or warning.

Here is the relevant portion of my code:


        adversarial = attack(image, label)
        classification_label = int(np.argmax(fmodel.predictions(image)))
        adversarial_label = int(np.argmax(fmodel.predictions(adversarial)))

        print("source label: " + str(label) + ", adversarial_label: " + str(adversarial_label) + ", classification_label: " + str(classification_label))

        if np.array_equal(adversarial, image):
            # this branch is never reached, as expected
            print("Boundary attack did not find adversarial!")

This code is run in a loop.
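
For reference, the snippet above assumes a setup roughly like the following minimal sketch; the MLP is an untrained placeholder, and the bounds, preprocessing, and sample image are stand-ins for whatever the real pipeline uses (foolbox 1.x API):

        import numpy as np
        import torch
        import foolbox

        # untrained placeholder standing in for the pre-trained MNIST MLP
        model = torch.nn.Sequential(
            torch.nn.Flatten(),
            torch.nn.Linear(28 * 28, 128),
            torch.nn.ReLU(),
            torch.nn.Linear(128, 10),
        ).eval()

        # wrap the PyTorch model for foolbox (1.x API)
        fmodel = foolbox.models.PyTorchModel(model, bounds=(0, 1), num_classes=10)

        # decision-based boundary attack
        attack = foolbox.attacks.BoundaryAttack(fmodel)

        # a single MNIST image as float32 in [0, 1], shape (channels, height, width)
        image = np.random.uniform(0, 1, size=(1, 28, 28)).astype(np.float32)
        label = 7  # placeholder ground-truth label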

Here is a sample of the output

        source label: 9, adversarial_label: 8, classification_label: 9
        source label: 8, adversarial_label: 8, classification_label: 8  # THIS SHOULDN'T BE POSSIBLE
        source label: 6, adversarial_label: 6, classification_label: 6  # THIS SHOULDN'T BE POSSIBLE
        source label: 9, adversarial_label: 9, classification_label: 9
        source label: 3, adversarial_label: 3, classification_label: 3  # THIS SHOULDN'T BE POSSIBLE
        source label: 9, adversarial_label: 1, classification_label: 9
        source label: 4, adversarial_label: 8, classification_label: 4

Notice that the classification label is always equal to the source label, meaning the classifier never misclassifies in this sample output.

And yet, the adversarial label is sometimes equal to the source label, meaning an adversarial was not found.

Also, the fact that the if np.array_equal(adversarial, image): condition is never met suggests that the Boundary attack does do something; it simply returns a result that is not actually adversarial.
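
As a diagnostic, one could also check how close the returned image lies to the decision boundary, for example by looking at the gap between the two largest logits. This is a minimal sketch reusing fmodel and adversarial from the code above:

        logits = fmodel.predictions(adversarial)
        top2 = np.sort(logits)[-2:]   # the two largest logits, in ascending order
        margin = top2[1] - top2[0]    # gap between the best and runner-up class
        print("top-1 vs top-2 logit margin: " + str(margin))
        # A margin near zero would mean the returned image sits essentially on the
        # decision boundary, where tiny numerical differences can flip the class.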

This seems like a bug, but maybe I’m missing something? Was the boundary attack tested with PyTorch? (Although I don’t see why PyTorch would be relevant.)

Thank you!

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 32 (13 by maintainers)

Most upvoted comments

Ha okay, fair point 😅 That’s from before we realized the numerical issues at the boundary and introduced the adversarial_class property.
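
A minimal sketch of how the adversarial_class property mentioned above can be used, assuming the foolbox 1.x attack API with unpack=False (the attribute names here may differ in other versions):

        # request the full Adversarial object instead of just the numpy array
        adv = attack(image, label, unpack=False)

        if adv.perturbed is None:
            # the attack itself reports that no adversarial was found
            print("Boundary attack did not find an adversarial!")
        else:
            # adversarial_class is the class the attack verified for adv.perturbed,
            # so it avoids re-classifying a point that lies exactly on the boundary
            print("attack-verified adversarial class: " + str(adv.adversarial_class))
            print("distance: " + str(adv.distance))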