diffusers: Allow safety checker pipeline configuration that returns boolean array but does not black out images

Is your feature request related to a problem? Please describe. See this PR comment: https://github.com/huggingface/diffusers/pull/815#discussion_r994418216

TLDR: with the recent changes in #815, developers have the ability to disable the safety checker. Currently, the only options available to devs are either to keep the safety checker or to disable it entirely. While this is useful, many applications involving NSFW content require opt-in access from end users. For example, consider Reddit’s NSFW model: the end user is shown an ‘nsfw’ overlay that they have to manually click through. Currently, the diffusers library does not make it easy to support such a use case.

Describe the solution you’d like I think the best approach is to add a flag to the SafetyChecker class called black_out_images. This flag would then modify the if statement on this line: https://github.com/huggingface/diffusers/blob/797b290ed09a84091a4c23884b7c104f8e94b128/src/diffusers/pipelines/stable_diffusion/safety_checker.py#L74

        for idx, has_nsfw_concept in enumerate(has_nsfw_concepts):
            if has_nsfw_concept and black_out_images:
                images[idx] = np.zeros(images[idx].shape)  # black image

The flag would then be passed into the SafetyChecker from the top-level pipeline config.
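For illustration, the caller side might then look something like this; this is only a sketch of the proposal, and the black_out_images argument does not exist in diffusers today:

from diffusers import StableDiffusionPipeline

# Hypothetical sketch of the proposed flag -- not existing diffusers API.
# black_out_images would be forwarded from the pipeline config into the
# StableDiffusionSafetyChecker.
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    black_out_images=False,  # hypothetical kwarg
)

out = pipe("a portrait photo")
print(out.nsfw_content_detected)  # boolean array is still reported
images = out.images               # images come back untouched; the caller decides what to do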

Describe alternatives you’ve considered Another alternative is to do this at the pipeline level. For example, we could pass in a flag to the Pipeline class called black_out_nsfw_images. This flag would then modify the safety_checker call here: https://github.com/huggingface/diffusers/blob/797b290ed09a84091a4c23884b7c104f8e94b128/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py#L335

        safety_checker_input = self.feature_extractor(self.numpy_to_pil(image), return_tensors="pt").to(self.device)
        cleaned_image, has_nsfw_concept = self.safety_checker(
            images=image, clip_input=safety_checker_input.pixel_values.to(text_embeddings.dtype)
        )

        if black_out_nsfw_images:
            image = cleaned_image

Additional context In both cases, I believe the config can default to ‘nsfw images will be blacked out’. Having the option is critical, however.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 43 (20 by maintainers)

Most upvoted comments

My solution for getting the NSFW status and blurring the images:

import torch
from diffusers import StableDiffusionPipeline
from PIL import ImageFilter
from torchvision import transforms


def disabled_safety_checker(images, clip_input):
    # Dummy checker: never modify the images, just report "not NSFW" for all of them
    if len(images.shape) == 4:
        num_images = images.shape[0]
        return images, [False] * num_images
    else:
        return images, False

prompt = 'naked 1girl, smirk, curly hair, in the dark, deep shadow'
model_id = "models/reliberate_v20.safetensors"

pipe = StableDiffusionPipeline.from_single_file(model_id,
                                                use_safetensors=True,
                                                requires_safety_checker=False,
                                                torch_dtype=torch.float16).to('cuda')

safety_checker = pipe.safety_checker           # keep a reference to the real checker
pipe.safety_checker = disabled_safety_checker  # the pipeline itself no longer censors

images = pipe(prompt=prompt, num_inference_steps=30).images
safety_checker_input = pipe.feature_extractor(images, return_tensors="pt").to('cuda')

img_to_tensor = transforms.Compose([transforms.PILToTensor()])
tensor_images = [img_to_tensor(img) for img in images]

# run the real checker only to obtain the NSFW flags; its returned images are discarded
_, has_nsfw_concept = safety_checker(images=tensor_images, clip_input=safety_checker_input.pixel_values.to(torch.float16))

print(has_nsfw_concept)
for i, img in enumerate(images):
    if has_nsfw_concept[i]:
        # blur flagged images instead of blacking them out, so the user can judge the result
        blured = img.filter(ImageFilter.GaussianBlur(radius=10))
        blured.save(f'data/out_blured_{i}.png')

I was OK with @patrickvonplaten’s proposal, and thought it neatly solved everyone’s concerns. See: https://github.com/huggingface/diffusers/issues/845#issuecomment-1280541219

Patrick offers to implement ‘blurred image’ as a default, but I would rather the options be “enabled”, “warn_only”, and “none”, where:

  • “enabled” == “black out images and give a boolean array indicating which ones were NSFW”,
  • “warn_only” == “give the full images and a boolean array indicating which ones were NSFW”
  • “none” == “give the full images and don’t run any safety checking (return a boolean array of all False values)”
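As a rough sketch of what those three modes would mean in code (a name like apply_safety_mode is purely illustrative, not diffusers API), the logic around the black-out loop could look like this:

import numpy as np

# Hypothetical sketch, not diffusers API: apply one of the three proposed
# modes to the checker's raw results. In a real implementation, "none" would
# skip running the checker model entirely.
def apply_safety_mode(images, has_nsfw_concepts, mode="enabled"):
    if mode == "none":
        # no safety checking reported: full images, all-False flags
        return images, [False] * len(images)
    if mode == "enabled":
        for idx, has_nsfw_concept in enumerate(has_nsfw_concepts):
            if has_nsfw_concept:
                images[idx] = np.zeros(images[idx].shape)  # black image
    # "warn_only" leaves the images untouched; both modes report the flags
    return images, has_nsfw_concepts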

Thanks so much for this discussion, and just popping in to say I completely agree that Hugging Face should create, and maintain, code for post-processing generated content (not limited to blurring, but that’s definitely one). IIRC, the current safety/censoring approach came from discussions with Stability AI – it definitely wasn’t one I had recommended.

From my perspective, there just haven’t been enough engineers around at HF with “free time” to do the more nuanced work needed here. We’re creating a job description for this task (and others), but it’s not approved yet + not up yet + not hired-for yet.

In the meantime, any community work on this would be beneficial for everyone (I think), and hopefully makes sense since Stable Diffusion is a community contribution. We’d want to have Stability AI agree to the approach too, since we’d be changing from what had been worked through earlier.

@patrickvonplaten

Note that especially for topics regarding the safety checker, there are many different ideas/opinions out there, and we cannot find a solution that works for everyone. In this case, I don’t think we should change anything - it’s very simple to adapt the safety checker to your needs, as shown in the code snippet above.

I think it should at least make sense… Right now, it returns a boolean for whether an image is NSFW, and then also returns a black image. So even from the highest-level API user’s standpoint, the implementation just doesn’t make sense. If you were going to return a boolean so a developer could implement something themselves (a message, blurring the image, etc.), you’d also give them something to act on.

It’s not usable in any API sense, high, low, or whatever. It just doesn’t make sense. You even advertise the use of safety_checker.py in this repo’s documentation, but it has no use beyond brute-force censoring. There’s no API aspect to it.

Is this really more about liabilities, shoving it off to “community” pipelines?


Considering that basically everyone who isn’t running a service has just disabled this outright with a dummy function or set it to None, it does seem to be clearly not so much about what’s right for the usage, but something ulterior. If someone wants to censor, they can, and they should be able to use the safety checker the way it seems inherently implied to be used: as a warning to the developer, to then handle as they please.


If you do keep it as is, please think about renaming it so it makes sense, like “safety_censor”. It ain’t checking Jack (from a usage and implementation standpoint).

Wouldn’t it be far more beneficial to return Gaussian-blurred images, like in the Discord beta? If the image is black, you don’t know whether it’s something to refine in your prompt or a prompt to scrap.

The way I have edited the safety checker is to simply return the image, regardless. Now it’s up to the developer to make use of has_nsfw_concepts and decide what to do with it. In my case, I have chosen the Gaussian blur (see the sketch below), so the user can see if:

  • A) The flag was in error
  • B) The flag was prudish (maybe adjust the prompt)
  • C) That’s clearly a pornographic/violent image and my prompt is clearly comprehended as such

It actually doesn’t make much sense to return a has_nsfw_concepts flag if you’re just returning a black image… clearly, they’ll get it. Lol
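For anyone who wants the same behaviour without editing the library file, here is a minimal sketch that wraps the stock checker so it keeps the flags but hands back untouched images (the wrapper name is made up, not a diffusers API; note the copy, since the stock checker blacks out the numpy array in place):

import numpy as np

def make_warn_only_checker(original_checker):
    # Illustrative wrapper: run the stock checker on a copy so its in-place
    # black-out never touches the images we hand back.
    def warn_only_checker(images, clip_input):
        _, has_nsfw_concept = original_checker(
            images=np.array(images, copy=True), clip_input=clip_input
        )
        return images, has_nsfw_concept
    return warn_only_checker

# assumes `pipe` is a loaded StableDiffusionPipeline with its safety checker attached
pipe.safety_checker = make_warn_only_checker(pipe.safety_checker)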

@sergeykorablin where does transforms come from in your code?

@alexisrolland as far as I remember, it is from Torchvision

No, I have not had the time to do more with this. I removed the checker altogether for my use case.

The false positives made it challenging, and unless individuals are specifically requesting NSFW prompts, the bulk of the outputs seem to be kosher.

I won’t have time to look into this anytime soon. @justinmerrell feel free to open a PR for this one (should be quite simple) if you need it urgently

@meg-huggingface can you comment more on the discussion with Stability AI? I was under the impression that the Stable Diffusion model is tangentially related to Stability AI at best (they don’t seem to be the maintainers on the original SD repo, nor are they authors on the paper), so I’m curious why Stability AI would be involved in any discussions around usage of the model

That said, though I disagree with the tone of WASasquatch, I think overall they make a good point – there aren’t many DEVELOPER use cases that play well with having a forced black-out. I think this feature falls pretty squarely in line with the overall goals of this library, namely, to safely and easily enable many people to utilize SD. By having this feature be managed, many more developers can safely include SD as a black box without having to know anything about feature extraction, or even that there are multiple models involved.

I don’t think this library’s purpose is to safely and easily enable many people to use SD. It’s an API that allows developers to build services, which then let many people utilize SD. In no way is installing Python and dependencies and copying, pasting, and customizing a script “many people” (or easy); that inherently covers a minuscule slice of them compared to the user bases of services like mine and others.

To that end, how is an arbitrary safety “checker” that doesn’t do what its name describes, or even perform the functionality it’s programmed to do correctly, beneficial to an API?

Having an option for black-out / heavy Gaussian blur was my initial idea (and I have Gaussian blur implemented so a user can actually screen what’s going on in a safe, non-descriptive/non-explicit manner, as I described above).

And let’s be honest, the safety checker is weird. It censors random stuff and you don’t know why. Was it actually explicit (did it just throw in a random nude person or something? It happens)? You’d likely be able to tell through a non-descriptive Gaussian blur, and then make use of negative prompts.