diffusers: Extremely slow loading of model

Hi guys.

CUDA 12.0. GPU: Tesla T4 (16 GB). nproc output: 4

I am using local_files_only=True and pointing from_pretrained at /cache_folder/snapshot/234xxx88378. The first request, which initializes the model, takes 40s for Stable Diffusion 1.5. I have seen people report around 3s; how is that even achievable? I have tried device_map="auto", and it makes no difference.
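To narrow down where the 40s goes, one option is to time each setup phase separately. A stdlib-only sketch (the sleep calls are hypothetical stand-ins for the real from_pretrained and .to(DEVICE) calls):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    # Print wall-clock time for the enclosed block.
    start = time.perf_counter()
    yield
    print(f"{label}: {time.perf_counter() - start:.2f}s")

# Wrap each setup phase to find the slow step.
with timed("from_pretrained"):
    time.sleep(0.05)  # stand-in for DiffusionPipeline.from_pretrained(...)
with timed("to(device)"):
    time.sleep(0.05)  # stand-in for pipeline.to(DEVICE)
```

If most of the time is inside from_pretrained, it is usually disk read plus weight deserialization; if it is in .to(DEVICE), it is the host-to-GPU copy.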

Once the model is in memory, each request takes around 4s for 30 steps. Can this be improved as well?
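The load cost only pays off if the pipeline object stays resident and is reused across requests. A minimal stdlib sketch of that load-once pattern (load_pipeline here is a hypothetical stand-in for the expensive from_pretrained setup):

```python
from functools import lru_cache

def load_pipeline(model_name: str):
    # Hypothetical placeholder for the real
    # DiffusionPipeline.from_pretrained(...).to(DEVICE) setup.
    return {"model": model_name}

@lru_cache(maxsize=None)
def get_pipeline(model_name: str):
    # The expensive load runs only on the first call per model;
    # later calls return the cached object without reloading.
    return load_pipeline(model_name)
```

Usage: get_pipeline("sd-1.5") called twice returns the same object, so only the first request pays the load time. This is the same idea as holding the pipeline on a long-lived class instance.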

self.text2img_pipeline, self.cache_folder = DiffusionPipeline.from_pretrained(
    self.model,  # this is the cache folder path
    torch_dtype=torch.float16 if DEVICE == "cuda" else torch.float32,
    cache_dir="models",
    local_files_only=True,
    # device_map="auto",
    return_cached_folder=True,
)
self.text2img_pipeline.to(DEVICE)
self.text2img_pipeline.safety_checker = None
self.compatible_schedulers = {
    scheduler.__name__: scheduler for scheduler in self.text2img_pipeline.scheduler.compatibles
}
self.text2img_pipeline.enable_xformers_memory_efficient_attention()
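The compatible_schedulers dict built above can be used to swap in a faster scheduler per request; in diffusers the swap is scheduler_cls.from_config(pipeline.scheduler.config). The sketch below uses hypothetical stand-in classes so the lookup pattern is runnable on its own:

```python
# Stand-in scheduler classes (hypothetical; the real ones come from
# pipeline.scheduler.compatibles in diffusers).
class DDIMScheduler:
    @classmethod
    def from_config(cls, config):
        return cls()

class DPMSolverMultistepScheduler:
    @classmethod
    def from_config(cls, config):
        return cls()

compatible_schedulers = {
    s.__name__: s for s in (DDIMScheduler, DPMSolverMultistepScheduler)
}

# Request-time swap: look the class up by name, rebuild from the old config.
chosen = compatible_schedulers["DPMSolverMultistepScheduler"]
scheduler = chosen.from_config(config={})
```

A multistep solver such as DPM-Solver++ often reaches comparable quality in fewer steps than the default scheduler, which would cut the per-request time roughly in proportion to the step count.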

I am sorry if I missed something; please let me know.

About this issue

  • State: closed
  • Created a year ago
  • Comments: 22 (4 by maintainers)

Most upvoted comments

If this is running on a server, are you by any chance swapping between different pipes? I am running into the same issue, and it does not occur when running inference on the same pipeline twice in a row.

The slow loading only happens during setup. I create an object of the class, which on call creates the model, and the model then stays in memory. This happens for all the different models I have. After that it uses the preloaded model, so I do not think there is any swapping between different pipes.