huggingface_hub: StableDiffusionImg2ImgPipeline OSError: Consistency check failed
Describe the bug
I’m trying to run the DreamPose repository. When fine-tuning of the UNet finished, the code saved the fine-tuned network with this snippet:
if accelerator.is_main_process and global_step % 500 == 0:
    # Rebuild the pipeline around the fine-tuned components and save a
    # full checkpoint folder.
    pipeline = StableDiffusionImg2ImgPipeline.from_pretrained(
        args.pretrained_model_name_or_path,
        # adapter=accelerator.unwrap_model(adapter),
        unet=accelerator.unwrap_model(unet),
        tokenizer=tokenizer,
        image_encoder=accelerator.unwrap_model(clip_encoder),
        clip_processor=accelerator.unwrap_model(clip_processor),
        revision=args.revision,
    )
    pipeline.save_pretrained(os.path.join(args.output_dir, f'checkpoint-{epoch}'))

    # Also save the raw state dicts for the UNet and the adapter.
    model_path = args.output_dir + f'/unet_epoch_{epoch}.pth'
    torch.save(unet.state_dict(), model_path)
    adapter_path = args.output_dir + f'/adapter_{epoch}.pth'
    torch.save(adapter.state_dict(), adapter_path)
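As an aside, the .pth files saved above are later reloaded through the --custom_chkpt flag visible in the launch command in the Logs section. A minimal reload sketch (variable and path names assumed from the snippet above):

# Restore the fine-tuned weights (sketch; `unet` and `adapter` must be
# freshly constructed models with matching architectures).
unet.load_state_dict(torch.load(model_path, map_location="cpu"))
adapter.load_state_dict(torch.load(adapter_path, map_location="cpu"))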
The from_pretrained call above failed with: OSError: Consistency check failed: file should be of size 1215981833 but has size 492265879 (model.safetensors). (You can find the full output in the Logs section.)
- I set the force_download parameter to True (see the retry sketch below this list), but nothing changed.
- I have enough disk space to save the model.
- I’m using the latest huggingface_hub version.
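For reference, this is how the retry with the flags suggested by the error message can be expressed (a minimal sketch; the model id and revision are taken from the launch command in the Logs section):

from huggingface_hub import snapshot_download

# Force a clean re-download of the whole snapshot instead of resuming a
# possibly corrupted partial file.
snapshot_download(
    repo_id="CompVis/stable-diffusion-v1-4",
    revision="ebb811dd71cdc38a204ecbdd6ac5d580f529fd8c",
    force_download=True,
    resume_download=False,
)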
Reproduction
No response
Logs
Fetching 14 files: 0%| | 0/14 [00:00<?, ?it/s]Force download: True
Force download: True
Fetching 14 files: 21%|█████▎ | 3/14 [00:06<00:23, 2.11s/it]
Traceback (most recent call last):
File "finetune-unet.py", line 458, in <module>92M/492M [00:05<00:00, 85.9MB/s]
main(args)
File "finetune-unet.py", line 438, in main
pipeline = StableDiffusionImg2ImgPipeline.from_pretrained(
File "***/anaconda3/envs/***/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py", line 908, in from_pretrained
cached_folder = cls.download(
File "***/anaconda3/envs/***/lib/python3.8/site-packages/diffusers/pipelines/pipeline_utils.py", line 1349, in download
cached_folder = snapshot_download(
File "***/anaconda3/envs/***/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "***/anaconda3/envs/***/lib/python3.8/site-packages/huggingface_hub/_snapshot_download.py", line 235, in snapshot_download
thread_map(
File "***/anaconda3/envs/***/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 69, in thread_map
return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs)
File "***/anaconda3/envs/***/lib/python3.8/site-packages/tqdm/contrib/concurrent.py", line 51, in _executor_map
return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
File "***/anaconda3/envs/***/lib/python3.8/site-packages/tqdm/std.py", line 1178, in __iter__
for obj in iterable:
File "***/anaconda3/envs/***/lib/python3.8/concurrent/futures/_base.py", line 619, in result_iterator
yield fs.pop().result()
File "***/anaconda3/envs/***/lib/python3.8/concurrent/futures/_base.py", line 444, in result
return self.__get_result()
File "***/anaconda3/envs/***/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
raise self._exception
File "***/anaconda3/envs/***/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "***/anaconda3/envs/***/lib/python3.8/site-packages/huggingface_hub/_snapshot_download.py", line 211, in _inner_hf_hub_download
return hf_hub_download(
File "***/anaconda3/envs/***/lib/python3.8/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
return fn(*args, **kwargs)
File "***/anaconda3/envs/***/lib/python3.8/site-packages/huggingface_hub/file_download.py", line 1365, in hf_hub_download
http_get(
File "***/anaconda3/envs/***/lib/python3.8/site-packages/huggingface_hub/file_download.py", line 547, in http_get
raise EnvironmentError(
OSError: Consistency check failed: file should be of size 1215981833 but has size 492265879 (model.safetensors).
We are sorry for the inconvenience. Please retry download and pass `force_download=True, resume_download=False` as argument.
If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.
Downloading model.safetensors: 100%|█████████| 492M/492M [00:05<00:00, 83.3MB/s]
Steps: 100%|██████████████| 500/500 [06:10<00:00, 1.35it/s, loss=0.95, lr=1e-5]
Traceback (most recent call last):
File "***/anaconda3/envs/***/bin/accelerate", line 8, in <module>
sys.exit(main())
File "***/anaconda3/envs/***/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
args.func(args)
File "***/anaconda3/envs/***/lib/python3.8/site-packages/accelerate/commands/launch.py", line 941, in launch_command
simple_launcher(args)
File "***/anaconda3/envs/***/lib/python3.8/site-packages/accelerate/commands/launch.py", line 603, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['***/anaconda3/envs/***/bin/python', 'finetune-unet.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--instance_data_dir=demo/sample_emre/train', '--output_dir=demo/custom-chkpts_default', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=1e-5', '--num_train_epochs=500', '--dropout_rate=0.0', '--custom_chkpt=checkpoints/unet_epoch_20.pth', '--revision', 'ebb811dd71cdc38a204ecbdd6ac5d580f529fd8c', '--use_8bit_adam']' returned non-zero exit status 1.
System info
- huggingface_hub version: 0.15.1
- Platform: Linux-5.4.0-150-generic-x86_64-with-glibc2.17
- Python version: 3.8.16
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Token path ?: ***.cache/huggingface/token
- Has saved token ?: False
- Configured git credential helpers:
- FastAI: N/A
- Tensorflow: N/A
- Torch: 1.13.1+cu116
- Jinja2: N/A
- Graphviz: N/A
- Pydot: N/A
- Pillow: 10.0.0
- hf_transfer: N/A
- gradio: N/A
- numpy: 1.24.4
- ENDPOINT: https://huggingface.co
- HUGGINGFACE_HUB_CACHE: ***.cache/huggingface/hub
- HUGGINGFACE_ASSETS_CACHE: ***.cache/huggingface/assets
- HF_TOKEN_PATH: ***.cache/huggingface/token
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
About this issue
- State: open
- Comments: 15 (7 by maintainers)
Ok, thanks for confirming. That’s so weird 😬 I’ll try to reproduce it myself and let you know.
Wow, actually the issue is very intriguing 🤯 It seems that for some reason the safety_checker/model.safetensors and text_encoder/model.safetensors files have been mixed. Here are the actual sizes of the files on S3, matching the two numbers in the error message:
- safety_checker/model.safetensors: 1215981833 bytes
- text_encoder/model.safetensors: 492265879 bytes
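A quick way to double-check the sizes the Hub reports for both files (a sketch; the repo id CompVis/stable-diffusion-v1-4 is assumed from the launch command in the logs):

from huggingface_hub import get_hf_file_metadata, hf_hub_url

# HEAD-request each file and print the size the Hub expects for it.
for filename in (
    "safety_checker/model.safetensors",
    "text_encoder/model.safetensors",
):
    url = hf_hub_url("CompVis/stable-diffusion-v1-4", filename)
    meta = get_hf_file_metadata(url)
    print(filename, meta.size)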
Given the error message you got (OSError: Consistency check failed: file should be of size 1215981833 but has size 492265879 (model.safetensors).), this cannot be a coincidence.

The error is the same, but I think it may be related to the force_download parameter that I hardcoded into the huggingface_hub library. The code tries to download the text_encoder safetensors file. I’ll revert the library to its default version and give it a try, and I’ll report back here whether it works.
I’ll share the result in 5 min.
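If reverting the hard-coded flag does not help, another option is to delete the cached copy of the repo and let from_pretrained re-download it from scratch. A sketch using huggingface_hub’s cache utilities (repo id assumed from the logs):

from huggingface_hub import scan_cache_dir

# Locate every cached revision of the repo and delete them all, so the
# next download starts from a clean slate.
cache_info = scan_cache_dir()
revisions = [
    rev.commit_hash
    for repo in cache_info.repos
    if repo.repo_id == "CompVis/stable-diffusion-v1-4"
    for rev in repo.revisions
]
strategy = cache_info.delete_revisions(*revisions)
print(f"Will free {strategy.expected_freed_size_str}")
strategy.execute()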