DeepSpeed: [BUG] ValueError: max() arg is an empty sequence using bf16 zero stage3
```
/opt/conda/lib/python3.7/site-packages/deepspeed/runtime/zero/stage3.py:307 in <listcomp>

  304             max([
  305                 max(tensor.numel(),
  306                     tensor.ds_numel) for tensor in fp16_partitioned_group
❱ 307             ]) for fp16_partitioned_group in self.fp16_partitioned_groups
  308         ])
  309         print_rank_0(
  310             f'Largest partitioned param numel = {largest_partitioned_param_numel}'

ValueError: max() arg is an empty sequence
```
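The error itself is plain Python: calling max() on an empty sequence raises ValueError. Below is a minimal standalone sketch of the failure mode, assuming one of the partitioned parameter groups ends up empty (plausible when most parameters are frozen, e.g. with PEFT/LoRA); the names fp16_partitioned_groups and ds_numel merely mimic the DeepSpeed code above and are not the actual optimizer state:

```python
# Stand-in for self.fp16_partitioned_groups with one parameter group
# that contains no partitioned tensors (hypothetical repro, not the
# actual DeepSpeed state).
fp16_partitioned_groups = [[]]

try:
    largest_partitioned_param_numel = max([
        max([max(tensor.numel(), tensor.ds_numel)
             for tensor in fp16_partitioned_group])
        for fp16_partitioned_group in fp16_partitioned_groups
    ])
except ValueError as e:
    print(e)  # -> max() arg is an empty sequence
```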
To Reproduce
Steps to reproduce the behavior: the error happened during fine-tuning of a FLAN 11B model. Full error log: https://gist.github.com/sujithjoseph/c410514acfccc76974a8130a8afd2169
DeepSpeed config: https://gist.github.com/sujithjoseph/92bf27de6bba704b57c3b9eb7aa00365
ds_report output
https://gist.github.com/sujithjoseph/c725de5fb38bb3c20e4fb6fd55f63848
System info (please complete the following information):
- OS: Debian GNU/Linux 10 (buster)
- GPU count and types: 1 machine with 4 A100s (40 GB each)
- Python version: 3.7
Launcher context
Are you launching your experiment with the deepspeed launcher, MPI, or something else? Accelerate + PEFT

```yaml
deepspeed_config:
  deepspeed_config_file: zero_stage3_offload_config.json
  zero3_init_flag: true
```
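For context, a complete Accelerate config wired to a DeepSpeed JSON file would look roughly like the sketch below. The field names come from Accelerate's DeepSpeed integration; num_machines and num_processes are assumptions matching the single 4-GPU machine described above:

```yaml
# Sketch of an Accelerate config that delegates to a DeepSpeed config file.
# num_machines/num_processes are assumed from the 4x A100 setup above.
compute_environment: LOCAL_MACHINE
distributed_type: DEEPSPEED
deepspeed_config:
  deepspeed_config_file: zero_stage3_offload_config.json
  zero3_init_flag: true
num_machines: 1
num_processes: 4
```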
Additional context
I assume that bf16 configs and fp16 configs are interchangeable:

```json
"bf16": {
    "enabled": true,
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
}
```
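For reference, DeepSpeed's documented bf16 section only takes the enabled flag; the loss-scaling keys (loss_scale, loss_scale_window, initial_scale_power, hysteresis, min_loss_scale) are fp16-specific, since bf16 training does not use dynamic loss scaling. A minimal sketch of the two sections as documented:

```json
{
  "bf16": {
    "enabled": true
  },
  "fp16": {
    "enabled": false,
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  }
}
```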
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 2
- Comments: 20 (6 by maintainers)
Same error here, any update?
Same error when using loralib with ZeRO stage 2 & 3.
Same error here: RuntimeError: torch.cat(): expected a non-empty list of Tensors during accelerate.prepare. How can this be solved?
Got it. I don't have experience with those memory-restriction flags, which seem to be Accelerate flags; I don't think they are hooked into DeepSpeed. Can you please pose this question on their forum? I think we can work with them to enable the desired feature.