FastChat: Fine-tuning Vicuna-7B with Local GPUs: RuntimeError: Expected is_sm80 || is_sm90 to be true, but got false
RuntimeError: Expected is_sm80 || is_sm90 to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)
0%| | 0/3096 [00:00<?, ?it/s]
use_cache=True is incompatible with gradient checkpointing. Setting use_cache=False
…
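For context, sm80 and sm90 correspond to A100 (Ampere) and H100 (Hopper); the bf16 flash-attention path is only built for those architectures, while consumer cards such as the 3090 report sm86. A minimal sketch (not from the FastChat code, just an illustration) for checking what your GPU supports before picking a training dtype:

```python
import torch

# Report the CUDA compute capability of the current GPU.
# sm80 == (8, 0) (A100), sm90 == (9, 0) (H100); a 3090 reports (8, 6).
major, minor = torch.cuda.get_device_capability(0)
print(f"compute capability: sm{major}{minor}")

# Assumption: on GPUs older than Ampere, fall back to fp16,
# matching the q_dtype check in the error message below.
train_dtype = torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
print(f"suggested training dtype: {train_dtype}")
```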
About this issue
- State: closed
- Created a year ago
- Comments: 24 (3 by maintainers)
I’m facing a similar issue:
Expected q_dtype == torch::kFloat16 || ((is_sm8x || is_sm90) && q_dtype == torch::kBFloat16) to be true, but got false
@zhisbug If the official setup can run with flash-attention on an A100 (the same hardware), then providing an environment spec with the exact package versions could help others resolve this confusion.
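A small sketch of the kind of environment report that comment is asking for (package names are the standard pip ones; this is illustrative, not an official script):

```python
import torch

# Print the version info needed to reproduce the working A100 setup.
print("torch:", torch.__version__)
print("cuda:", torch.version.cuda)
print("gpu:", torch.cuda.get_device_name(0))
try:
    import flash_attn
    print("flash-attn:", flash_attn.__version__)
except ImportError:
    print("flash-attn: not installed")
```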
I’m closing this issue because this seems to be a flash attention issue.
We’ll soon migrate to xformers (https://github.com/facebookresearch/xformers) in place of flash-attention: our internal tests show similar memory/compute performance, but xformers is much more stable, maintained by Meta, supports more types of GPUs, and is more extensible.
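For reference, a minimal sketch of the xformers call such a migration would target (tensor shapes and sizes here are illustrative, not FastChat’s actual integration):

```python
import torch
from xformers.ops import memory_efficient_attention

# q, k, v have shape (batch, seq_len, num_heads, head_dim).
# fp16 inputs work on a wider range of GPUs than the bf16-only
# flash-attention path discussed above.
q = torch.randn(2, 128, 32, 128, dtype=torch.float16, device="cuda")
k = torch.randn(2, 128, 32, 128, dtype=torch.float16, device="cuda")
v = torch.randn(2, 128, 32, 128, dtype=torch.float16, device="cuda")

out = memory_efficient_attention(q, k, v)  # same shape as q
print(out.shape)
```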
Yes, thank you. I was previously using flash-attention 1.x; 2.0 now supports it.
Are you sure? flash-attn v2 supports head dimensions up to 256, and I am able to use it on a 3090.
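For context, a minimal sketch of calling the flash-attn v2 interface directly (shapes are illustrative; head_dim=128 is within the 256 limit mentioned above):

```python
import torch
from flash_attn import flash_attn_func

# flash-attn v2 expects (batch, seq_len, num_heads, head_dim) tensors
# in fp16 or bf16 on the GPU.
q = torch.randn(2, 512, 32, 128, dtype=torch.float16, device="cuda")
k = torch.randn(2, 512, 32, 128, dtype=torch.float16, device="cuda")
v = torch.randn(2, 512, 32, 128, dtype=torch.float16, device="cuda")

out = flash_attn_func(q, k, v, causal=True)
print(out.shape)  # (2, 512, 32, 128)
```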
Try #2126
If I replace `bf16 True` with `fp16 True` in the script args and also add `"fp16": {"enabled": true}` to my DeepSpeed config, the error changes to `RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn`, and the relevant part of the traceback is