ChatRWKV: 0.5.0 operators.cu fails to compile on compute 6.x

[...]/rwkv/cuda/operators.cu(123): error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (__half *, __half)
          atomicAdd(&y[k], __float2half(y_local));
          ^

This is likely because my GPU (a GTX 1060) only supports compute capability 6.1, while atomicAdd for __half requires compute capability 7.0 or higher, per https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomicadd

It seems like

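#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 700
/* fallback here: no native atomicAdd(__half *, __half) below compute 7.0 */
#else
atomicAdd(&y[k], __float2half(y_local));
#endif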

would be needed to support lower compute capabilities. Unfortunately, I don't know enough about this to contribute anything more helpful.
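The "magic stuff" placeholder is commonly filled with a compare-and-swap loop. The sketch below is one such workaround, not necessarily what the maintainers adopted: it emulates atomicAdd on a __half by CAS-ing the aligned 32-bit word that holds it, and calls the native atomicAdd only when __CUDA_ARCH__ >= 700. The names atomic_add_half_cas, add_half, add_ones and sum_half_atomic are hypothetical, and the code assumes 2-byte-aligned __half values on a little-endian GPU.

```cuda
#include <cassert>
#include <cstdint>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Hypothetical fallback: emulates atomicAdd(__half*, __half) with a CAS loop
// on the 4-byte-aligned word containing the 16-bit target. Returns the old value.
__device__ __half atomic_add_half_cas(__half* address, __half val) {
    // 0 if the target is the low 16 bits of its word, 2 if it is the high 16 bits.
    uintptr_t offset = reinterpret_cast<uintptr_t>(address) & 2;
    unsigned int* word =
        reinterpret_cast<unsigned int*>(reinterpret_cast<char*>(address) - offset);
    unsigned int old = *word;
    unsigned int assumed;
    do {
        assumed = old;
        unsigned short cur =
            static_cast<unsigned short>(offset ? (assumed >> 16) : (assumed & 0xffffu));
        // Add in fp32 and round once to fp16 (works on any compute capability).
        float sum = __half2float(__ushort_as_half(cur)) + __half2float(val);
        unsigned int bits = __half_as_ushort(__float2half(sum));
        unsigned int updated = offset ? ((assumed & 0x0000ffffu) | (bits << 16))
                                      : ((assumed & 0xffff0000u) | bits);
        old = atomicCAS(word, assumed, updated);  // retry if another thread changed the word
    } while (assumed != old);
    return __ushort_as_half(
        static_cast<unsigned short>(offset ? (old >> 16) : (old & 0xffffu)));
}

// Native instruction on sm_70+, CAS fallback otherwise (including the host pass).
__device__ __half add_half(__half* address, __half val) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
    return atomicAdd(address, val);
#else
    return atomic_add_half_cas(address, val);
#endif
}

// Demo kernel: n threads each add 1.0 to the same __half.
__global__ void add_ones(__half* target, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) add_half(target, __float2half(1.0f));
}

// Host helper: runs add_ones on element idx (0 = low half, 1 = high half of
// one 32-bit word) and returns the final value as float.
float sum_half_atomic(int n, int idx) {
    assert(idx == 0 || idx == 1);
    __half* d = nullptr;
    cudaMalloc(&d, 2 * sizeof(__half));    // exactly one 32-bit word
    cudaMemset(d, 0, 2 * sizeof(__half));  // all-zero bits == +0.0 in fp16
    add_ones<<<(n + 255) / 256, 256>>>(d + idx, n);
    __half h;
    cudaMemcpy(&h, d + idx, sizeof(__half), cudaMemcpyDeviceToHost);  // waits for the kernel
    cudaFree(d);
    return __half2float(h);
}
```

Under these assumptions the failing line would become add_half(&y[k], __float2half(y_local)). The CAS loop serializes threads that contend for the same word, so it is noticeably slower than the native instruction when many threads update one element.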

About this issue

State: closed
Created a year ago
Comments: 15 (10 by maintainers)

Links to this issue

[P] ChatRWKV v2 (can run RWKV 14B with 3G VRAM), RWKV pip package, and finetuning to ctx16K

Most upvoted comments

Let's go for option 1: “Reduce to fp32 and convert back to 16 only on older architectures”

BlinkDL on Mar 15, 2023
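Option 1 sidesteps the missing instruction entirely, because atomicAdd(float*, float) is available on compute 6.x. Below is a minimal sketch of that idea, assuming a simplified fp16 matrix-vector product rather than the actual operators.cu kernel; the kernel and function names are hypothetical. Partial sums are accumulated in an fp32 buffer and rounded to fp16 once at the end. For simplicity it takes the fp32 path on every architecture, whereas the quoted proposal applies it only on older ones.

```cuda
#include <cassert>
#include <cstddef>
#include <vector>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

// Pass 1: each thread reduces one chunk of rows for one output column in fp32,
// then adds its partial sum with atomicAdd(float*, float), available on sm_6x.
__global__ void matvec_partial_fp32(const __half* x, const __half* w, int N, int M,
                                    int chunk, float* y_f32) {
    int k = blockIdx.x * blockDim.x + threadIdx.x;  // output column
    int i0 = blockIdx.y * chunk;                    // first row of this chunk
    if (k >= M) return;
    int i1 = min(i0 + chunk, N);
    float acc = 0.0f;
    for (int i = i0; i < i1; ++i)
        acc += __half2float(x[i]) * __half2float(w[(size_t)i * M + k]);
    atomicAdd(&y_f32[k], acc);
}

// Pass 2: round the finished fp32 sums to fp16 exactly once.
__global__ void fp32_to_half(const float* y_f32, __half* y, int M) {
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k < M) y[k] = __float2half(y_f32[k]);
}

// Host wrapper: y[k] = sum_i x[i] * W[i][k] for row-major W (N x M).
// The fp16 result is widened to float for easy checking.
std::vector<float> matvec_fp32_accumulate(const std::vector<float>& x,
                                          const std::vector<float>& w,
                                          int N, int M, int chunk) {
    assert(x.size() == (size_t)N && w.size() == (size_t)N * M);
    std::vector<__half> hx(N), hw(w.size());
    for (int i = 0; i < N; ++i) hx[i] = __float2half(x[i]);
    for (size_t i = 0; i < w.size(); ++i) hw[i] = __float2half(w[i]);

    __half *dx = nullptr, *dw = nullptr, *dy = nullptr;
    float* dy32 = nullptr;
    cudaMalloc(&dx, hx.size() * sizeof(__half));
    cudaMalloc(&dw, hw.size() * sizeof(__half));
    cudaMalloc(&dy, M * sizeof(__half));
    cudaMalloc(&dy32, M * sizeof(float));
    cudaMemcpy(dx, hx.data(), hx.size() * sizeof(__half), cudaMemcpyHostToDevice);
    cudaMemcpy(dw, hw.data(), hw.size() * sizeof(__half), cudaMemcpyHostToDevice);
    cudaMemset(dy32, 0, M * sizeof(float));  // all-zero bits == 0.0f

    const int threads = 128;
    dim3 grid((M + threads - 1) / threads, (N + chunk - 1) / chunk);
    matvec_partial_fp32<<<grid, threads>>>(dx, dw, N, M, chunk, dy32);
    fp32_to_half<<<(M + threads - 1) / threads, threads>>>(dy32, dy, M);

    std::vector<__half> hy(M);
    cudaMemcpy(hy.data(), dy, M * sizeof(__half), cudaMemcpyDeviceToHost);  // waits for both kernels
    cudaFree(dx); cudaFree(dw); cudaFree(dy); cudaFree(dy32);

    std::vector<float> y(M);
    for (int k = 0; k < M; ++k) y[k] = __half2float(hy[k]);
    return y;
}
```

For example, with x filled with 1.0, W filled with 0.5, N = 1024, M = 64 and chunk = 128, every y[k] should come back as 512. Rounding to fp16 only once also avoids compounding fp16 rounding error across many atomic adds.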

That was with the strategy cuda fp16i8 *15+ -> cuda fp16 *1 and the model RWKV-4-Pile-7B-20230109-ctx4096.pth (it actually seems to use less memory now as well, so I could probably put a few more layers on the GPU).

Yeah, I am processing in chunks, so VRAM usage is much smaller for longer inputs.

BlinkDL on Mar 18, 2023

Fixed 😃, and it's 10% faster on A100 too.

BlinkDL on Mar 17, 2023