ChatRWKV: 0.5.0 operators.cu fails to compile on compute 6.x

[...]/rwkv/cuda/operators.cu(123): error: no instance of overloaded function "atomicAdd" matches the argument list
            argument types are: (__half *, __half)
          atomicAdd(&y[k], __float2half(y_local));
          ^

This is likely because my GPU (a GTX 1060) only supports compute capability 6.1, while the atomicAdd overload for __half requires compute capability 7.0 or higher, per https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomicadd

It seems like

#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ < 700
/* magic stuff here */
#endif

would be needed to support lower compute versions. I don’t know enough about this to contribute anything more helpful, unfortunately.
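For anyone landing here, one common way to fill in the "magic stuff" is a compare-and-swap loop on the 32-bit word that contains the target __half. The following is only a sketch under that assumption, not the fix that was applied in ChatRWKV; atomic_add_half and add_ones are hypothetical names made up for illustration.

```cuda
#include <cuda_fp16.h>

// Hypothetical helper (not part of ChatRWKV): atomically adds val to *addr.
__device__ __forceinline__ void atomic_add_half(__half *addr, __half val) {
#if defined(__CUDA_ARCH__) && __CUDA_ARCH__ >= 700
    // Native __half overload, available from compute capability 7.0.
    atomicAdd(addr, val);
#else
    // Older devices: emulate with atomicCAS on the aligned 32-bit word
    // that holds the target half (low or high 16 bits).
    unsigned int *base =
        (unsigned int *)((char *)addr - ((size_t)addr & 2));
    const bool high = ((size_t)addr & 2) != 0;
    unsigned int old = *base;
    unsigned int assumed;
    do {
        assumed = old;
        unsigned short cur = high ? (unsigned short)(assumed >> 16)
                                  : (unsigned short)(assumed & 0xffffu);
        // Do the addition in fp32, then round back to fp16.
        __half sum = __float2half(__half2float(__ushort_as_half(cur)) +
                                  __half2float(val));
        unsigned int bits = __half_as_ushort(sum);
        unsigned int repl = high ? ((assumed & 0x0000ffffu) | (bits << 16))
                                 : ((assumed & 0xffff0000u) | bits);
        // Retry if another thread changed the word in the meantime.
        old = atomicCAS(base, assumed, repl);
    } while (old != assumed);
#endif
}

// Hypothetical demo kernel: every thread adds 1.0 to one of two slots.
__global__ void add_ones(__half *y) {
    atomic_add_half(&y[threadIdx.x % 2], __float2half(1.0f));
}
```

Launched as add_ones<<<1, 256>>>(y) on a zeroed two-element buffer, each slot should end up at 128. The CAS loop touches the neighbouring half in the same word, so the buffer needs to be at least 4-byte aligned and padded to an even number of elements, which cudaMalloc allocations normally satisfy.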

About this issue

  • State: closed
  • Created a year ago
  • Comments: 15 (10 by maintainers)

Most upvoted comments

Let's go for 1: “Reduce to fp32 and convert back to 16 only on older architectures”
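As a rough illustration of what option 1 could look like (an assumption about the general shape, not the code that was actually committed): accumulate into an fp32 scratch buffer, where atomicAdd is supported on compute 6.x, and round to fp16 once at the end. The kernel names, the partial input array, and the rows/cols layout below are hypothetical.

```cuda
#include <cuda_fp16.h>

// Hypothetical kernel: each thread adds one partial value into column k
// of an fp32 accumulator. y_acc must be zeroed before launch.
__global__ void accumulate_fp32(const __half *partial, float *y_acc,
                                int rows, int cols) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < rows * cols) {
        int k = idx % cols;
        float y_local = __half2float(partial[idx]);
        atomicAdd(&y_acc[k], y_local);  // float overload, no 7.0 requirement
    }
}

// Second pass: convert the fp32 totals to fp16 after all adds are done.
__global__ void convert_to_half(const float *y_acc, __half *y, int cols) {
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k < cols) {
        y[k] = __float2half(y_acc[k]);
    }
}
```

The cost is an extra fp32 buffer of cols elements and a second kernel launch; the benefit is that the sum is rounded to fp16 only once instead of after every add.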

That was with the strategy "cuda fp16i8 *15+ -> cuda fp16 *1" and the model RWKV-4-Pile-7B-20230109-ctx4096.pth. (It actually seems like it’s using less memory now as well, so I could probably add a few more layers on the GPU.)

Yeah, I am processing in chunks, so VRAM usage is much smaller for longer inputs.

Fixed 😃 and it’s 10% faster on an A100 too.