ColossalAI: [BUG]: No module named 'colossalai._C.cpu_adam'
🐛 Describe the bug
When I run the command torchrun --standalone --nproc_per_node=2 train_prompts.py prompts.csv --strategy colossalai_gemini in the ColossalAI/applications/ChatGPT/examples directory, the following error occurs:
ModuleNotFoundError: No module named 'colossalai._C.cpu_adam'
Environment
File "train_prompts.py", line 115, in <module>
    main(args)
  File "train_prompts.py", line 50, in main
    actor_optim = HybridAdam(actor.parameters(), lr=5e-6)
  File "/path/anaconda3/envs/colossalai/lib/python3.7/site-packages/colossalai/nn/optimizer/hybrid_adam.py", line 82, in __init__
    cpu_optim = CPUAdamBuilder().load()
  File "/path/anaconda3/envs/colossalai/lib/python3.7/site-packages/colossalai/kernel/op_builder/builder.py", line 164, in load
    verbose=verbose)
  File "/path/anaconda3/envs/colossalai/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1296, in load
    keep_intermediates=keep_intermediates)
  File "/path/anaconda3/envs/colossalai/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1534, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "/path/anaconda3/envs/colossalai/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 1936, in _import_module_from_library
    module = importlib.util.module_from_spec(spec)
  File "<frozen importlib._bootstrap>", line 583, in module_from_spec
  File "<frozen importlib._bootstrap_external>", line 1043, in create_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
ImportError: /path/.cache/colossalai/torch_extensions/torch1.13_cu11.7/cpu_adam.so: cannot open shared object file: No such file or directory
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 128965 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 128966) of binary: /path/anaconda3/envs/colossalai/bin/python3.7
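The ImportError shows that the JIT-compiled cpu_adam.so was never produced in the extension cache. As a quick sanity check before reinstalling, you can look for the shared object in the cache directory named in the traceback (a hedged sketch; the exact cache layout may vary between ColossalAI versions):

```python
import glob
import os

# ColossalAI caches JIT-built kernels under ~/.cache/colossalai/torch_extensions/
# (the path in the traceback above). If no cpu_adam*.so is found there,
# the kernel build failed or was never run.
cache = os.path.expanduser("~/.cache/colossalai/torch_extensions")
hits = glob.glob(os.path.join(cache, "*", "cpu_adam*.so"))
print("built:" if hits else "missing:", hits)
```

An empty result here is consistent with the "cannot open shared object file" error: the loader is asked to import a library that was never compiled.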
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 33 (14 by maintainers)
You have to install ColossalAI with its CUDA extensions built:

CUDA_EXT=1 pip install colossalai
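A minimal reinstall recipe based on the maintainer's reply (the uninstall step is an assumption added here to ensure the previous JIT-only build does not shadow the new one):

```shell
# Remove the existing install, which relies on JIT compilation at runtime
pip uninstall -y colossalai

# Reinstall with CUDA_EXT=1 so the cpu_adam kernel (and other fused
# kernels) are compiled at install time; this requires a working CUDA
# toolchain (nvcc) matching your PyTorch build.
CUDA_EXT=1 pip install colossalai
```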