RoseTTAFold2NA: "NVTX functions not installed. Are you sure you have a CUDA build?" error when running RF2NA on CPU.
Hi,
I get the following error when running RoseTTAFold2NA on CPU. I had already replaced `torch.cuda.amp.autocast` with `torch.amp.autocast` in predict.py to work around an earlier "NVTX functions not installed." error when running run_RF2NA.sh.
It seems some part of the code is still calling CUDA or searching for GPUs. Thank you!
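For reference, the substitution described above sketched as a diff (the decorator form is inferred from the traceback; note that `torch.amp.autocast` requires an explicit `device_type`, which for a CPU run would be `"cpu"`):

```diff
-@torch.cuda.amp.autocast(enabled=False)
+@torch.amp.autocast(device_type="cpu", enabled=False)
```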
Traceback (most recent call last):
File "/expanse/lustre/projects/ddp398/wjin/software/RoseTTAFold2NA/network/predict.py", line 376, in <module>
pred.predict(inputs=args.inputs, out_prefix=args.prefix, ffdb=ffdb)
File "/expanse/lustre/projects/ddp398/wjin/software/RoseTTAFold2NA/network/predict.py", line 239, in predict
self._run_model(Ls, msa_orig, ins_orig, t1d, t2d, xyz_t, xyz_t[:,0], alpha_t, "%s_%02d"%(out_prefix, i_trial))
File "/expanse/lustre/projects/ddp398/wjin/software/RoseTTAFold2NA/network/predict.py", line 299, in _run_model
logit_s, logit_aa_s, logit_pae, init_crds, alpha_prev, _, pred_lddt_binned, msa_prev, pair_prev, state_prev = self.model(
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/expanse/lustre/projects/ddp398/wjin/software/RoseTTAFold2NA/network/RoseTTAFoldModel.py", line 104, in forward
msa, pair, xyz, alpha_s, xyzallatom, state = self.simulator(
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/expanse/lustre/projects/ddp398/wjin/software/RoseTTAFold2NA/network/Track_module.py", line 441, in forward
msa_full, pair, xyz, state, alpha = self.extra_block[i_m](msa_full, pair,
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/expanse/lustre/projects/ddp398/wjin/software/RoseTTAFold2NA/network/Track_module.py", line 367, in forward
xyz, state, alpha = self.str2str(msa.float(), pair.float(), xyz.detach().float(), state.float(), idx, top_k=top_k)
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/expanse/lustre/projects/ddp398/wjin/software/RoseTTAFold2NA/network/Track_module.py", line 234, in forward
shift = self.se3(G, node.reshape(B*L, -1, 1), l1_feats, edge_feats)
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/expanse/lustre/projects/ddp398/wjin/software/RoseTTAFold2NA/network/SE3_network.py", line 84, in forward
return self.se3(G, node_features, edge_features) #, clamp_d=clamp_d)
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/se3_transformer-1.0.0-py3.8.egg/se3_transformer/model/transformer.py", line 163, in forward
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/se3_transformer-1.0.0-py3.8.egg/se3_transformer/model/basis.py", line 166, in get_basis
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/contextlib.py", line 114, in __enter__
return next(self.gen)
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/torch/cuda/nvtx.py", line 86, in range
range_push(msg.format(*args, **kwargs))
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/torch/cuda/nvtx.py", line 28, in range_push
return _nvtx.rangePushA(msg)
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/torch/cuda/nvtx.py", line 9, in _fail
raise RuntimeError("NVTX functions not installed. Are you sure you have a CUDA build?")
RuntimeError: NVTX functions not installed. Are you sure you have a CUDA build?
My package versions:
brotlipy==0.7.0
certifi @ file:///croot/certifi_1665076670883/work/certifi
cffi @ file:///tmp/abs_98z5h56wf8/croots/recipe/cffi_1659598650955/work
charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work
click==8.1.3
colorama @ file:///opt/conda/conda-bld/colorama_1657009087971/work
configparser==5.3.0
cryptography @ file:///croot/cryptography_1665612644927/work
dgl==0.9.1.post1
DLLogger @ git+https://github.com/NVIDIA/dllogger@0540a43971f4a8a16693a9de9de73c1072020769
docker-pycreds==0.4.0
e3nn==0.3.3
gitdb==4.0.9
GitPython==3.1.29
idna @ file:///croot/idna_1666125576474/work
mkl-fft==1.3.1
mkl-random @ file:///tmp/build/80754af9/mkl_random_1626186064646/work
mkl-service==2.4.0
mpmath==1.2.1
networkx @ file:///opt/conda/conda-bld/networkx_1657784097507/work
numpy @ file:///croot/numpy_and_numpy_base_1667233465264/work
opt-einsum==3.3.0
opt-einsum-fx==0.1.4
packaging==21.3
pathtools==0.1.2
promise==2.3
protobuf==4.21.9
psutil @ file:///home/conda/feedstock_root/build_artifacts/psutil_1667885878918/work
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pynvml==11.0.0
pyOpenSSL @ file:///opt/conda/conda-bld/pyopenssl_1643788558760/work
pyparsing==3.0.9
PySocks @ file:///tmp/build/80754af9/pysocks_1605305779399/work
python-dateutil==2.8.2
PyYAML==6.0
requests @ file:///opt/conda/conda-bld/requests_1657734628632/work
scipy==1.9.3
se3-transformer==1.0.0
sentry-sdk==1.11.0
shortuuid==1.0.11
six @ file:///tmp/build/80754af9/six_1644875935023/work
smmap==5.0.0
subprocess32==3.5.4
sympy==1.11.1
torch==1.13.0
tqdm @ file:///home/conda/feedstock_root/build_artifacts/tqdm_1662214488106/work
typing_extensions @ file:///tmp/abs_ben9emwtky/croots/recipe/typing_extensions_1659638822008/work
urllib3 @ file:///croot/urllib3_1666298941550/work
wandb==0.12.0
About this issue
- State: open
- Created 2 years ago
- Comments: 19
The environment file is missing a dependency for PyTorch to see CUDA. You have to add/modify lines in the environment file: specify `- pytorch::pytorch=1.12.0` and `- pytorch::torchvision=0.13.0`, and the issue will go away.
Here is my modified conda environment file:
Cheers,
Kamil
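Kamil's full modified environment file was not captured in this thread; a hypothetical excerpt showing only the channel-pinned lines he describes might look like:

```yaml
# Hypothetical excerpt -- only the lines discussed above;
# the rest of RF2na-linux.yml is unchanged.
dependencies:
  - pytorch::pytorch=1.12.0
  - pytorch::torchvision=0.13.0
```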
For those still struggling with the newest yml file, I think I stumbled upon a solution that may work universally.
Before doing anything, run this:
conda config --set channel_priority flexible

Based on https://pytorch.org/blog/deprecation-cuda-python-support/, I substituted pytorch with pytorch::pytorch=2.0 in the yml file, because plain pytorch installs PyTorch 2.1.*. So my yml file looks like this:
Then go about with:

conda env create -f RF2na-linux.yml

Finally, probably unnecessary to say, but after having run conda activate RF2NA:

cd SE3Transformer
pip install --no-cache-dir -r requirements.txt
python setup.py install

Hello, sorry if I wasn't clear. The program runs on CPU by default. We had trouble getting it to run on GPU: even though we had an available GPU, it was not detected until I found that I needed to specify export CUDA_VISIBLE_DEVICES=0. We observed a huge speedup on GPU vs. CPU (on one standard-size example complex, a factor of 25 in runtime on an NVIDIA A100 GPU vs. 20 CPUs).
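The `CUDA_VISIBLE_DEVICES` step can be done from the shell (`export CUDA_VISIBLE_DEVICES=0` before launching run_RF2NA.sh) or, as a sketch, from Python before any CUDA library initializes (the helper function here is hypothetical, for illustration only):

```python
import os

# Must be set before torch (or any CUDA library) first touches the
# driver; changing it after CUDA initialization has no effect on
# which devices the process can see.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")

def visible_gpu_indices():
    """Parse CUDA_VISIBLE_DEVICES into a list of GPU indices
    (hypothetical helper, not part of RF2NA)."""
    raw = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return [int(tok) for tok in raw.split(",") if tok.strip()]
```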