RoseTTAFold2NA: "NVTX functions not installed. Are you sure you have a CUDA build?" error when running RF2NA on CPU.
Hi,
I get the following error when running RoseTTAFold2NA on CPU. I had already replaced `torch.cuda.amp.autocast` with `torch.amp.autocast` in predict.py to work around an earlier "NVTX functions not installed." error when running run_RF2NA.sh.
It seems some part of the code is still calling CUDA or searching for GPUs. Thank you!
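For reference, the substitution described above sketched as a diff (the decorator form is inferred from the traceback; note that `torch.amp.autocast` requires an explicit `device_type`, which for a CPU run would be `"cpu"`):

```diff
-@torch.cuda.amp.autocast(enabled=False)
+@torch.amp.autocast(device_type="cpu", enabled=False)
```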
Traceback (most recent call last):
File "/expanse/lustre/projects/ddp398/wjin/software/RoseTTAFold2NA/network/predict.py", line 376, in <module>
pred.predict(inputs=args.inputs, out_prefix=args.prefix, ffdb=ffdb)
File "/expanse/lustre/projects/ddp398/wjin/software/RoseTTAFold2NA/network/predict.py", line 239, in predict
self._run_model(Ls, msa_orig, ins_orig, t1d, t2d, xyz_t, xyz_t[:,0], alpha_t, "%s_%02d"%(out_prefix, i_trial))
File "/expanse/lustre/projects/ddp398/wjin/software/RoseTTAFold2NA/network/predict.py", line 299, in _run_model
logit_s, logit_aa_s, logit_pae, init_crds, alpha_prev, _, pred_lddt_binned, msa_prev, pair_prev, state_prev = self.model(
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/expanse/lustre/projects/ddp398/wjin/software/RoseTTAFold2NA/network/RoseTTAFoldModel.py", line 104, in forward
msa, pair, xyz, alpha_s, xyzallatom, state = self.simulator(
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/expanse/lustre/projects/ddp398/wjin/software/RoseTTAFold2NA/network/Track_module.py", line 441, in forward
msa_full, pair, xyz, state, alpha = self.extra_block[i_m](msa_full, pair,
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/expanse/lustre/projects/ddp398/wjin/software/RoseTTAFold2NA/network/Track_module.py", line 367, in forward
xyz, state, alpha = self.str2str(msa.float(), pair.float(), xyz.detach().float(), state.float(), idx, top_k=top_k)
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/torch/amp/autocast_mode.py", line 14, in decorate_autocast
return func(*args, **kwargs)
File "/expanse/lustre/projects/ddp398/wjin/software/RoseTTAFold2NA/network/Track_module.py", line 234, in forward
shift = self.se3(G, node.reshape(B*L, -1, 1), l1_feats, edge_feats)
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/expanse/lustre/projects/ddp398/wjin/software/RoseTTAFold2NA/network/SE3_network.py", line 84, in forward
return self.se3(G, node_features, edge_features) #, clamp_d=clamp_d)
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
return forward_call(*input, **kwargs)
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/se3_transformer-1.0.0-py3.8.egg/se3_transformer/model/transformer.py", line 163, in forward
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/se3_transformer-1.0.0-py3.8.egg/se3_transformer/model/basis.py", line 166, in get_basis
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/contextlib.py", line 114, in __enter__
return next(self.gen)
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/torch/cuda/nvtx.py", line 86, in range
range_push(msg.format(*args, **kwargs))
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/torch/cuda/nvtx.py", line 28, in range_push
return _nvtx.rangePushA(msg)
File "/home/wjin/data/anaconda3/envs/RF2NA/lib/python3.8/site-packages/torch/cuda/nvtx.py", line 9, in _fail
raise RuntimeError("NVTX functions not installed. Are you sure you have a CUDA build?")
RuntimeError: NVTX functions not installed. Are you sure you have a CUDA build?
My package versions:
brotlipy==0.7.0
certifi @ file:///croot/certifi_1665076670883/work/certifi
cffi @ file:///tmp/abs_98z5h56wf8/croots/recipe/cffi_1659598650955/work
charset-normalizer @ file:///tmp/build/80754af9/charset-normalizer_1630003229654/work
click==8.1.3
colorama @ file:///opt/conda/conda-bld/colorama_1657009087971/work
configparser==5.3.0
cryptography @ file:///croot/cryptography_1665612644927/work
dgl==0.9.1.post1
DLLogger @ git+https://github.com/NVIDIA/dllogger@0540a43971f4a8a16693a9de9de73c1072020769
docker-pycreds==0.4.0
e3nn==0.3.3
gitdb==4.0.9
GitPython==3.1.29
idna @ file:///croot/idna_1666125576474/work
mkl-fft==1.3.1
mkl-random @ file:///tmp/build/80754af9/mkl_random_1626186064646/work
mkl-service==2.4.0
mpmath==1.2.1
networkx @ file:///opt/conda/conda-bld/networkx_1657784097507/work
numpy @ file:///croot/numpy_and_numpy_base_1667233465264/work
opt-einsum==3.3.0
opt-einsum-fx==0.1.4
packaging==21.3
pathtools==0.1.2
promise==2.3
protobuf==4.21.9
psutil @ file:///home/conda/feedstock_root/build_artifacts/psutil_1667885878918/work
pycparser @ file:///tmp/build/80754af9/pycparser_1636541352034/work
pynvml==11.0.0
pyOpenSSL @ file:///opt/conda/conda-bld/pyopenssl_1643788558760/work
pyparsing==3.0.9
PySocks @ file:///tmp/build/80754af9/pysocks_1605305779399/work
python-dateutil==2.8.2
PyYAML==6.0
requests @ file:///opt/conda/conda-bld/requests_1657734628632/work
scipy==1.9.3
se3-transformer==1.0.0
sentry-sdk==1.11.0
shortuuid==1.0.11
six @ file:///tmp/build/80754af9/six_1644875935023/work
smmap==5.0.0
subprocess32==3.5.4
sympy==1.11.1
torch==1.13.0
tqdm @ file:///home/conda/feedstock_root/build_artifacts/tqdm_1662214488106/work
typing_extensions @ file:///tmp/abs_ben9emwtky/croots/recipe/typing_extensions_1659638822008/work
urllib3 @ file:///croot/urllib3_1666298941550/work
wandb==0.12.0
About this issue
- State: open
- Created 2 years ago
- Comments: 19
The environment file is missing a dependency for PyTorch to see CUDA. You have to add/modify lines in the environment file: specify `- pytorch::pytorch=1.12.0` and `- pytorch::torchvision=0.13.0`, and the issue will go away.
Here is my modified conda environment file:
Cheers,
Kamil
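Kamil's full modified environment file was not captured in this thread; a hypothetical excerpt showing only the channel-pinned lines he describes might look like:

```yaml
# Hypothetical excerpt -- only the lines discussed above;
# the rest of RF2na-linux.yml is unchanged.
dependencies:
  - pytorch::pytorch=1.12.0
  - pytorch::torchvision=0.13.0
```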
For those still struggling with the newest yml file, I think I stumbled upon a solution that may work universally.
Before doing anything, run this:
conda config --set channel_priority flexible

Based on https://pytorch.org/blog/deprecation-cuda-python-support/, I substituted pytorch with pytorch::pytorch=2.0 in the yml file, because plain pytorch installs PyTorch 2.1.*. So my yml file looks like this:
Then go about with:

conda env create -f RF2na-linux.yml

Finally, probably unnecessary to say, but after having run conda activate RF2NA:

cd SE3Transformer
pip install --no-cache-dir -r requirements.txt
python setup.py install

Hello, sorry if I wasn't clear. The program runs on CPU by default. We had trouble getting it to run on GPU: even though we had an available GPU, it was not detected until I found that I needed to specify export CUDA_VISIBLE_DEVICES=0. We observed a huge speedup on GPU vs. CPU (on one standard-size example complex, a factor of 25 in runtime on an NVIDIA A100 GPU vs. 20 CPUs).
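The `CUDA_VISIBLE_DEVICES` step can be done from the shell (`export CUDA_VISIBLE_DEVICES=0` before launching run_RF2NA.sh) or, as a sketch, from Python before any CUDA library initializes (the helper function here is hypothetical, for illustration only):

```python
import os

# Must be set before torch (or any CUDA library) first touches the
# driver; changing it after CUDA initialization has no effect on
# which devices the process can see.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")

def visible_gpu_indices():
    """Parse CUDA_VISIBLE_DEVICES into a list of GPU indices
    (hypothetical helper, not part of RF2NA)."""
    raw = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    return [int(tok) for tok in raw.split(",") if tok.strip()]
```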