ColossalAI: [BUG]: colossalai/kernel/cuda_native/csrc/moe_cuda_kernel.cu:5:10: fatal error: cub/cub.cuh: No such file or directory (update: now with more build errors!)
🐛 Describe the bug
Trying to run a finetuning script via torchrun, I get the error below. ColossalAI was built from source as directed, but it still fails (a quick check for the compiled extensions is sketched after the log).
anon@linuxmint:/media/anon/bighdd/ai/toolbox/training$ ./finetune.bash
+ export BATCH_SIZE=4
+ BATCH_SIZE=4
+ export MODEL=/media/anon/bighdd/ai/models/opt-350m
+ MODEL=/media/anon/bighdd/ai/models/opt-350m
+ export NUMBER_OF_GPUS=1
+ NUMBER_OF_GPUS=1
+ export OUTPUT_DIR=checkpoints
+ OUTPUT_DIR=checkpoints
++ date +%Y-%m-%d_%H-%M-%S
+ LOG_NAME=2022-12-22_14-15-45
+ export HF_DATASETS_OFFLINE=1
+ HF_DATASETS_OFFLINE=1
+ mkdir -p checkpoints/logs
+ mkdir -p checkpoints/runs
+ torchrun --nproc_per_node 1 --master_port 19198 ./colossalai/run_clm.py --train_file ./data/train.json --learning_rate 2e-5 --checkpointing_steps 64 --mem_cap 0 --model_name_or_path /media/anon/bighdd/ai/models/opt-350m --output_dir checkpoints --per_device_eval_batch_size 4 --per_device_train_batch_size 4
+ tee checkpoints/logs/2022-12-22_14-15-45.log
2022-12-22 14:15:51.339450: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
Colossalai should be built with cuda extension to use the FP16 optimizer
If you want to activate cuda mode for MoE, please install with cuda_ext!
[12/22/22 14:15:54] INFO colossalai - colossalai - INFO:
/home/anon/.local/lib/python3.8/site-packages/colossalai/context/parallel_context.py:521 set_device
INFO colossalai - colossalai - INFO: process rank 0 is bound to device 0
[12/22/22 14:15:55] INFO colossalai - colossalai - INFO:
/home/anon/.local/lib/python3.8/site-packages/colossalai/context/parallel_context.py:557 set_seed
INFO colossalai - colossalai - INFO: initialized seed on rank 0, numpy: 1024, python random: 1024,
ParallelMode.DATA: 1024, ParallelMode.TENSOR: 1024,the default parallel seed is ParallelMode.DATA.
INFO colossalai - colossalai - INFO: /home/anon/.local/lib/python3.8/site-packages/colossalai/initialize.py:117
launch
INFO colossalai - colossalai - INFO: Distributed environment is initialized, data parallel size: 1, pipeline
parallel size: 1, tensor parallel size: 1
INFO colossalai - colossalai - INFO: ./colossalai/run_clm.py:309 main
INFO colossalai - colossalai - INFO: Start preparing dataset
Using custom data configuration default-ced548c04fa8d0c8
Found cached dataset json (/home/anon/.cache/huggingface/datasets/json/default-ced548c04fa8d0c8/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
100%|██████████| 1/1 [00:00<00:00, 597.82it/s]
Using custom data configuration default-ced548c04fa8d0c8
Found cached dataset json (/home/anon/.cache/huggingface/datasets/json/default-ced548c04fa8d0c8/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
Using custom data configuration default-ced548c04fa8d0c8
Found cached dataset json (/home/anon/.cache/huggingface/datasets/json/default-ced548c04fa8d0c8/0.0.0/0f7e3662623656454fcd2b650f34e886a7db4b9104504885bd462096cc7a9f51)
INFO colossalai - colossalai - INFO: ./colossalai/run_clm.py:350 main
INFO colossalai - colossalai - INFO: Dataset is prepared
INFO colossalai - colossalai - INFO: ./colossalai/run_clm.py:366 main
INFO colossalai - colossalai - INFO: Model config has been created
load model from /media/anon/bighdd/ai/models/opt-350m
INFO colossalai - colossalai - INFO: ./colossalai/run_clm.py:373 main
INFO colossalai - colossalai - INFO: GPT2Tokenizer has been created
INFO colossalai - colossalai - INFO: ./colossalai/run_clm.py:388 main
INFO colossalai - colossalai - INFO: Finetune a pre-trained model
[12/22/22 14:16:04] INFO colossalai - ProcessGroup - INFO:
/home/anon/.local/lib/python3.8/site-packages/colossalai/tensor/process_group.py:24 get
INFO colossalai - ProcessGroup - INFO: NCCL initialize ProcessGroup on [0]
[12/22/22 14:16:07] INFO colossalai - colossalai - INFO: ./colossalai/run_clm.py:400 main
INFO colossalai - colossalai - INFO: using Colossal-AI version 0.1.13
searching chunk configuration is completed in 0.67 s.
used number: 315.85 MB, wasted number: 3.01 MB
total wasted percentage is 0.95%
/home/anon/.local/lib/python3.8/site-packages/colossalai/gemini/chunk/chunk.py:40: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor._storage() instead of tensor.storage()
return tensor.storage().size() == 0
/home/anon/.local/lib/python3.8/site-packages/colossalai/gemini/chunk/chunk.py:45: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor._storage() instead of tensor.storage()
tensor.storage().resize_(0)
[12/22/22 14:16:09] INFO colossalai - colossalai - INFO: ./colossalai/run_clm.py:415 main
INFO colossalai - colossalai - INFO: GeminiDDP has been created
Running tokenizer on dataset: 100%|██████████| 10/10 [00:23<00:00, 2.34s/ba]
Running tokenizer on dataset: 100%|██████████| 1/1 [00:01<00:00, 1.18s/ba]
[12/22/22 14:16:37] WARNING colossalai - colossalai - WARNING: ./colossalai/run_clm.py:444 main
WARNING colossalai - colossalai - WARNING: The tokenizer picked seems to have a very large `model_max_length`
(1000000000000000019884624838656). Picking 1024 instead. You can change that default value by passing
--block_size xxx.
Grouping texts in chunks of 1024: 100%|██████████| 10/10 [00:05<00:00, 1.92ba/s]
Grouping texts in chunks of 1024: 100%|██████████| 1/1 [00:00<00:00, 3.61ba/s]
[12/22/22 14:16:42] INFO colossalai - colossalai - INFO: ./colossalai/run_clm.py:503 main
INFO colossalai - colossalai - INFO: Dataloaders have been created
/home/anon/.local/lib/python3.8/site-packages/colossalai/tensor/colo_tensor.py:182: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor._storage() instead of tensor.storage()
ret = func(*args, **kwargs)
/home/anon/.local/lib/python3.8/site-packages/colossalai/nn/optimizer/nvme_optimizer.py:55: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor._storage() instead of tensor.storage()
numel += p.storage().size()
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/anon/.local/lib/python3.8/site-packages/colossalai/nn/optimizer/hybrid_adam.py:80 in │
│ __init__ │
│ │
│ 77 │ │ super(HybridAdam, self).__init__(model_params, default_args, nvme_offload_fracti │
│ 78 │ │ self.adamw_mode = adamw_mode │
│ 79 │ │ try: │
│ ❱ 80 │ │ │ import colossalai._C.cpu_optim │
│ 81 │ │ │ import colossalai._C.fused_optim │
│ 82 │ │ except ImportError: │
│ 83 │ │ │ raise ImportError('Please install colossalai from source code to use HybridA │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ModuleNotFoundError: No module named 'colossalai._C.cpu_optim'
During handling of the above exception, another exception occurred:
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /media/anon/bighdd/ai/toolbox/training/./colossalai/run_clm.py:643 in <module> │
│ │
│ 640 │
│ 641 │
│ 642 if __name__ == "__main__": │
│ ❱ 643 │ main() │
│ 644 │
│ │
│ /media/anon/bighdd/ai/toolbox/training/./colossalai/run_clm.py:519 in main │
│ │
│ 516 │ │ }, │
│ 517 │ ] │
│ 518 │ │
│ ❱ 519 │ optimizer = HybridAdam(optimizer_grouped_parameters, lr=args.learning_rate) │
│ 520 │ optimizer = ZeroOptimizer(optimizer, model, initial_scale=2**14) │
│ 521 │ │
│ 522 │ # Scheduler and math around the number of training steps. │
│ │
│ /home/anon/.local/lib/python3.8/site-packages/colossalai/nn/optimizer/hybrid_adam.py:83 in │
│ __init__ │
│ │
│ 80 │ │ │ import colossalai._C.cpu_optim │
│ 81 │ │ │ import colossalai._C.fused_optim │
│ 82 │ │ except ImportError: │
│ ❱ 83 │ │ │ raise ImportError('Please install colossalai from source code to use HybridA │
│ 84 │ │ │
│ 85 │ │ self.cpu_adam_op = colossalai._C.cpu_optim.CPUAdamOptimizer(lr, betas[0], betas[ │
│ 86 │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ │ adamw_mode) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ImportError: Please install colossalai from source code to use HybridAdam
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 206247) of binary: /usr/bin/python3
Traceback (most recent call last):
File "/home/anon/.local/bin/torchrun", line 8, in <module>
sys.exit(main())
File "/home/anon/.local/lib/python3.8/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
File "/home/anon/.local/lib/python3.8/site-packages/torch/distributed/run.py", line 762, in main
run(args)
File "/home/anon/.local/lib/python3.8/site-packages/torch/distributed/run.py", line 753, in run
elastic_launch(
File "/home/anon/.local/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 132, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
File "/home/anon/.local/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
./colossalai/run_clm.py FAILED
------------------------------------------------------------
Failures:
<NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time : 2022-12-22_14:16:47
host : linuxmint
rank : 0 (local_rank: 0)
exitcode : 1 (pid: 206247)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
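The traceback bottoms out at `ModuleNotFoundError: No module named 'colossalai._C.cpu_optim'`, i.e. the compiled CPU/CUDA extensions are missing from the installed package. Below is a minimal sketch of how to verify that and to rebuild with the extensions enabled; the module names come straight from the traceback, while the `CUDA_EXT=1` flag is taken from the ColossalAI README of that era and should be treated as an assumption.

# Check whether the compiled extensions that HybridAdam imports actually exist
# (module names taken verbatim from the traceback above):
python -c "import colossalai._C.cpu_optim, colossalai._C.fused_optim; print('extensions OK')"

# If that import fails, rebuild from source with the extensions enabled.
# CUDA_EXT=1 is the flag from the ColossalAI README for 0.1.x (assumption):
git clone https://github.com/hpcaitech/ColossalAI.git
cd ColossalAI
CUDA_EXT=1 pip install .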
Environment
- Python 3.8.10
- torch 2.0.0.dev20221215+cu117
- colossalai 0.1.13
- GPU: NVIDIA RTX 3060 12GB
- NVIDIA-SMI 525.60.11, Driver Version: 525.60.11, CUDA Version: 12.0
- nvcc: Cuda compilation tools, release 10.1, V10.1.243
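Note the mismatch above: nvcc reports a 10.1 toolkit, while torch is a cu117 build and the driver reports CUDA 12.0. CUB only started shipping with the CUDA Toolkit in 11.0, which would explain the `cub/cub.cuh: No such file or directory` error in the title when the MoE kernel is compiled with the 10.1 toolkit; PyTorch's extension builder also expects the toolkit to match the CUDA version torch was built with. A quick way to compare the versions in play (standard CUDA/PyTorch commands):

# CUDA toolkit that nvcc (and hence the extension build) will use:
nvcc --version
# CUDA version the installed driver supports (reported by nvidia-smi):
nvidia-smi --query-gpu=driver_version --format=csv,noheader
# CUDA version this torch build was compiled against (11.7 here):
python -c "import torch; print(torch.version.cuda)"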
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 15 (6 by maintainers)
Update: downgraded to PyTorch 1.10 with CUDA 10.2, and now the ColossalAI build itself fails.
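For reference, a minimal sketch of the downgrade-and-rebuild path that was attempted; the exact wheel specifier and index URL are assumptions based on the standard PyTorch 1.10 install instructions, not the precise commands used.

# Install a torch 1.10 build that matches a CUDA 10.2 toolchain
# (wheel specifier/index URL assumed from the standard PyTorch 1.10 instructions):
pip install "torch==1.10.2+cu102" -f https://download.pytorch.org/whl/torch_stable.html
# Rebuild ColossalAI from source with the extensions enabled (CUDA_EXT=1 as above):
cd ColossalAI
CUDA_EXT=1 pip install .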