metaseq: "RuntimeError: torch.distributed is not yet initialized but process group is requested" when trying to run API

❓ Questions and Help

After following setup steps I ran metaseq-api-local and got this output:

$ metaseq-api-local
Traceback (most recent call last):
  File "/home/jliu/openpretrainedtransformer/metaseq/metaseq/service/constants.py", line 17, in <module>
    from metaseq_internal.constants import LOCAL_SSD, MODEL_SHARED_FOLDER
ModuleNotFoundError: No module named 'metaseq_internal'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jliu/miniconda3/envs/conda_env_opt/bin/metaseq-api-local", line 33, in <module>
    sys.exit(load_entry_point('metaseq', 'console_scripts', 'metaseq-api-local')())
  File "/home/jliu/miniconda3/envs/conda_env_opt/bin/metaseq-api-local", line 25, in importlib_load_entry_point
    return next(matches).load()
  File "/home/jliu/miniconda3/envs/conda_env_opt/lib/python3.9/importlib/metadata.py", line 86, in load
    module = import_module(match.group('module'))
  File "/home/jliu/miniconda3/envs/conda_env_opt/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/home/jliu/openpretrainedtransformer/metaseq/metaseq_cli/interactive_hosted.py", line 31, in <module>
    from metaseq.service.constants import (
  File "/home/jliu/openpretrainedtransformer/metaseq/metaseq/service/constants.py", line 40, in <module>
    raise RuntimeError(
RuntimeError: You must set the variables in metaseq.service.constants to launch the API.

Am I missing a step? I tried manually setting LOCAL_SSD and MODEL_SHARED_FOLDER to a new folder I created, but then other things failed.
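For reference, the traceback shows that metaseq/service/constants.py tries to import LOCAL_SSD and MODEL_SHARED_FOLDER from the (internal, unreleased) metaseq_internal package and raises if they are unset. A minimal sketch of a local override, assuming the fallback paths below are placeholders you replace with directories that actually hold the model shards:

```python
# Sketch of a local fallback for metaseq/service/constants.py, assuming
# metaseq_internal is unavailable (as in the traceback above). The paths
# are placeholders; point them at wherever your model files live.
import os

try:
    from metaseq_internal.constants import LOCAL_SSD, MODEL_SHARED_FOLDER
except ImportError:
    LOCAL_SSD = os.environ.get("LOCAL_SSD", "/tmp/opt_local_ssd")
    MODEL_SHARED_FOLDER = os.environ.get("MODEL_SHARED_FOLDER", "/tmp/opt_models")

# Create the directories so downstream code can write into them.
os.makedirs(LOCAL_SSD, exist_ok=True)
os.makedirs(MODEL_SHARED_FOLDER, exist_ok=True)
```

Note that an empty folder only gets past this first check; the API still needs the actual reshard files under MODEL_SHARED_FOLDER to launch, which is likely why "other things failed" afterwards.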

  • fairseq Version (e.g., 1.0 or master): followed setup.md
  • PyTorch Version (e.g., 1.0): followed setup.md
  • OS (e.g., Linux): Ubuntu
  • How you installed fairseq (pip, source): source
  • Build command you used (if compiling from source): followed setup.md
  • Python version: 3.9.12
  • CUDA/cuDNN version: 11.3
  • GPU models and configuration: Quadro RTX 5000
  • Any other relevant information:

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 23 (5 by maintainers)

Most upvoted comments

I still see the issue, any resolution?

This is so strange. Can anyone provide the command they are running?

I ran into the same "RuntimeError: torch.distributed is not yet initialized but process group is requested" problem. I just followed the official setup instructions, installing Apex last. After finishing all the steps I ran metaseq-api-local and got this error.

I am wondering whether the install order of the requirements could cause this error?
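For what the error message itself means: something asked torch.distributed for a process group before init_process_group() had run. metaseq's launcher is normally responsible for this, so the following is only a sketch of the underlying mechanism, not a fix:

```python
# Minimal single-process illustration of torch.distributed initialization.
# Requesting a process group while is_initialized() is False produces the
# "torch.distributed is not yet initialized" RuntimeError from the title.
import os
import torch.distributed as dist

# The default env:// rendezvous needs these two variables set.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

print(dist.is_initialized())  # False: the state the error complains about

# A one-process "group" using the CPU-only gloo backend.
dist.init_process_group(backend="gloo", rank=0, world_size=1)
print(dist.is_initialized())  # True

dist.destroy_process_group()
```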

Have you resolved it? I got the same error too.

Do you want to fine-tune this model or just run it? If you just want to run it, you could use OPT on Hugging Face (transformers), which bypasses these issues.
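A sketch of that suggestion, using the smallest OPT checkpoint (facebook/opt-125m) to keep the download small; the larger checkpoints follow the same pattern:

```python
# Run OPT via Hugging Face transformers instead of the metaseq API.
# Downloads the facebook/opt-125m checkpoint on first use.
from transformers import pipeline

generator = pipeline("text-generation", model="facebook/opt-125m")
out = generator("Hello, I am", max_new_tokens=20)
print(out[0]["generated_text"])
```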

I installed fairscale from source with

git clone https://github.com/facebookresearch/fairscale.git
cd fairscale
git checkout prefetch_fsdp_params_simple
pip3 install -e .

as described in setup.md. I’m not sure how to check the version number. Based on fairscale/CHANGELOG.md, it seems 0.4.1 is the most recent release as of this commit.
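One way to check the installed version, which works for an editable install (pip install -e .) as well as a regular one:

```python
# Query the installed fairscale version from package metadata.
from importlib.metadata import PackageNotFoundError, version

try:
    fairscale_version = version("fairscale")
except PackageNotFoundError:
    fairscale_version = None

print(fairscale_version or "fairscale is not installed in this environment")
```

pip show fairscale from the shell reports the same metadata.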