metaseq: "RuntimeError: torch.distributed is not yet initialized but process group is requested" when trying to run API
❓ Questions and Help
After following the setup steps I ran metaseq-api-local and got this output:
$ metaseq-api-local
Traceback (most recent call last):
File "/home/jliu/openpretrainedtransformer/metaseq/metaseq/service/constants.py", line 17, in <module>
from metaseq_internal.constants import LOCAL_SSD, MODEL_SHARED_FOLDER
ModuleNotFoundError: No module named 'metaseq_internal'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/jliu/miniconda3/envs/conda_env_opt/bin/metaseq-api-local", line 33, in <module>
sys.exit(load_entry_point('metaseq', 'console_scripts', 'metaseq-api-local')())
File "/home/jliu/miniconda3/envs/conda_env_opt/bin/metaseq-api-local", line 25, in importlib_load_entry_point
return next(matches).load()
File "/home/jliu/miniconda3/envs/conda_env_opt/lib/python3.9/importlib/metadata.py", line 86, in load
module = import_module(match.group('module'))
File "/home/jliu/miniconda3/envs/conda_env_opt/lib/python3.9/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 850, in exec_module
File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
File "/home/jliu/openpretrainedtransformer/metaseq/metaseq_cli/interactive_hosted.py", line 31, in <module>
from metaseq.service.constants import (
File "/home/jliu/openpretrainedtransformer/metaseq/metaseq/service/constants.py", line 40, in <module>
raise RuntimeError(
RuntimeError: You must set the variables in metaseq.service.constants to launch the API.
Am I missing a step? I tried manually setting LOCAL_SSD and MODEL_SHARED_FOLDER to a new folder I created, but then other things failed.
- fairseq Version (e.g., 1.0 or master): followed setup.md
- PyTorch Version (e.g., 1.0): followed setup.md
- OS (e.g., Linux): Ubuntu
- How you installed fairseq (pip, source): source
- Build command you used (if compiling from source): followed setup.md
- Python version: 3.9.12
- CUDA/cuDNN version: 11.3
- GPU models and configuration: Quadro RTX 5000
- Any other relevant information:
About this issue
- State: open
- Created 2 years ago
- Comments: 23 (5 by maintainers)
Commits related to this issue
- Update constants.py Fixes #23. — committed to facebookresearch/metaseq by stephenroller 2 years ago
- Update constants.py (#24) Fixes #23. — committed to facebookresearch/metaseq by stephenroller 2 years ago
I still see the issue, any resolution?
I hit the same "RuntimeError: torch.distributed is not yet initialized but process group is requested" error. I just followed the official setup instructions, installing Apex last. After finishing all the instructions, I ran metaseq-api-local and got this error.
I am wondering whether the install order of the requirements could cause this error?
Do you want to fine-tune this model or just run it? If you just want to run it, you could use OPT on Hugging Face (transformers); that route bypasses these issues.
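In case it helps, a minimal sketch of the Hugging Face route, using the small facebook/opt-125m checkpoint as a stand-in (the larger OPT sizes load the same way, given enough memory):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download (or load from cache) the smallest OPT checkpoint.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

inputs = tokenizer("Hello, my name is", return_tensors="pt")
# generate() is greedy by default; the output includes the prompt tokens.
output = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```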
I installed fairscale from source as described in setup.md. I'm not sure how to check the version number; based on fairscale/CHANGELOG.md, it seems 0.4.1 is the most recent version at that commit.