nni: requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8140): Max retries exceeded with url: /api/v1/nni/check-status (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused'))
When I run `nnictl create --config config.yml -p 8140`, I get the following error:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8140): Max retries exceeded with url: /api/v1/nni/check-status (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ffa090c0d00>: Failed to establish a new connection: [Errno 111] Connection refused'))
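`[Errno 111] Connection refused` means nothing was accepting connections on `localhost:8140` when `nnictl` requested `/api/v1/nni/check-status`. In other words, the NNI manager's REST server was not (or no longer) listening on that port. Below is a minimal sketch for probing this yourself with only the standard library. The helper names `is_port_open` and `check_nni_status` are hypothetical. The response body is returned as raw text because its format is not assumed here.

```python
import socket
import urllib.request


def is_port_open(port: int, host: str = "localhost", timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # includes ConnectionRefusedError ([Errno 111]) and timeouts
        return False


def check_nni_status(port: int = 8140, host: str = "localhost", timeout: float = 2.0):
    """GET the endpoint that nnictl polled in the traceback above.

    Returns the response body as text, or None if the request failed
    (connection refused, timeout, or an HTTP error status).
    """
    url = f"http://{host}:{port}/api/v1/nni/check-status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError:  # urllib.error.URLError and HTTPError are OSError subclasses
        return None


if __name__ == "__main__":
    port = 8140  # the port passed to `nnictl create ... -p 8140`
    if not is_port_open(port):
        print(f"Nothing is listening on port {port}; nnimanager.log may show why.")
    else:
        print("check-status response:", check_nni_status(port))
```

If the port is closed right after `nnictl create` returns, the manager most likely failed during startup. This is a different failure from a manager that is running but answering with errors.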
Environment:
- NNI version: v2.9
- Training service (local|remote|pai|aml|etc): remote
- Client OS:
- Server OS (for remote mode only):
- Python version: 3.8
- PyTorch/TensorFlow version: PyTorch 1.7.0
- Is conda/virtualenv/venv used?: conda
- Is running in Docker?: no
Configuration:
- Experiment config (remember to remove secrets!):
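trialConcurrency: 2  # number of concurrent trials; set according to the number of GPUs (this many trials run at the same time)
trainingService:
  platform: local
  gpuIndices: [6, 7]  # which GPUs to use
  # gpuIndices: [0]  # which GPUs to use
  useActiveGpu: True  # default: false. Whether to use GPUs already occupied by other processes, including the graphical desktop
  maxTrialNumberPerGpu: 1  # maximum number of concurrent trials per GPU; raise to 2 only when GPU memory can hold any two trials
trialGpuNumber: 1  # number of GPUs each trial needs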
- Search space:
{
  "epochs": {"_type": "choice", "_value": [400, 500]},
  "lr": {"_type": "quniform", "_value": [0.0001, 0.0025, 0.0005]}
}
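Below is a sketch of two quick sanity checks on the configuration above, under stated assumptions.

`gpu_trial_capacity` is a hypothetical helper that estimates how many trials the listed GPUs can host at once. It assumes each trial takes `trialGpuNumber` GPU slots and each GPU offers `maxTrialNumberPerGpu` slots. With `gpuIndices: [6, 7]` that is 2 × 1 / 1 = 2, which matches `trialConcurrency: 2`.

`quniform_grid` (also hypothetical) enumerates the `lr` values reachable under NNI's documented `quniform` rule, clip(round(uniform(low, high) / q) * q, low, high).

The search space is parsed with Python's `json` module, which rejects trailing commas.

```python
import json

# The search space from the issue, written as strict JSON (no trailing comma).
SEARCH_SPACE_TEXT = """
{
  "epochs": {"_type": "choice", "_value": [400, 500]},
  "lr": {"_type": "quniform", "_value": [0.0001, 0.0025, 0.0005]}
}
"""

# Relevant fields of config.yml mirrored as a plain dict (hypothetical; avoids a YAML dependency).
CONFIG = {
    "trialConcurrency": 2,
    "trialGpuNumber": 1,
    "trainingService": {"gpuIndices": [6, 7], "maxTrialNumberPerGpu": 1},
}


def gpu_trial_capacity(config: dict) -> int:
    """Rough upper bound on how many trials the listed GPUs can run at the same time."""
    ts = config["trainingService"]
    slots = len(ts["gpuIndices"]) * ts["maxTrialNumberPerGpu"]
    return slots // config["trialGpuNumber"]


def quniform_grid(low: float, high: float, q: float) -> list:
    """All values of clip(round(x / q) * q, low, high) for x in [low, high]."""
    # round() is monotone, so x / q in [low/q, high/q] hits every integer in between.
    k_min, k_max = round(low / q), round(high / q)
    # Clip to [low, high], then round away float noise such as 0.0015000000000000002.
    values = {round(min(max(k * q, low), high), 10) for k in range(k_min, k_max + 1)}
    return sorted(values)


if __name__ == "__main__":
    space = json.loads(SEARCH_SPACE_TEXT)  # raises json.JSONDecodeError on a trailing comma
    print("GPU capacity:", gpu_trial_capacity(CONFIG), "trialConcurrency:", CONFIG["trialConcurrency"])
    print("lr grid:", quniform_grid(*space["lr"]["_value"]))
```

Because 0 is clipped up to `low`, the smallest reachable `lr` is 0.0001 rather than 0. The remaining values sit on multiples of 0.0005.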
Log message:
- nnimanager.log:
[2022-09-13 20:23:23] INFO (main) Start NNI manager
- dispatcher.log: none
- nnictl stdout and stderr: none
How to reproduce it?:
About this issue
- State: closed
- Created 2 years ago
- Comments: 15 (5 by maintainers)
This error was caused by changing the `experimentWorkingDirectory` item in `config.yml`. One can revert that change and keep the default value. However, I could not identify the real cause, and I never saw this error in version 2.0.
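Reverting `experimentWorkingDirectory` made the error go away. One hedged guess, not confirmed in this thread, is that the NNI manager could not use the new directory and stopped before its REST server came up.

Below is a minimal pre-flight sketch. It checks whether a candidate directory exists (or can be created) and is actually writable. `check_working_directory` is a hypothetical helper. `~/nni-experiments` is used as the default value because that is NNI's documented default for this field. Note that the helper creates the directory if it is missing.

```python
import os
import tempfile


def check_working_directory(path: str = "~/nni-experiments") -> bool:
    """Return True if `path` is (or can be made into) a writable directory.

    Side effect: creates the directory if it does not exist yet.
    """
    resolved = os.path.abspath(os.path.expanduser(path))
    try:
        os.makedirs(resolved, exist_ok=True)
    except OSError:
        return False  # e.g. a parent component is a regular file, or permission denied
    if not os.path.isdir(resolved):
        return False
    # Probe real write access by creating (and auto-deleting) a temporary file inside it.
    try:
        with tempfile.NamedTemporaryFile(dir=resolved):
            pass
    except OSError:
        return False
    return True
```

Usage: call `check_working_directory("<value of experimentWorkingDirectory>")` before `nnictl create`. A `False` result points at the directory rather than at NNI itself.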