nni: requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8140): Max retries exceeded with url: /api/v1/nni/check-status (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ffa090c0d00>: Failed to establish a new connection: [Errno 111] Connection refused'))

When I run "nnictl create --config config.yml -p 8140", I get the error:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8140): Max retries exceeded with url: /api/v1/nni/check-status (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ffa090c0d00>: Failed to establish a new connection: [Errno 111] Connection refused'))
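
[Errno 111] Connection refused means that nothing accepted the TCP connection on localhost:8140 when nnictl requested /api/v1/nni/check-status. A minimal sketch for confirming whether anything is listening on that port, using only the Python standard library; is_port_open is a hypothetical helper name and not part of NNI:

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        # create_connection raises OSError if the connection cannot be made.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (Errno 111 on Linux) and timeouts.
        return False


if __name__ == "__main__":
    # 8140 is the port passed to nnictl with -p in this report.
    print(is_port_open("localhost", 8140))
```

If this prints False shortly after running nnictl create, the REST server probably never started, which would match an nnimanager.log that stops right after "Start NNI manager".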

Environment:

NNI version:
v2.9
Training service (local|remote|pai|aml|etc):
remote
Client OS:
Server OS (for remote mode only):
Python version:
3.8
PyTorch/TensorFlow version:
PyTorch 1.7.0
Is conda/virtualenv/venv used?:
conda
Is running in Docker?: no

Configuration:

Experiment config (remember to remove secrets!):

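trialConcurrency: 2 # number of concurrent trials; set according to the number of GPUs, this many trials run at the same time
trainingService:
  platform: local
  gpuIndices: [6,7] # which GPUs to use
  # gpuIndices: [0] # which GPUs to use
  useActiveGpu: True # default is false; whether to use GPUs already in use by other processes, including those occupied by the graphical desktop
  maxTrialNumberPerGpu: 1 # maximum number of concurrent trials on one GPU; set to 2 only once GPU memory is certain to hold any two trials
trialGpuNumber: 1 # number of GPUs each trial needs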

Search space:

{
    "epochs":{"_type":"choice","_value":[400,500]},
    "lr":{"_type":"quniform","_value":[0.0001,0.0025,0.0005]},
}

Log message:

nnimanager.log:

[2022-09-13 20:23:23] INFO (main) Start NNI manager

dispatcher.log: none
nnictl stdout and stderr: none

How to reproduce it?:

About this issue

State: closed
Created 2 years ago
Comments: 15 (5 by maintainers)

Most upvoted comments

This error is caused by changing the experimentWorkingDirectory item in config.yml. Reverting the change and keeping the default value avoids it. I could not determine the root cause, but I never encountered this error in version 2.0.

szhang963 on Sep 18, 2022
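
To check whether a given config.yml overrides experimentWorkingDirectory at all, a rough sketch along these lines may help. It assumes the key appears as a plain, unindented "key: value" line and deliberately avoids a YAML parser; overridden_working_dir is a hypothetical helper name, not an NNI function:

```python
import re


def overridden_working_dir(config_text: str):
    """Return the value of a top-level experimentWorkingDirectory key, or None."""
    for line in config_text.splitlines():
        # re.match anchors at the start of the line, so indented or
        # commented-out occurrences of the key are ignored.
        m = re.match(r"experimentWorkingDirectory\s*:\s*(.*)$", line)
        if m:
            # Drop a trailing " # comment" and any surrounding quotes.
            value = m.group(1).split(" #", 1)[0].strip().strip("'\"")
            return value or None
    return None


if __name__ == "__main__":
    with open("config.yml", encoding="utf-8") as f:
        print(overridden_working_dir(f.read()))
```

If this prints a path, removing that line from config.yml restores the default working directory, which is the workaround described in the comment above.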