nni: requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8140): Max retries exceeded with url: /api/v1/nni/check-status (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ffa090c0d00>: Failed to establish a new connection: [Errno 111] Connection refused'))

When I run "nnictl create --config config.yml -p 8140", I get the following error:

requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=8140): Max retries exceeded with url: /api/v1/nni/check-status (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7ffa090c0d00>: Failed to establish a new connection: [Errno 111] Connection refused'))
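
"Connection refused" (Errno 111) means that nothing was accepting TCP connections on localhost:8140 when nnictl polled the check-status endpoint. A minimal sketch for confirming that from Python is below; `port_is_open` is a hypothetical helper written for this example, not part of NNI, and it only tests whether the port accepts a connection, not whether the NNI REST server behind it is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening or the host is unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Host and port are taken from the error message above.
    print(port_is_open("localhost", 8140))
```

If this prints False right after `nnictl create` fails, the NNI manager process most likely exited or never bound the port, so its log is the next place to look.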

Environment:

  • NNI version: v2.9
  • Training service (local|remote|pai|aml|etc): remote
  • Client OS:
  • Server OS (for remote mode only):
  • Python version: 3.8
  • PyTorch/TensorFlow version: PyTorch 1.7.0
  • Is conda/virtualenv/venv used?: conda
  • Is running in Docker?: no

Configuration:

  • Experiment config (remember to remove secrets!):
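trialConcurrency: 2 # number of concurrent trials; set according to the number of GPUs, this many trials run at the same time
trainingService:
  platform: local
  gpuIndices: [6,7] # which GPUs to use
  # gpuIndices: [0] # which GPUs to use
  useActiveGpu: True # default is false. Whether to use GPUs already in use by other processes, including those occupied by the graphical desktop.
  maxTrialNumberPerGpu: 1 # maximum number of concurrent trials on one GPU; set to 2 only once GPU memory is certain to fit any two trials.
trialGpuNumber: 1 # number of GPUs each trial needs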
  • Search space:
{
    "epochs":{"_type":"choice","_value":[400,500]},
    "lr":{"_type":"quniform","_value":[0.0001,0.0025,0.0005]}
}
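
For reference, the NNI search space documentation describes `quniform` with `_value` `[low, high, q]` as `clip(round(uniform(low, high) / q) * q, low, high)`. The sketch below reimplements that formula in plain Python to show which `lr` values the search space above can produce; the `quniform` function here is an illustration written for this example, not NNI's own sampler.

```python
import random


def quniform(low: float, high: float, q: float, rng=random) -> float:
    """Sample per the documented formula: clip(round(uniform(low, high) / q) * q, low, high)."""
    value = round(rng.uniform(low, high) / q) * q
    # Clip so that a draw rounded down to 0 is pulled back up to `low`.
    return min(max(value, low), high)


if __name__ == "__main__":
    rng = random.Random(0)
    samples = {round(quniform(0.0001, 0.0025, 0.0005, rng), 6) for _ in range(1000)}
    print(sorted(samples))
```

With these bounds the samples land on multiples of 0.0005, plus 0.0001 for draws that would otherwise round to zero.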

Log message:

  • nnimanager.log:
[2022-09-13 20:23:23] INFO (main) Start NNI manager
  • dispatcher.log: none
  • nnictl stdout and stderr: none

How to reproduce it?:

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 15 (5 by maintainers)

Most upvoted comments

In my case this error appeared after I changed the experimentWorkingDirectory item in config.yml. Reverting that change and keeping the default value makes the error go away. I have not been able to determine the real cause, but I never saw this error in version 2.0.
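
To check whether a given config.yml overrides that setting, a small script along these lines may help. It is a rough sketch: `working_dir_override` is a hypothetical helper, and it does a line-based scan for a top-level `experimentWorkingDirectory:` key rather than real YAML parsing, so it will miss unusual layouts and will truncate paths that contain `#`.

```python
from typing import Optional


def working_dir_override(config_text: str) -> Optional[str]:
    """Return the value of a top-level experimentWorkingDirectory key, or None if unset."""
    for line in config_text.splitlines():
        # Only unindented, uncommented lines count as top-level keys.
        if line.startswith("experimentWorkingDirectory:"):
            value = line.split(":", 1)[1]
            # Drop a trailing comment, surrounding whitespace, and quotes.
            value = value.split("#", 1)[0].strip().strip("'\"")
            return value or None
    return None


if __name__ == "__main__":
    example = "trialConcurrency: 2\nexperimentWorkingDirectory: /data/nni  # custom\n"
    print(working_dir_override(example))
```

If the function returns a path, removing or commenting out that line restores the default, which is the workaround described above.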