ray: Can't forward task to worker

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS 7
  • Ray installed from (source or binary): binary
  • Ray version: 0.7.2 and 0.8
  • Python version: 3.7
  • Exact command to reproduce:

Describe the problem

I set up a three-node cluster by following the Manual Cluster Setup steps. However, tasks cannot be forwarded to the workers. The raylet.err output is as follows:

WARNING: Logging before InitGoogleLogging() is written to STDERR
I0718 13:23:43.323923 41069 stats.h:48] Succeeded to initialize stats: exporter address is 127.0.0.1:8888
I0718 13:23:43.327697 41069 grpc_server.cc:26] ObjectManager server started, listening on port 36019.
I0718 13:23:43.331595 41069 grpc_server.cc:26] NodeManager server started, listening on port 35998.
I0718 13:29:14.056089 41069 node_manager.cc:2179] Failed to forward task ec3d4eacef369aaf0083f381191cb022 to node manager 9e2dc91c6fb9e5785d3010257af8f0bdc241e608
I0718 13:29:14.056623 41069 node_manager.cc:2179] Failed to forward task cde20bbe70ee98699efd250aaab06d7a to node manager 9e2dc91c6fb9e5785d3010257af8f0bdc241e608
I0718 13:29:14.068112 41069 node_manager.cc:2179] Failed to forward task 564ba1584451f50dde1b7da70951b856 to node manager 9e2dc91c6fb9e5785d3010257af8f0bdc241e608
I0718 13:29:14.068373 41069 node_manager.cc:2179] Failed to forward task 81a76cf75730da42983a241c7303b5e4 to node manager 9e2dc91c6fb9e5785d3010257af8f0bdc241e608
I0718 13:29:14.068522 41069 node_manager.cc:2179] Failed to forward task 536a8c0a9d7217909c2504ab0a04c39b to node manager 9e2dc91c6fb9e5785d3010257af8f0bdc241e608
I0718 13:29:14.068678 41069 node_manager.cc:2179] Failed to forward task 96b13c5b95f16bd8a0d5f72743c870e7 to node manager 9e2dc91c6fb9e5785d3010257af8f0bdc241e608
I0718 13:29:14.068814 41069 node_manager.cc:2179] Failed to forward task be6f339d5e2325b53df82abd0a9cd393 to node manager 9e2dc91c6fb9e5785d3010257af8f0bdc241e608
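
When a raylet.err file is longer than this excerpt, it can help to tally the failures per target node manager to see whether a single node is unreachable. In the lines quoted above, all seven failures name the same node manager ID. The following is a minimal sketch in plain Python. It does not use any Ray API, the regular expression is derived only from the log lines shown here, and the function name `count_forward_failures` is hypothetical.

```python
import re
from collections import Counter

# Pattern inferred from the "Failed to forward task ..." lines quoted above.
# Other Ray versions may word this message differently (assumption).
FORWARD_FAILURE = re.compile(
    r"Failed to forward task (?P<task>[0-9a-f]+) "
    r"to node manager (?P<node>[0-9a-f]+)"
)


def count_forward_failures(log_text):
    """Return a Counter mapping node manager ID to its number of failed forwards."""
    failures = Counter()
    for line in log_text.splitlines():
        match = FORWARD_FAILURE.search(line)
        if match:
            failures[match.group("node")] += 1
    return failures
```

For example, reading the file into a string and calling `count_forward_failures(text).most_common()` lists the node manager IDs ordered by failure count. A single ID dominating the list would suggest checking connectivity to that one node first.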

Source code / logs

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 19 (10 by maintainers)

Most upvoted comments

@NikEyX I ran into the same issue on Ubuntu 16.04.2 LTS when setting up the cluster manually. I can also reproduce the error with your code snippet.