parsl: Ad Hoc Cluster configuration not working on Beowulf cluster

Describe the bug: I built a Beowulf cluster and tried to configure it as an ad hoc cluster, following the Parsl instructions. However, when I ran the Python Parsl program, it returned the following error:

Traceback (most recent call last):
  File "/home/mpiuser/.local/lib/python3.5/site-packages/urllib3/connection.py", line 157, in _new_conn
    (self._dns_host, self.port), self.timeout, **extra_kw
  File "/home/mpiuser/.local/lib/python3.5/site-packages/urllib3/util/connection.py", line 61, in create_connection
    for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
  File "/usr/lib/python3.5/socket.py", line 732, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mpiuser/.local/lib/python3.5/site-packages/urllib3/connectionpool.py", line 672, in urlopen
    chunked=chunked,
  File "/home/mpiuser/.local/lib/python3.5/site-packages/urllib3/connectionpool.py", line 376, in _make_request
    self._validate_conn(conn)
  File "/home/mpiuser/.local/lib/python3.5/site-packages/urllib3/connectionpool.py", line 994, in _validate_conn
    conn.connect()
  File "/home/mpiuser/.local/lib/python3.5/site-packages/urllib3/connection.py", line 334, in connect
    conn = self._new_conn()
  File "/home/mpiuser/.local/lib/python3.5/site-packages/urllib3/connection.py", line 169, in _new_conn
    self, "Failed to establish a new connection: %s" % e
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f21ee8216d8>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mpiuser/.local/lib/python3.5/site-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/home/mpiuser/.local/lib/python3.5/site-packages/urllib3/connectionpool.py", line 720, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/home/mpiuser/.local/lib/python3.5/site-packages/urllib3/util/retry.py", line 436, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.ipify.org', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f21ee8216d8>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "parallel_wf.py", line 12, in <module>
    from remote_htex import remote_htex
  File "/home/mpiuser/Downloads/parallel-parsl-workflow/remote_htex.py", line 23, in <module>
    address=address_by_query(),
  File "/home/mpiuser/.local/lib/python3.5/site-packages/parsl/addresses.py", line 21, in address_by_query
    addr = requests.get('https://api.ipify.org').text
  File "/home/mpiuser/.local/lib/python3.5/site-packages/requests/api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "/home/mpiuser/.local/lib/python3.5/site-packages/requests/api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/mpiuser/.local/lib/python3.5/site-packages/requests/sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/mpiuser/.local/lib/python3.5/site-packages/requests/sessions.py", line 646, in send
    r = adapter.send(request, **kwargs)
  File "/home/mpiuser/.local/lib/python3.5/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='api.ipify.org', port=443): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f21ee8216d8>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',)) 
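The final traceback shows where the first error comes from: `address_by_query()`, called at line 23 of remote_htex.py, fetches https://api.ipify.org, and the head node cannot resolve that hostname. Even with internet access, that service reports the public-facing address, which compute nodes on a private cluster network may not be able to reach. Below is a minimal standard-library sketch of an offline alternative: it asks the kernel which local address it would use to reach a given compute node. `lan_address` is a hypothetical helper for illustration, not part of Parsl.

```python
import socket


def lan_address(peer_ip: str, port: int = 9) -> str:
    """Return the local IPv4 address the kernel would use to reach peer_ip.

    Hypothetical helper, not a Parsl API. Pass the IP of a compute node so
    the cluster-facing interface is selected. No DNS lookup or internet
    access is needed when peer_ip is a literal IP address.
    Raises OSError if there is no route to peer_ip.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends no datagram; it only makes the
        # kernel choose a route and a source address for this socket.
        sock.connect((peer_ip, port))
        return sock.getsockname()[0]
    finally:
        sock.close()
```

In remote_htex.py one could then pass, for example, `address=lan_address("192.168.1.11")`, where 192.168.1.11 stands in for one of your compute node IPs. If node hostnames resolve on every machine (for example via /etc/hosts), `address_by_hostname()` from `parsl.addresses` is another option worth trying.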

To Reproduce: follow these steps to reproduce the behavior.

  1. Set up Parsl 0.9.0 with Python 3.5.2 on a Beowulf cluster
  2. Run a Parsl Python script
  3. Wait
  4. See the error

Expected behavior: The script runs correctly and in less time, without generating the following error:

/home/mpiuser/Downloads//home/mpiuser/Downloads/parallel-parsl-workflow//parsl.auto.1575015988.2083118.sh: line 2: process_worker_pool.py: command not found

Environment

  • Ubuntu 16.04
  • Python 3.5.2
  • Parsl 0.9.0

Distributed Environment

  • Where are you running the Parsl script from? Login node
  • Where do you need the workers to run? Login node and compute nodes

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 15

Most upvoted comments

Try setting this parameter in your Parsl config (in place of the existing worker_init=''). It attempts to replicate the environment setup that a normal interactive SSH login performs.

worker_init="""
source /etc/profile
source ~/.profile
"""
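This suggestion targets the second error, `process_worker_pool.py: command not found`. Commands launched over a non-interactive SSH session usually do not run in a login shell, so /etc/profile and ~/.profile are typically not read, and any PATH entries they add are missing. The site-packages paths in the traceback (/home/mpiuser/.local/lib/python3.5/...) suggest Parsl was installed with `pip install --user`, which on Linux places console scripts such as process_worker_pool.py in ~/.local/bin. The following is a minimal diagnostic sketch under that assumption; `diagnose`, `user_bin_dir`, and the file name check_worker.py used below are hypothetical, not part of Parsl. It avoids f-strings and variable annotations so it also runs on Python 3.5.

```python
import os
import shutil

WORKER_SCRIPT = "process_worker_pool.py"


def user_bin_dir(home: str) -> str:
    # Assumption: `pip install --user` on Linux places scripts in ~/.local/bin.
    return os.path.join(home, ".local", "bin")


def diagnose(path_value: str, home: str) -> str:
    """Report whether WORKER_SCRIPT is reachable through the given PATH."""
    found = shutil.which(WORKER_SCRIPT, path=path_value)
    if found:
        return "ok: " + found
    candidate = os.path.join(user_bin_dir(home), WORKER_SCRIPT)
    if os.path.isfile(candidate):
        # Installed, but this PATH lacks ~/.local/bin. Sourcing ~/.profile
        # in worker_init helps only if that file adds the directory to PATH.
        return "installed but not on PATH: " + candidate
    return "not found"


if __name__ == "__main__":
    print(diagnose(os.environ.get("PATH", ""), os.path.expanduser("~")))
```

Running it the way Parsl does, e.g. `ssh <node> python3 check_worker.py`, shows the PATH that remote commands actually receive. If it reports "installed but not on PATH", either the `worker_init` above or an explicit `export PATH=$HOME/.local/bin:$PATH` line in `worker_init` should address the "command not found" error.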