prefect: Docker agent with Prefect Server: flow runs stuck in Submitted state (host.docker.internal connection error)

Description

The Docker agent, running locally on the same machine as the Prefect Server (which itself runs in Docker), fails to run flows: they get stuck in the Submitted state on the server. The logs suggest that the flow-run container cannot connect to host.docker.internal:

[2021-09-13 15:59:17,700] INFO - agent | Deploying flow run 2d52c82a-e890-4f66-a845-7f5b0664ef0d to execution environment...
[2021-09-13 15:59:18,038] INFO - agent | Completed deployment of flow run 2d52c82a-e890-4f66-a845-7f5b0664ef0d
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 169, in _new_conn
    conn = connection.create_connection(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/connection.py", line 96, in create_connection
    raise err
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/connection.py", line 86, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 394, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 234, in request
    super(HTTPConnection, self).request(method, url, body=body, headers=headers)
  File "/usr/local/lib/python3.8/http/client.py", line 1256, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/local/lib/python3.8/http/client.py", line 1302, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.8/http/client.py", line 1251, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/local/lib/python3.8/http/client.py", line 1011, in _send_output
    self.send(msg)
  File "/usr/local/lib/python3.8/http/client.py", line 951, in send
    self.connect()
  File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 200, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 181, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f25672ab070>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 574, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='host.docker.internal', port=4200): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f25672ab070>: Failed to establish a new connection: [Errno 111] Connection refused'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/prefect", line 8, in <module>
    sys.exit(cli())
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/prefect/cli/execute.py", line 53, in flow_run
    result = client.graphql(query)
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 548, in graphql
    result = self.post(
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 451, in post
    response = self._request(
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 737, in _request
    response = self._send_request(
  File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 602, in _send_request
    response = session.post(
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 590, in post
    return self.request('POST', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 516, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='host.docker.internal', port=4200): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f25672ab070>: Failed to establish a new connection: [Errno 111] Connection refused'))
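It may help to confirm from inside a container what this traceback implies. Errno 111 (Connection refused) is raised by the TCP connect, which happens after name resolution, so host.docker.internal did resolve and the host actively rejected port 4200; a DNS problem would instead surface as socket.gaierror. The standard-library sketch below separates those cases. check_endpoint is a hypothetical helper, not part of Prefect, and running it with python inside a container started from the flow's image (my_testing_img in the reproduction) is an assumption about how you would test this; adjust it to your setup.

```python
import socket
import sys


def check_endpoint(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify whether a TCP connection to host:port works, and if not, why.

    Returns one of: "ok", "unresolvable", "refused", or "error: <detail>".
    (Hypothetical diagnostic helper, not a Prefect API.)
    """
    try:
        # Resolve first so DNS failures are reported separately from TCP failures.
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "unresolvable"
    try:
        # create_connection tries each resolved address until one succeeds.
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on that address/port.
        return "refused"
    except OSError as exc:
        # Timeouts, unreachable networks, and similar failures.
        return f"error: {exc}"


if __name__ == "__main__":
    # Defaults match the host/port from the traceback; override via argv.
    target_host = sys.argv[1] if len(sys.argv) > 1 else "host.docker.internal"
    target_port = int(sys.argv[2]) if len(sys.argv) > 2 else 4200
    print(f"{target_host}:{target_port} -> {check_endpoint(target_host, target_port)}")
```

Run inside the container, a result of "refused" would match the traceback above, while "unresolvable" or a timeout would point to a different problem (name mapping or firewalling rather than a missing listener).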

There seem to have been similar or related issues in the past: https://github.com/PrefectHQ/server/issues/25, https://github.com/PrefectHQ/prefect/issues/2324 / https://github.com/PrefectHQ/prefect/pull/2328, https://github.com/PrefectHQ/prefect/pull/4714, https://github.com/PrefectHQ/prefect/pull/4777 and https://github.com/PrefectHQ/prefect/pull/4809. However, these are all closed and seem to indicate that the problem should be resolved in the version I'm using (details below).

Expected Behavior

I would expect the Docker agent to run the flow and report its state back to the server.

Reproduction

Shell:

# shell 1 (first)
pip install prefect==0.15.5
prefect server start
# UI at http://localhost:8080

# shell 2 (second)
prefect backend server
prefect create project "Test"
prefect agent docker start --label docker --show-flow-logs

Run a Python script to register the flow (third):

import prefect
from prefect import task, Flow
from prefect.run_configs import DockerRun
from prefect.storage import Docker


@task
def hello_task():
    logger = prefect.context.get("logger")
    logger.info("Hello world!")


with Flow(
        "hello-flow",
        storage=Docker(
            image_name="my_testing_img",
        ),
        run_config=DockerRun(
            labels=["docker"],
        )
) as flow:
    hello = hello_task()


if __name__ == '__main__':
    flow.register(project_name='Test')

Trigger the flow (fourth):

# shell 3
prefect run --project Test --name "hello-flow"

Environment

Ubuntu 20.04

❯ prefect diagnostics
{
  "config_overrides": {},
  "env_vars": [],
  "system_information": {
    "platform": "Linux-5.11.0-34-generic-x86_64-with-glibc2.29",
    "prefect_backend": "server",
    "prefect_version": "0.15.5",
    "python_version": "3.8.12"
  }
}
❯ docker version
Client: Docker Engine - Community
 Version:           20.10.8
 API version:       1.41
 Go version:        go1.16.6
 Git commit:        3967b7d
 Built:             Fri Jul 30 19:54:27 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.8
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.6
  Git commit:       75249d8
  Built:            Fri Jul 30 19:52:33 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.9
  GitCommit:        e25210fe30a0a703442421b0f60afac609f950a3
 runc:
  Version:          1.0.1
  GitCommit:        v1.0.1-0-g4144b63
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Thanks for taking a look. Hopefully this is clear and I haven't overlooked something; let me know if you need any more detail.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 35 (15 by maintainers)

Most upvoted comments

Yeah, I'd like this to work without the --expose flag, although that may not be feasible.

Thanks for that note; that's because it was left out of the docstring (see https://github.com/PrefectHQ/prefect/pull/4966).

Thanks for the thorough issue! This seems to be Ubuntu-specific; we couldn't replicate it on OSX. It appears to be a regression introduced by https://github.com/PrefectHQ/prefect/pull/4821. Adding the --expose flag to your server startup (prefect server start --expose) should fix the error, though I'm not sure why the local connection is being blocked. We'll have to investigate further to determine why localhost isn't sufficient for the host-gateway connection.
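The localhost point above can be illustrated without Docker. Assuming (this is a reading of the thread, not something it states) that PR 4821 made the server publish its ports on 127.0.0.1 only, and that on Linux host.docker.internal is mapped via host-gateway to the Docker bridge address rather than to loopback, a connection to host.docker.internal:4200 arrives on an address nothing listens on and is refused. The sketch below reproduces that with two loopback addresses: a listener bound to 127.0.0.1 refuses connections addressed to 127.0.0.2, while one bound to 0.0.0.0 (roughly what exposing the ports on all interfaces means) accepts both. It relies on Linux routing all of 127.0.0.0/8 to the loopback interface; probe and serve_and_probe are hypothetical names.

```python
import socket
from typing import Dict, List


def probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError, timeouts, and unreachable addresses.
        return False


def serve_and_probe(bind_addr: str, targets: List[str]) -> Dict[str, bool]:
    """Listen on bind_addr (ephemeral port) and probe each target on that port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((bind_addr, 0))
        # The kernel completes handshakes into the backlog; no accept() needed.
        srv.listen(len(targets))
        port = srv.getsockname()[1]
        return {target: probe(target, port) for target in targets}


if __name__ == "__main__":
    targets = ["127.0.0.1", "127.0.0.2"]
    # Loopback-only bind, analogous to publishing ports on localhost.
    print("bound to 127.0.0.1:", serve_and_probe("127.0.0.1", targets))
    # All-interfaces bind, analogous to exposing the ports.
    print("bound to 0.0.0.0:  ", serve_and_probe("0.0.0.0", targets))
```

On Linux the first line should show 127.0.0.2 as unreachable and the second should show both addresses reachable, which mirrors why a loopback-only server works for the agent on the host but not for a container reaching it through the bridge.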