prefect: Docker agent with server tasks stuck submitted: host.docker.internal connection error
Description
The Docker agent, deployed locally on the same machine as the Docker-based server, fails to run tasks; they get stuck in the submitted state on the server. From the logs, this seems to stem from an inability to connect to host.docker.internal.
[2021-09-13 15:59:17,700] INFO - agent | Deploying flow run 2d52c82a-e890-4f66-a845-7f5b0664ef0d to execution environment...
[2021-09-13 15:59:18,038] INFO - agent | Completed deployment of flow run 2d52c82a-e890-4f66-a845-7f5b0664ef0d
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 169, in _new_conn
conn = connection.create_connection(
File "/usr/local/lib/python3.8/site-packages/urllib3/util/connection.py", line 96, in create_connection
raise err
File "/usr/local/lib/python3.8/site-packages/urllib3/util/connection.py", line 86, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 394, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 234, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File "/usr/local/lib/python3.8/http/client.py", line 1256, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/local/lib/python3.8/http/client.py", line 1302, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.8/http/client.py", line 1251, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/local/lib/python3.8/http/client.py", line 1011, in _send_output
self.send(msg)
File "/usr/local/lib/python3.8/http/client.py", line 951, in send
self.connect()
File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 200, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.8/site-packages/urllib3/connection.py", line 181, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7f25672ab070>: Failed to establish a new connection: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/usr/local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "/usr/local/lib/python3.8/site-packages/urllib3/util/retry.py", line 574, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='host.docker.internal', port=4200): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f25672ab070>: Failed to establish a new connection: [Errno 111] Connection refused'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/prefect", line 8, in <module>
sys.exit(cli())
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 829, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 782, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1259, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 1066, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.8/site-packages/click/core.py", line 610, in invoke
return callback(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/prefect/cli/execute.py", line 53, in flow_run
result = client.graphql(query)
File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 548, in graphql
result = self.post(
File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 451, in post
response = self._request(
File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 737, in _request
response = self._send_request(
File "/usr/local/lib/python3.8/site-packages/prefect/client/client.py", line 602, in _send_request
response = session.post(
File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 590, in post
return self.request('POST', url, data=data, json=json, **kwargs)
File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.8/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='host.docker.internal', port=4200): Max retries exceeded with url: / (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f25672ab070>: Failed to establish a new connection: [Errno 111] Connection refused'))
There seem to have been similar or related issues in the past:
- https://github.com/PrefectHQ/server/issues/25
- https://github.com/PrefectHQ/prefect/issues/2324 / https://github.com/PrefectHQ/prefect/pull/2328
- https://github.com/PrefectHQ/prefect/pull/4714
- https://github.com/PrefectHQ/prefect/pull/4777
- https://github.com/PrefectHQ/prefect/pull/4809
However, they are all closed, and they seem to indicate that the problem should be resolved in the version I’m using (details below).
Expected Behavior
I would expect the docker agent to be able to run the flow and communicate this back to the server
Reproduction
Shell:
# shell 1 (first)
pip install prefect==0.15.5
prefect server start
# http://localhost:8080
# shell 2 (second)
prefect backend server
prefect create project "Test"
prefect agent docker start --label docker --show-flow-logs
Run Python script to register flow (third):
import prefect
from prefect import task, Flow
from prefect.run_configs import DockerRun
from prefect.storage import Docker


@task
def hello_task():
    logger = prefect.context.get("logger")
    logger.info("Hello world!")


with Flow(
    "hello-flow",
    storage=Docker(
        image_name="my_testing_img",
    ),
    run_config=DockerRun(
        labels=["docker"],
    ),
) as flow:
    hello = hello_task()

if __name__ == '__main__':
    flow.register(project_name='Test')
Trigger flow (fourth):
# shell 3
prefect run --project Test --name "hello-flow"
Environment
Ubuntu 20.04
❯ prefect diagnostics
{
  "config_overrides": {},
  "env_vars": [],
  "system_information": {
    "platform": "Linux-5.11.0-34-generic-x86_64-with-glibc2.29",
    "prefect_backend": "server",
    "prefect_version": "0.15.5",
    "python_version": "3.8.12"
  }
}
❯ docker version
Client: Docker Engine - Community
 Version:           20.10.8
 API version:       1.41
 Go version:        go1.16.6
 Git commit:        3967b7d
 Built:             Fri Jul 30 19:54:27 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true
Server: Docker Engine - Community
 Engine:
  Version:          20.10.8
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.16.6
  Git commit:       75249d8
  Built:            Fri Jul 30 19:52:33 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.9
  GitCommit:        e25210fe30a0a703442421b0f60afac609f950a3
 runc:
  Version:          1.0.1
  GitCommit:        v1.0.1-0-g4144b63
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
Thanks for taking a look. Hopefully this is clear and I haven’t overlooked something; let me know if you need any more detail.
About this issue
- State: closed
- Created 3 years ago
- Comments: 35 (15 by maintainers)
Yeah, I’d like this to work without the --expose flag, although that may not be feasible.
Thanks for that note. That’s because it was left out of the docstring: https://github.com/PrefectHQ/prefect/pull/4966
Thanks for the thorough issue! This seems to be Ubuntu-specific; we couldn’t replicate it on OSX. It appears to be a regression introduced by https://github.com/PrefectHQ/prefect/pull/4821. Adding the --expose flag to your server startup should fix the error, though I’m not sure why the local connection is being blocked. We’ll have to investigate further to determine why localhost isn’t sufficient for the host-gateway connection.
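Since the traceback ends in a refused TCP connection to host.docker.internal on port 4200, one way to narrow this down is to test that connection directly. The following is a minimal sketch and not part of Prefect: can_connect is a hypothetical helper, and the host and port are simply the values from the traceback above. It assumes it is run inside a container where host.docker.internal resolves the same way it does in the agent's flow containers.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the host name and attempts the connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

Calling can_connect("host.docker.internal", 4200) from a container on the same machine, once with the server started normally and once with --expose, should show whether the API port is reachable from the container in each case.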