distribution: Intermittent "connection reset by peer" while pushing image
We’re running docker-registry-v2 on an AWS EC2 instance, backed by an EBS volume (we switched from S3 because we suspected it might be the underlying cause).
On another AWS EC2 instance we’re running a Bamboo CI agent, which builds docker images and pushes them to our docker-registry.
Several times a day we get failed builds because docker push hits "connection reset by peer".
[info] time="2015-07-30T10:00:31+02:00" level=fatal msg="Error pushing to registry: Put https://docker-registry.**.**/v2/userhq/blobs/uploads/e60ca766-eb26-4adf-8cdb-fe7a127e3e4c?_state=a7vvRPLZLleaqCBw4xGmaxJZ-Z0Jc0SsFUUtcrnlqft7Ik5hbWUiOiJ1c2VyaHEiLCJVVUlEIjoiZTYwY2E3NjYtZWIyNi00YWRmLThjZGItZmU3YTEyN2UzZTRjIiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDE1LTA3LTMwVDA4OjAwOjI2LjA0MzcwNzk2WiJ9&digest=sha256%3A1407c3b1319f21131f9da23c859ac406d2ae1051190611046c1666fc86dc5376: read tcp 1.2.3.4:443: connection reset by peer"
Configuration and host information are below. How can I debug this issue?
The docker registry is fronted by nginx, configured as follows:
user nginx;
worker_processes 1;

events {
    worker_connections 1024;
    use epoll;
    multi_accept on;
}

error_log /var/log/nginx/error.log warn;
pid /var/run/nginx.pid;

http {
    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for"';
    access_log /var/log/nginx/access.log main;

    sendfile on;
    keepalive_timeout 65;

    ssl_session_timeout 10m;
    ssl_certificate /etc/nginx/ssl/server.crt;
    ssl_certificate_key /etc/nginx/ssl/server.key;
    ssl_ciphers "EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH";
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
    ssl_prefer_server_ciphers on;
    ssl_session_cache shared:SSL:10m;
    add_header Strict-Transport-Security "max-age=63072000; includeSubdomains; preload";
    add_header X-Frame-Options DENY;
    add_header X-Content-Type-Options nosniff;
    ssl_session_tickets off;
    ssl_stapling on;
    ssl_stapling_verify on;

    server {
        listen 443 ssl;
        server_name myregistrydomain.com;

        client_max_body_size 0;
        chunked_transfer_encoding on;
        client_body_buffer_size 100m;

        location /v2/ {
            # Do not allow connections from docker 1.5 and earlier
            # docker pre-1.6.0 did not properly set the user agent on ping, catch "Go *" user agents
            if ($http_user_agent ~ "^(docker\/1\.(3|4|5(?!\.[0-9]-dev))|Go ).*$" ) {
                return 404;
            }

            auth_basic "registry.localhost";
            auth_basic_user_file /etc/nginx/registry.htpasswd;
            add_header 'Docker-Distribution-Api-Version' 'registry/2.0' always;

            proxy_pass http://docker-registry:5000;
            proxy_set_header Host $http_host;   # required for docker client's sake
            proxy_set_header X-Real-IP $remote_addr;   # pass on real client's IP
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_read_timeout 900;
        }
    }
}
Docker registry configuration:
version: 0.1
log:
  level: info
  formatter: text
  fields:
    service: registry
    environment: production
storage:
  filesystem:
    rootdirectory: /var/
  cache:
    layerinfo: redis
  maintenance:
    uploadpurging:
      enabled: true
      age: 72h
      interval: 8h
      dryrun: false
reporting:
  newrelic:
    licensekey: xxxxx
    name: docker-registry
    verbose: false
http:
  addr: :5000
redis:
  addr: cache:6379
  db: 0
  dialtimeout: 10ms
  readtimeout: 10ms
  writetimeout: 10ms
  pool:
    maxidle: 16
    maxactive: 64
    idletimeout: 300s
Docker Compose is used to start it all:
cache:
  image: redis
nginx:
  build: ./nginx
  links:
    - registry:docker-registry
  ports:
    - 443:443
registry:
  build: ./registry
  volumes:
    - /data/docker:/var/docker
  links:
    - cache
Docker registry host:
$ docker info
Containers: 3
Images: 73
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 79
Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.13.0-58-generic
Operating System: Ubuntu 14.04.2 LTS
CPUs: 1
Total Memory: 3.676 GiB
Name: ip-172-31-24-143
ID: OTJI:CTOH:DLAP:JVAG:RFEP:4VWV:RR2U:SMJL:E5LU:PYOZ:CYMF:DRBA
WARNING: No swap limit support
$ docker version
Client version: 1.7.1
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 786b29d
OS/Arch (client): linux/amd64
Server version: 1.7.1
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 786b29d
OS/Arch (server): linux/amd64
Bamboo CI host:
$ docker info
Containers: 0
Images: 106
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs
Backing Filesystem: extfs
Dirs: 106
Dirperm1 Supported: false
Execution Driver: native-0.2
Kernel Version: 3.13.0-52-generic
Operating System: Ubuntu 14.04.2 LTS
CPUs: 8
Total Memory: 14.69 GiB
Name: ip-172-31-9-199
ID: ZTUG:L4YS:6JUT:KLH4:4PPE:5CGO:GBHD:4UI4:IA6F:OI3B:T5GY:7DX2
WARNING: No swap limit support
$ docker version
Client version: 1.6.2
Client API version: 1.18
Go version (client): go1.4.2
Git commit (client): 7c8fca2
OS/Arch (client): linux/amd64
Server version: 1.6.2
Server API version: 1.18
Go version (server): go1.4.2
Git commit (server): 7c8fca2
OS/Arch (server): linux/amd64
About this issue
- State: closed
- Created 9 years ago
- Comments: 81 (16 by maintainers)
I hit the issue when I did a docker push on an EC2 instance to a registry running on the same instance, using the external-facing IP address. Using the internal IP did not trigger the problem. A large upload would generally fail after about 8 seconds. It was very easy to reproduce.

I collected a tcpdump to see what was happening. At the moment the upload failed, the EC2 instance was receiving a packet that was very far out of sequence. Its sequence number and timestamp were several seconds behind the actual TCP stream. Interestingly, this did not seem to be a retransmit of a previously-sent packet. Presumably this packet is generated within AWS’ infrastructure.
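A capture like the one described can be taken with tcpdump while reproducing the failing push. The interface name and registry IP below are placeholders, not values from this report:

```shell
# Capture the push traffic to a pcap file for later inspection.
# eth0 and 203.0.113.10 are hypothetical; substitute the instance's
# interface and the registry's external-facing IP address.
sudo tcpdump -i eth0 -s 0 -w push-failure.pcap 'host 203.0.113.10 and port 443'

# Reproduce the failing docker push in another shell, stop the capture,
# then look for the RST and the out-of-sequence segment, e.g. with:
#   tshark -r push-failure.pcap -Y 'tcp.flags.reset == 1'
```

Comparing the sequence numbers and TCP timestamps around the RST is what reveals the stale, far-out-of-window packet described above.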
Normally a packet like this should be treated as a spurious retransmit and ignored, but for some reason it was causing the local host to generate a RST packet and kill the connection. Given the anecdote in https://github.com/docker/distribution/issues/785#issuecomment-183338454 that running the registry container in host networking mode works around the issue, I suspected this had something to do with how Docker bridge networking works.
When operating in bridged mode, Docker creates some iptables rules to perform NAT between the exposed address/port and the container’s internal address/port. I had a look at Linux’s NAT implementation, which builds on top of nf_conntrack for connection tracking.

nf_conntrack has a state machine that tracks connection state. If nf_conntrack believes its state is out of sync with the actual connection, it treats incoming packets as invalid. One of the checks is the tcp_window function, which rejects packets outside the TCP window. I believe this is the check that is failing. nf_conntrack has a “be liberal” flag that accepts these packets as valid. Sure enough, after running:

…

I haven’t been able to trigger the issue anymore.
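The command itself was not preserved in this transcript. The "be liberal" flag is exposed as a standard sysctl knob, so the workaround presumably looked like this:

```shell
# Enable nf_conntrack's "be liberal" mode, so packets outside the
# tracked TCP window are accepted rather than marked INVALID.
# Takes effect immediately; lost on reboot.
sudo sysctl -w net.netfilter.nf_conntrack_tcp_be_liberal=1

# To persist the setting across reboots:
echo 'net.netfilter.nf_conntrack_tcp_be_liberal = 1' | sudo tee -a /etc/sysctl.conf
```

Note this loosens conntrack's window validation for all tracked TCP connections on the host, not just the registry's.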
If this is indeed a successful workaround, should we consider having Docker Engine switch on that flag by default?
cc @tonyhb @dmp42 @stevvooe @mrjana
Filed https://github.com/docker/libnetwork/issues/1090. Will also reach out to AWS with our conclusions.
@aaronlehmann @mrjana Bravo!
@mrjana: I think I get it now. When conntrack treats a packet like this as “invalid”, it doesn’t associate it with the flow that its tuple corresponds to. As a result, the packet doesn’t get rewritten by the NAT rule, and ends up being handled as if it were part of a connection to the host’s actual IP address. The host sees that it has no matching flow, and (correctly) sends a RST packet.
I found I can also work around this by adding a rule to the INPUT chain that drops invalid packets. This prevents the packet from being interpreted as destined to the pre-NAT IP address, and prevents the RST from being generated.
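The rule snippet was also lost from this transcript; the usual way to drop conntrack-INVALID packets on the INPUT chain is:

```shell
# Drop packets that conntrack classifies as INVALID before they reach
# the host's TCP stack, so no RST is generated for them.
sudo iptables -I INPUT -m conntrack --ctstate INVALID -j DROP
```

Unlike the be_liberal sysctl, this keeps strict window checking but silently discards the stray packets instead of letting the host answer them with a RST.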