distribution: Intermittent "connection reset by peer" while pushing image

We’re running docker-registry-v2 on an AWS EC2 instance, backed by an EBS volume (we switched from S3 in case that was the underlying issue). On another EC2 instance we run a Bamboo CI agent that builds Docker images and pushes them to our docker-registry. Several times a day, builds fail because docker push gets connection reset by peer.

[info] time="2015-07-30T10:00:31+02:00" level=fatal msg="Error pushing to registry: Put https://docker-registry.**.**/v2/userhq/blobs/uploads/e60ca766-eb26-4adf-8cdb-fe7a127e3e4c?_state=a7vvRPLZLleaqCBw4xGmaxJZ-Z0Jc0SsFUUtcrnlqft7Ik5hbWUiOiJ1c2VyaHEiLCJVVUlEIjoiZTYwY2E3NjYtZWIyNi00YWRmLThjZGItZmU3YTEyN2UzZTRjIiwiT2Zmc2V0IjowLCJTdGFydGVkQXQiOiIyMDE1LTA3LTMwVDA4OjAwOjI2LjA0MzcwNzk2WiJ9&digest=sha256%3A1407c3b1319f21131f9da23c859ac406d2ae1051190611046c1666fc86dc5376: read tcp 1.2.3.4:443: connection reset by peer" 

Configuration and host information are below. How can I debug this issue?

The docker registry is fronted by nginx configured as below:

user  nginx;
worker_processes  1;

events {
  worker_connections 1024;
  use epoll;
  multi_accept on;
}

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;

http {
    include               /etc/nginx/mime.types;
    default_type          application/octet-stream;

    log_format            main  '$remote_addr - $remote_user [$time_local] "$request" '
                            '$status $body_bytes_sent "$http_referer" '
                            '"$http_user_agent" "$http_x_forwarded_for"';

    access_log            /var/log/nginx/access.log  main;

    sendfile              on;
    keepalive_timeout     65;
    ssl_session_timeout   10m;

    ssl_certificate       /etc/nginx/ssl/server.crt;
    ssl_certificate_key   /etc/nginx/ssl/server.key;

    ssl_ciphers                 "EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH";
    ssl_protocols               TLSv1 TLSv1.1 TLSv1.2;
    ssl_prefer_server_ciphers   on;
    ssl_session_cache           shared:SSL:10m;
    add_header                  Strict-Transport-Security "max-age=63072000; includeSubdomains; preload";
    add_header                  X-Frame-Options DENY;
    add_header                  X-Content-Type-Options nosniff;
    ssl_session_tickets         off;
    ssl_stapling                on; 
    ssl_stapling_verify         on;

    server {
        listen 443 ssl;
        server_name myregistrydomain.com;

        client_max_body_size 0;

        chunked_transfer_encoding on;

        client_body_buffer_size 100m;

        location /v2/ {
            # Do not allow connections from docker 1.5 and earlier
            # docker pre-1.6.0 did not properly set the user agent on ping, catch "Go *" user agents
            if ($http_user_agent ~ "^(docker\/1\.(3|4|5(?!\.[0-9]-dev))|Go ).*$" ) {
                return 404;
            }

            auth_basic "registry.localhost";
            auth_basic_user_file /etc/nginx/registry.htpasswd;
            add_header 'Docker-Distribution-Api-Version' 'registry/2.0' always;

            proxy_pass                          http://docker-registry:5000;
            proxy_set_header  Host              $http_host;   # required for docker client's sake
            proxy_set_header  X-Real-IP         $remote_addr; # pass on real client's IP
            proxy_set_header  X-Forwarded-For   $proxy_add_x_forwarded_for;
            proxy_set_header  X-Forwarded-Proto $scheme;
            proxy_read_timeout                  900;
        }
    }
}

Docker registry configuration:

version: 0.1
log:
  level: info
  formatter: text
  fields:
    service: registry
    environment: production
storage:
  filesystem:
    rootdirectory: /var/
  cache:
    layerinfo: redis
  maintenance:
    uploadpurging:
      enabled: true
      age: 72h
      interval: 8h
      dryrun: false
reporting:
  newrelic:
    licensekey: xxxxx 
    name: docker-registry
    verbose: false
http:
  addr: :5000
redis:
  addr: cache:6379
  db: 0
  dialtimeout: 10ms
  readtimeout: 10ms
  writetimeout: 10ms
  pool:
    maxidle: 16
    maxactive: 64
    idletimeout: 300s

Docker Compose is used to start it all:

cache: 
  image: redis 

nginx:
  build: ./nginx
  links: 
    - registry:docker-registry
  ports: 
    - 443:443

registry: 
  build: ./registry
  volumes: 
    - /data/docker:/var/docker
  links: 
    - cache

Docker registry host:

$ docker info
Containers: 3
Images: 73
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 79
 Dirperm1 Supported: false
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 3.13.0-58-generic
Operating System: Ubuntu 14.04.2 LTS
CPUs: 1
Total Memory: 3.676 GiB
Name: ip-172-31-24-143
ID: OTJI:CTOH:DLAP:JVAG:RFEP:4VWV:RR2U:SMJL:E5LU:PYOZ:CYMF:DRBA
WARNING: No swap limit support

$ docker version
Client version: 1.7.1
Client API version: 1.19
Go version (client): go1.4.2
Git commit (client): 786b29d
OS/Arch (client): linux/amd64
Server version: 1.7.1
Server API version: 1.19
Go version (server): go1.4.2
Git commit (server): 786b29d
OS/Arch (server): linux/amd64

Bamboo CI host:

$ docker info
Containers: 0
Images: 106
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 106
 Dirperm1 Supported: false
Execution Driver: native-0.2
Kernel Version: 3.13.0-52-generic
Operating System: Ubuntu 14.04.2 LTS
CPUs: 8
Total Memory: 14.69 GiB
Name: ip-172-31-9-199
ID: ZTUG:L4YS:6JUT:KLH4:4PPE:5CGO:GBHD:4UI4:IA6F:OI3B:T5GY:7DX2
WARNING: No swap limit support

$ docker version
Client version: 1.6.2
Client API version: 1.18
Go version (client): go1.4.2
Git commit (client): 7c8fca2
OS/Arch (client): linux/amd64
Server version: 1.6.2
Server API version: 1.18
Go version (server): go1.4.2
Git commit (server): 7c8fca2
OS/Arch (server): linux/amd64

About this issue

  • State: closed
  • Created 9 years ago
  • Comments: 81 (16 by maintainers)

Most upvoted comments

I hit the issue when I did a docker push on an EC2 instance to a registry running on the same instance, using the external-facing IP address. Using the internal IP did not trigger the problem. A large upload would generally fail after about 8 seconds. It was very easy to reproduce.

I collected a tcpdump to see what was happening. At the moment the upload failed, the EC2 instance was receiving a packet that was very far out of sequence. Its sequence number and timestamp were several seconds behind the actual TCP stream. Interestingly, this did not seem to be a retransmit of a previously-sent packet. Presumably this packet is generated within AWS’ infrastructure.
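For anyone who wants to collect the same evidence, a capture along these lines should work (a sketch, not the exact commands used above; the interface name and port are assumptions based on the setup described in this issue):

```shell
# Sketch: capture registry-side traffic while reproducing a failing push.
# eth0 and port 443 are assumptions; adjust for your host.
tcpdump -i eth0 -s 0 -w push-failure.pcap 'tcp port 443'

# Afterwards, find the RST that killed the connection, then inspect the
# sequence numbers and TCP timestamps of the packets just before it:
tcpdump -n -r push-failure.pcap 'tcp[tcpflags] & tcp-rst != 0'
```

Opening the pcap in Wireshark makes the out-of-window packet easy to spot: it will be flagged as out-of-order/retransmission with a sequence number far behind the rest of the stream.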

Normally a packet like this should be treated as a spurious retransmit and ignored, but for some reason it was causing the local host to generate a RST packet and kill the connection. Given the anecdote in https://github.com/docker/distribution/issues/785#issuecomment-183338454 that running the registry container in host networking mode works around the issue, I suspected this had something to do with how Docker bridge networking works.

When operating in bridged mode, Docker creates some iptables rules to perform NAT between the exposed address/port and the container’s internal address/port. I had a look at Linux’s NAT implementation, which builds on top of nf_conntrack for connection tracking. nf_conntrack has a state machine that tracks connection state. If nf_conntrack believes its state is out of sync with the actual connection, it treats incoming packets as invalid. One of the checks is the tcp_window function, which rejects packets outside the TCP window. I believe this is the check that is failing.

nf_conntrack has a “be liberal” flag that accepts these packets as valid. Sure enough, after running:

echo 1 > /proc/sys/net/ipv4/netfilter/ip_conntrack_tcp_be_liberal

…I haven’t been able to trigger the issue anymore.
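To make this survive a reboot, the usual sysctl approach would look something like the following (a sketch; whether the key is spelled `net.ipv4.netfilter.ip_conntrack_tcp_be_liberal` or the newer `net.netfilter.nf_conntrack_tcp_be_liberal` depends on the kernel and how the conntrack module was built, so check which path exists under /proc/sys first):

```shell
# Sketch: persist the "be liberal" flag across reboots.
# The key name is kernel-dependent; on this 3.13 kernel the older
# ipv4 spelling from the echo above may be the one that exists.
echo 'net.netfilter.nf_conntrack_tcp_be_liberal = 1' >> /etc/sysctl.d/90-conntrack.conf
sysctl -p /etc/sysctl.d/90-conntrack.conf
```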

If this is indeed a successful workaround, should we consider having Docker Engine switch on that flag by default?

cc @tonyhb @dmp42 @stevvooe @mrjana

Filed https://github.com/docker/libnetwork/issues/1090. Will also reach out to AWS with our conclusions.

@mrjana: I think I get it now. When conntrack treats a packet like this as “invalid”, it doesn’t associate it with the flow that its tuple corresponds to. Thus, the packet doesn’t get rewritten by the NAT rule, and ends up being handled as if it was part of a connection to the host’s actual IP address. The host sees that it doesn’t have a matching flow, and (correctly) sends a RST packet.

I found I can also work around this by adding a rule to the INPUT chain that drops invalid packets:

iptables -I INPUT -m conntrack --ctstate INVALID -j DROP

This prevents the packet from being interpreted as destined to the pre-NAT IP address, and prevents the RST from being generated.
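If you apply this rule, its packet counters are a cheap way to confirm it is actually matching the bad packets (a sketch; requires root):

```shell
# Sketch: after a push that would previously have failed, check the
# counters on the INVALID-drop rule. Non-zero pkts/bytes means the rule
# is intercepting packets conntrack rejected.
iptables -vnL INPUT | grep INVALID
```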