caddy: Invalid memory address on failing upstream servers

1. Which version of Caddy are you using (caddy -version)?

0.11.4

2. What are you trying to do?

We have a setup with a number of proxy directives. Due to an upgrade of the upstream services, all upstream servers were failing with 502s.

3. What is your Caddyfile?

cloud.humio.com, customer1.humio.com {
  errors {
    502 /var/www/errors/502.html
  }

  tls {
    dns route53
  }
  log / /var/log/caddy/cloud.humio.com.http.log "{remote} - {user} [{when}] \"{method} {uri} {proto}\" {status} {size} \"{>Referer}\" \"{>User-Agent}\" {latency_ms} {>Content-Length} \"{upstream}\" \"{<Server}\" \"{>Humio-Query-Session}\"" {
    rotate_size 100
    rotate_age  7
    rotate_keep 100
  }
  limits {
    body / 32MB
  }

  header / {
    Strict-Transport-Security "max-age=63072000; includeSubDomains; preload"
    X-XSS-Protection "1;mode=block"
    X-Content-Type-Options "nosniff"
  }

  proxy /api/v1/ingest 10.0.2.1:8080 10.0.2.2:8080 10.0.2.3:8080 10.0.2.4:8080 10.0.2.5:8080 10.0.2.6:8080 10.0.2.7:8080 10.0.2.8:8080 {
    policy least_conn
    health_check /api/v1/status
    transparent
  }
  proxy /api/v1/dataspaces/customer2/ingest 10.0.2.1:8080 10.0.2.2:8080 10.0.2.3:8080 10.0.2.4:8080 10.0.2.5:8080 10.0.2.6:8080 10.0.2.7:8080 10.0.2.8:8080 {
    policy least_conn
    health_check /api/v1/status
    transparent
  }
  proxy /api/v1/dataspaces/customer3/ingest 10.0.2.1:8080 10.0.2.2:8080 10.0.2.3:8080 10.0.2.4:8080 10.0.2.5:8080 10.0.2.6:8080 10.0.2.7:8080 10.0.2.8:8080 {
    policy least_conn
    health_check /api/v1/status
    transparent
  }
  proxy / 10.0.2.1:8080 10.0.2.2:8080 10.0.2.3:8080 10.0.2.4:8080 10.0.2.5:8080 10.0.2.6:8080 10.0.2.7:8080 10.0.2.8:8080 {
    policy header Humio-Query-Session
    max_conns 128
    health_check /api/v1/status
    transparent
  }
}

4. How did you run Caddy (give the full command and describe the execution environment)?

We run it via systemd with the following units:

# /etc/systemd/system/caddy.service
[Unit]
Description=Caddy HTTP/2 web server
Documentation=https://caddyserver.com/docs
After=network-online.target
Wants=network-online.target systemd-networkd-wait-online.service

[Service]
Restart=on-abnormal

; User and group the process will run as.
User=www-data
Group=www-data

; Letsencrypt-issued certificates will be written to this directory.
Environment=CADDYPATH=/etc/ssl/caddy

; Always set "-root" to something safe in case it gets forgotten in the Caddyfile.
WorkingDirectory=/var/tmp
ExecStart=/usr/local/bin/caddy -log stdout -agree=true -conf=/etc/caddy/Caddyfile -root=/var/tmp -email=ops@humio.com
ExecReload=/bin/kill -USR1 $MAINPID

; Use graceful shutdown with a reasonable timeout
KillMode=mixed
KillSignal=SIGQUIT
TimeoutStopSec=5s

; Limit the number of file descriptors; see `man systemd.exec` for more limit settings.
LimitNOFILE=1048576
; Unmodified caddy is not expected to use more than that.
LimitNPROC=512

; Use private /tmp and /var/tmp, which are discarded after caddy stops.
PrivateTmp=true
; Use a minimal /dev (May bring additional security if switched to 'true', but it may not work on Raspberry Pi's or other devices, so it has been disabled in this dist.)
PrivateDevices=false
; Hide /home, /root, and /run/user. Nobody will steal your SSH-keys.
ProtectHome=true
; Make /usr, /boot, /etc and possibly some more folders read-only.
ProtectSystem=full
; … except /etc/ssl/caddy, because we want Letsencrypt-certificates there.
;   This merely retains r/w access rights, it does not add any new. Must still be writable on the host!
ReadWriteDirectories=/etc/ssl/caddy /var/log/caddy

; The following additional security directives only work with systemd v229 or later.
; They further restrict privileges that can be gained by caddy. Uncomment if you like.
; Note that you may have to add capabilities required by any plugins in use.
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
NoNewPrivileges=true

[Install]
WantedBy=multi-user.target

# /etc/systemd/system/caddy.service.d/aws.conf
[Service]
Environment=AWS_ACCESS_KEY_ID=…
Environment=AWS_SECRET_ACCESS_KEY=…

5. Please paste any relevant HTTP request(s) here.

6. What did you expect to see?

Caddy shouldn’t crash, and should not terminate with status code 2: systemd interprets that as INVALIDARGUMENT, which won’t trigger a restart under Restart=on-abnormal.

Mar 11 11:36:21 webfront01 systemd[1]: caddy.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
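(For illustration only: with Restart=on-abnormal, systemd restarts on signals and timeouts but not on unclean exit codes like status 2. A hypothetical drop-in such as the following, not part of our actual setup, would restart on any non-zero exit status:)

```ini
# /etc/systemd/system/caddy.service.d/restart.conf  (hypothetical drop-in)
[Service]
Restart=on-failure
```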

7. What did you see instead (give full error messages and/or log)?

Log messages from just before Caddy crashed:

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x47817e]
goroutine 12474847930 [running]:
bufio.(*Writer).Available(...)
        /usr/local/go/src/bufio/bufio.go:592
bufio.(*Writer).WriteString(0x0, 0xe4372f, 0x19, 0xc01c385c20, 0x509430, 0x6b5c3f)
        /usr/local/go/src/bufio/bufio.go:673 +0x7e
net/http.(*expectContinueReader).Read(0xc00c90e600, 0xc005b67000, 0x812, 0x1000, 0x0, 0xffffffffffffffff, 0x0)
        /usr/local/go/src/net/http/server.go:889 +0x166
io.(*teeReader).Read(0xc00c90e780, 0xc005b67000, 0x812, 0x1000, 0x2, 0x1, 0xc00e6e5b70)
        /usr/local/go/src/io/io.go:535 +0x55
github.com/mholt/caddy/caddyhttp/limits.(*maxBytesReader).Read(0xc007866780, 0xc005b67000, 0x812, 0x1000, 0x408bdb, 0xc00001c000, 0xd3f5c0)
        /tmp/gopath_02-22-0906.228675989/src/github.com/mholt/caddy/caddyhttp/limits/handler.go:74 +0x89
net/http.transferBodyReader.Read(0xc009007360, 0xc005b67000, 0x812, 0x1000, 0xd69220, 0xc00e6e5c01, 0x0)
        /usr/local/go/src/net/http/transfer.go:62 +0x56
io.(*LimitedReader).Read(0xc00f28d940, 0xc005b67000, 0x1000, 0x1000, 0x0, 0x0, 0x5)
        /usr/local/go/src/io/io.go:448 +0x63
bufio.(*Writer).ReadFrom(0xc007867340, 0xf622e0, 0xc00f28d940, 0x7fb4b31ba140, 0xc007867340, 0xc000044f01)
        /usr/local/go/src/bufio/bufio.go:707 +0xe4
io.copyBuffer(0xf61420, 0xc007867340, 0xf622e0, 0xc00f28d940, 0x0, 0x0, 0x0, 0xd76aa0, 0x1, 0xc00f28d940)
        /usr/local/go/src/io/io.go:388 +0x303
io.Copy(0xf61420, 0xc007867340, 0xf622e0, 0xc00f28d940, 0xc0003b0f00, 0x0, 0x1)
        /usr/local/go/src/io/io.go:364 +0x5a
net/http.(*transferWriter).writeBody(0xc009007360, 0xf61420, 0xc007867340, 0x2, 0x2)
        /usr/local/go/src/net/http/transfer.go:362 +0x5b8
net/http.(*Request).write(0xc00d5ce700, 0xf61420, 0xc007867340, 0x0, 0xc0053fa270, 0xc00c90f8a0, 0x0, 0x0)
        /usr/local/go/src/net/http/request.go:645 +0x6e8
net/http.(*persistConn).writeLoop(0xc006265440)
        /usr/local/go/src/net/http/transport.go:1888 +0x1b8
created by net/http.(*Transport).dialConn
        /usr/local/go/src/net/http/transport.go:1339 +0x966
Mar 11 11:36:21 webfront01 systemd[1]: caddy.service: Main process exited, code=exited, status=2/INVALIDARGUMENT
Mar 11 11:36:21 webfront01 systemd[1]: caddy.service: Failed with result 'exit-code'.
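Note that the receiver in the bufio.(*Writer).WriteString frame is 0x0, i.e. the method is being called on a nil *bufio.Writer. A minimal standalone sketch (not Caddy code) reproducing the same class of panic:

```go
package main

import (
	"bufio"
	"fmt"
)

// writeTo calls WriteString on the given writer and converts any
// panic into an error. With a nil *bufio.Writer, WriteString's
// internal call to Available() dereferences the nil receiver,
// panicking exactly like the trace above.
func writeTo(w *bufio.Writer) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("panic: %v", r)
		}
	}()
	_, err = w.WriteString("HTTP/1.1 100 Continue\r\n\r\n")
	return err
}

func main() {
	var w *bufio.Writer // nil, as in the trace (receiver 0x0)
	fmt.Println(writeTo(w))
}
```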

8. Why is this a bug, and how do you think this should be fixed?

Caddy had been operating normally for weeks and should most definitely not crash because an upstream service isn’t responding.

9. What are you doing to work around the problem in the meantime?

Hope…

10. Please link to any related issues, pull requests, and/or discussion.

Bonus: What do you use Caddy for? Why did you choose Caddy?

We use it for https://cloud.humio.com/ because SSL termination is so much easier than with competing products.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

Can you enable the error log as well? Use the errors directive to specify a log filename.

From what I can see, the errors are already printed to stderr, so I’m not sure how to enable them any further?
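(For context: in Caddy v0.11 the errors directive can also take a log file path before the block, e.g. as sketched below; the path is illustrative:)

```caddyfile
errors /var/log/caddy/errors.log {
  502 /var/www/errors/502.html
}
```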

Also, can you clarify this, it seems like you didn’t finish the sentence about what you are trying to do:

Yes, so we were upgrading the software on all upstream servers, which naturally resulted in timeouts etc. I.e. these were printed in the logs, just before the “panic”

Mar 11 16:28:38 webfront01 caddy[13757]: 11/Mar/2019:16:28:38 +0100 [ERROR 502 /api/v1/dataspaces/customerA/ingest/elasticsearch/_bulk] read tcp 10.0.2.254:36072->10.0.2.7:8080: read: connection reset by peer

We don’t know for sure whether there’s a connection between the connection resets and Caddy crashing, but it has happened twice with less than a second between a connection reset and the crash.