containerpilot: exec stdout w/o newlines breaks health checks by logging on closed pipe

I recently updated containerpilot from 2.7.3 to 2.7.4 and 2.7.5, and I started getting the error shown below.

2017/07/03 11:28:35 us-east-1b-log-zookeeper.health exited with error: io: read/write on closed pipe

Zookeeper itself was healthy and the health check script exited with code 0, but the Zookeeper service was marked out-of-service in Consul. The service looked fine when it started; the problem appeared after a few days. I looked into the containerpilot code to figure out the cause. The error is raised here, and it might have been caused by the logger being closed here. I couldn’t figure out why the logger was closed or what closed it. When I sent SIGHUP to the containerpilot process, the logger pipe was recreated and the service health changed back to in-service.

In any case, I think the service should stay in-service in this situation, even if the logger is closed. I’ve attached some scripts below to help you debug this issue.

health check script

#!/bin/bash
echo ruok | nc ${HOSTNAME} ${ZOOKEEPER_PORT}

containerpilot configuration

{
  "consul": "localhost:8500",
  "preStart": "/zookeeper/{{.ZOOKEEPER_NAME}}/config/preStart.sh",
  "logging": {
    "level": "INFO",
    "format": "default",
    "output": "stdout"
  },
  "stopTimeout": 5,
  "preStop": "/zookeeper/{{.ZOOKEEPER_NAME}}/config/preStop.sh",
  "postStop": "/zookeeper/{{.ZOOKEEPER_NAME}}/config/postStop.sh",
  "services": [
    {
      "id": "{{.DC}}-{{.ZOOKEEPER_CONSUL_ID}}-{{.HOSTNAME}}",
      "name": "{{.DC}}-{{.ZOOKEEPER_CONSUL_NAME}}",
      "port": {{.ZOOKEEPER_PORT}},
      "health": "/zookeeper/{{.ZOOKEEPER_NAME}}/config/healthCheck.sh",
      "tags": [
          {{.ZOOKEEPER_TAGS}}
      ],
      "interfaces": [
          {{.CONTAINER_PILOT_INTERFACES}}
      ],
      "poll": 10,
      "timeout": "10s",
      "ttl": 30
    }
  ],
  "backends": [
  ],
  "tasks": [
  ],
  "coprocesses": [
    {
      "name": "setExporter",
      "command": ["/zookeeper/{{.ZOOKEEPER_NAME}}/config/setExporter.sh"],
      "restarts": "never"
    }
  ]
}

About this issue

State: open
Created 7 years ago
Comments: 16 (11 by maintainers)

Commits related to this issue

Exclude containerpilot cruft Helping to mitigate https://github.com/joyent/containerpilot/issues/423 — committed to levieindustries/elementary by larslevie 7 years ago

Most upvoted comments

I think I figured out what caused it. The output of the health check script did not end with a newline character, so the output may have accumulated in the buffer.

The default value of bufio.MaxScanTokenSize is 64 KB. In my case, the output string was 7 bytes long, the health check interval was 10 seconds, and the service stayed in-service for 26 hours. The total output over those 26 hours is ((26*3600)/10)*7 = 65520 bytes, which is very close to 64 KB (65536 bytes).

I added a newline character to the health check script's output to verify my hypothesis, and I’ve been keeping an eye on it.

ochanism on Jul 5, 2017

PR open for v3: https://github.com/joyent/containerpilot/pull/424
PR open for v2: https://github.com/joyent/containerpilot/pull/425

tgross on Jul 3, 2017