telegraf: telegraf does not reconnect to influxdb
Relevant telegraf.conf:
[global_tags]
fqdn = "zwei.k3.example.org"
datacenter = "k3"
environment = "production"
engine = "zwei"
role = "node"
# Configuration for telegraf agent
[agent]
logfile = "/var/log/telegraf/telegraf.log"
interval = "1m"
round_interval = true
debug = false
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "5s"
precision = ""
quiet = true
hostname = ""
omit_hostname = false
[[outputs.discard]]
[[outputs.influxdb]]
username = "writetoinflux"
password = "no-authorization-needed"
urls = ["https://kubernetes.influxdb.example.org"]
timeout = "30s"
database = "monitoring"
[[inputs.cpu]]
## Whether to report per-cpu stats or not
percpu = true
## Whether to report total system cpu stats or not
totalcpu = true
## If true, collect raw CPU time metrics.
collect_cpu_time = false
# Read metrics about disk usage by mount point
[[inputs.disk]]
## By default, telegraf gathers stats for all mountpoints.
## Setting mountpoints will restrict the stats to the specified mountpoints.
# mount_points = ["/"]
## Ignore some mountpoints by filesystem type. For example (dev)tmpfs (usually
## present on /run, /var/run, /dev/shm or /dev).
ignore_fs = ["tmpfs", "devtmpfs", "devfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.docker]]
endpoint = "unix:///var/run/docker.sock"
timeout = "10s"
total = true
[[inputs.ping]]
urls = ["vip"]
count = 1
System info:
telegraf --version
Telegraf 1.10.2 (git: HEAD 3303f5c3)
# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"
Steps to reproduce:
- run influxdb behind an SSL proxy (traefik ingress controller in my case)
- stop influxdb for 10 minutes
- start influxdb again
Expected behavior:
telegraf reconnects once influxdb is back up
Actual behavior:
telegraf is not able to reconnect; every write keeps failing with the errors below
Additional info:
influxdb is running as a StatefulSet in Kubernetes. Stopping influxdb is done by scaling the replicas to 0; starting it again is done by scaling back to 1, which creates a new pod (see the sketch below).
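A minimal sketch of that stop/start cycle, assuming the StatefulSet is named influxdb and runs in a monitoring namespace (both names are placeholders, not taken from this issue):
kubectl -n monitoring scale statefulset influxdb --replicas=0   # stop: the pod is terminated
sleep 600                                                       # leave it down for ~10 minutes
kubectl -n monitoring scale statefulset influxdb --replicas=1   # start: a new pod is created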
telegraf.log:
2019-05-24T08:10:11Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: 503 Service Unavailable
2019-05-24T08:10:11Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:10:23Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: 503 Service Unavailable
2019-05-24T08:10:23Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:10:34Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: 502 Bad Gateway
2019-05-24T08:10:34Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:10:44Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: 502 Bad Gateway
2019-05-24T08:10:44Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:11:21Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:11:21Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:11:51Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:11:51Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:12:21Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:12:21Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:12:51Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:12:51Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:13:21Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:13:21Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:13:51Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:13:51Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:14:21Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:14:21Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:14:51Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:14:51Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:15:21Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:15:21Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:15:51Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:15:51Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:16:21Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:16:21Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:16:51Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:16:51Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:17:21Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:17:21Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:17:51Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:17:51Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:18:21Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
influxdb is reachable and curl can write:
root@zwei.k3.example.org:~# curl -i -XPOST 'https://kubernetes.influxdb.example.org/write?db=monitoring' --data-binary 'event,title="test",tags="test",text="test" value=0'
HTTP/1.1 204 No Content
Content-Type: application/json
Date: Fri, 24 May 2019 08:26:26 GMT
Request-Id: 9f273246-7dfd-11e9-8678-0a580af402a4
X-Influxdb-Build: OSS
X-Influxdb-Version: 1.7.4
X-Request-Id: 9f273246-7dfd-11e9-8678-0a580af402a4
A telegraf restart (although it takes a long time) does fix the problem:
root@zwei.k3.example.org:~# time systemctl restart telegraf
real 0m30.719s
user 0m0.003s
sys 0m0.002s
After that, telegraf writes to influxdb again.
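Since the follow-up below asks about disabling HTTP/2, it can also be worth checking which protocol the traefik endpoint negotiates. A hedged check with curl (the /ping path is InfluxDB 1.x's health endpoint; the URL is the one from the config above):
# "HTTP/2 204" in the verbose output means the proxy negotiates HTTP/2,
# which is also what telegraf's Go HTTP client would use by default.
curl -sv --http2 -o /dev/null 'https://kubernetes.influxdb.example.org/ping' 2>&1 | grep -E 'ALPN|HTTP/'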
One follow-up from the discussion: Can you try setting this environment variable, which should disable HTTP/2 support?
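The variable itself is not quoted in this excerpt. For Go programs such as telegraf, the usual switch is GODEBUG=http2client=0; the drop-in below is a hedged sketch that assumes this is the variable meant and that telegraf runs under the stock systemd unit:
mkdir -p /etc/systemd/system/telegraf.service.d
cat > /etc/systemd/system/telegraf.service.d/override.conf <<'EOF'
[Service]
# Assumption: disable HTTP/2 in the Go HTTP client used by the influxdb output
Environment="GODEBUG=http2client=0"
EOF
systemctl daemon-reload
systemctl restart telegraf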