telegraf: Telegraf does not reconnect to InfluxDB after InfluxDB is restarted

Relevant telegraf.conf:

[global_tags]
  fqdn = "zwei.k3.example.org"
  datacenter = "k3"
  environment = "production"
  engine = "zwei"
  role = "node"
# Configuration for telegraf agent
[agent]
  logfile = "/var/log/telegraf/telegraf.log"
  interval = "1m"
  round_interval = true
  debug = false
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "5s"
  precision = ""
  quiet = true
  hostname = ""
  omit_hostname = false
[[outputs.discard]]
[[outputs.influxdb]]
  username = "writetoinflux"
  password = "no-authorization-needed"
  urls = ["https://kubernetes.influxdb.example.org"]
  timeout = "30s"
  database = "monitoring"
[[inputs.cpu]]
  ## Whether to report per-cpu stats or not
  percpu = true
  ## Whether to report total system cpu stats or not
  totalcpu = true
  ## If true, collect raw CPU time metrics.
  collect_cpu_time = false
# Read metrics about disk usage by mount point
[[inputs.disk]]
  ## By default, telegraf gather stats for all mountpoints.
  ## Setting mountpoints will restrict the stats to the specified mountpoints.
  # mount_points = ["/"]
  ## Ignore some mountpoints by filesystem type. For example (dev)tmpfs (usually
  ## present on /run, /var/run, /dev/shm or /dev).
  ignore_fs = ["tmpfs", "devtmpfs", "devfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.docker]]
  endpoint = "unix:///var/run/docker.sock"
  timeout = "10s"
  total = true
[[inputs.ping]]
  urls = ["vip"]
  count = 1

System info:

# telegraf --version
Telegraf 1.10.2 (git: HEAD 3303f5c3)
# cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"

Steps to reproduce:

  1. Run InfluxDB behind an SSL proxy (the Traefik ingress controller in my case).
  2. Stop InfluxDB for 10 minutes.
  3. Start InfluxDB again.

Expected behavior:

Telegraf reconnects and resumes writing once InfluxDB is back.

Actual behavior:

Telegraf does not reconnect: writes keep failing even after InfluxDB is running again.

Additional info:

InfluxDB is running as a StatefulSet in Kubernetes. Stopping InfluxDB is done by scaling the replicas to 0, and starting it again by scaling the replicas to 1, which starts a new pod.

telegraf.log:

2019-05-24T08:10:11Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: 503 Service Unavailable
2019-05-24T08:10:11Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:10:23Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: 503 Service Unavailable
2019-05-24T08:10:23Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:10:34Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: 502 Bad Gateway
2019-05-24T08:10:34Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:10:44Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: 502 Bad Gateway
2019-05-24T08:10:44Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:11:21Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:11:21Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:11:51Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:11:51Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:12:21Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:12:21Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:12:51Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:12:51Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:13:21Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:13:21Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:13:51Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:13:51Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:14:21Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:14:21Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:14:51Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:14:51Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:15:21Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:15:21Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:15:51Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:15:51Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:16:21Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:16:21Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:16:51Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:16:51Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:17:21Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:17:21Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:17:51Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-24T08:17:51Z E! [agent] Error writing to output [influxdb]: could not write any address
2019-05-24T08:18:21Z E! [outputs.influxdb] when writing to [https://kubernetes.influxdb.example.org]: Post https://kubernetes.influxdb.example.org/write?db=monitoring: net/http: request canceled (Client.Timeout exceeded while awaiting headers)

InfluxDB itself is reachable, and curl can write to it from the same host:

root@zwei.k3.example.org:~# curl -i -XPOST 'https://kubernetes.influxdb.example.org/write?db=monitoring' --data-binary 'event,title="test",tags="test",text="test" value=0' 
HTTP/1.1 204 No Content
Content-Type: application/json
Date: Fri, 24 May 2019 08:26:26 GMT
Request-Id: 9f273246-7dfd-11e9-8678-0a580af402a4
X-Influxdb-Build: OSS
X-Influxdb-Version: 1.7.4
X-Request-Id: 9f273246-7dfd-11e9-8678-0a580af402a4

A Telegraf restart does fix the problem, although the restart itself takes a long time:

root@zwei.k3.example.org:~# time systemctl restart telegraf

real    0m30.719s
user    0m0.003s
sys     0m0.002s

After that, telegraf does write to influxdb again.

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 4
  • Comments: 45 (41 by maintainers)

Most upvoted comments

Can you try setting this environment variable, which should disable HTTP/2 support?

GODEBUG=http2client=0