carbon-c-relay: carbon-c-relay won't start if one of the destinations in an any_of cluster is in maintenance/redeploy

I'm not sure whether this is by design or just a bug, but consider this scenario:

  1. We have a config file for carbon-c-relay with an any_of cluster of nodes (currently 6 nodes)
  2. This config is rather static, as the only way I found to make the cluster dynamic is to use DNS round robin or service-discovery DNS and have just one record in the any_of cluster
  3. At the same time, we are doing maintenance on, or redeploying, one of the cluster nodes
  4. A client is being deployed and carbon-c-relay is started on it
  5. carbon-c-relay fails to start on the new client because not all nodes in the cluster are up or have resolvable DNS records, with this message:
[2017-07-31 11:06:30] (MSG) starting carbon-c-relay v3.1 (2017-07-31), pid=27549
configuration:
    relay hostname = example-client1
    workers = 4
    send batch size = 2500
    server queue size = 25000
    server max stalls = 4
    listen backlog = 32
    server connection IO timeout = 600ms
    extra allowed characters = -_:#^
    configuration = /etc/carbon-c-relay.conf

[2017-07-31 11:06:30] (ERR) failed to read configuration '/etc/carbon-c-relay.conf'

Is this by design? There already seems to be code that detects these conditions while the service is running, so it would be nice to have the same behaviour during startup. We have tried both version 3.1 and the latest master at the time of writing.

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 15 (14 by maintainers)

Most upvoted comments

OK, so it looks like it's relatively easy to add support for ignoring non-resolving addresses and treating them as failed servers. Regarding keeping the addresses, I'm leaning towards removing that, so that removing a DNS entry could shut down the metric flow. I'm going to think about it a bit longer.
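The approach described above (a name that doesn't resolve becomes a failed server instead of a fatal configuration error) can be sketched roughly as follows. This is only an illustration in Python, not carbon-c-relay's actual C code: `Server`, `resolve_servers` and `start_relay` are hypothetical names. The rule that startup is refused only when *no* destination in the any_of cluster resolves is an assumption of this sketch.

```python
import socket
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical destination record (not carbon-c-relay's own struct)."""
    host: str
    port: int
    failed: bool = False
    addrs: list = field(default_factory=list)


def resolve_servers(servers, flags=0):
    """Resolve every destination, marking failures instead of aborting.

    Returns the number of servers that resolved successfully.
    """
    ok = 0
    for srv in servers:
        try:
            infos = socket.getaddrinfo(srv.host, srv.port,
                                       type=socket.SOCK_STREAM, flags=flags)
        except socket.gaierror as exc:
            # A non-resolving name is treated like a server that is down,
            # not as a reason to reject the whole configuration.
            print(f"(WARN) cannot resolve {srv.host}: {exc}; marking as failed")
            srv.failed = True
            srv.addrs = []  # do not keep previously resolved addresses
            continue
        srv.failed = False
        srv.addrs = [info[4] for info in infos]  # sockaddr tuples
        ok += 1
    return ok


def start_relay(servers, flags=0):
    """Assumed policy: refuse to start only if no destination resolves."""
    if resolve_servers(servers, flags) == 0:
        raise RuntimeError("no destination in any_of cluster is resolvable")
    return [s for s in servers if not s.failed]


if __name__ == "__main__":
    cluster = [
        Server("127.0.0.1", 2003),
        Server("192.0.2.10", 2003),
        Server("node3.example.invalid", 2003),  # e.g. a node being redeployed
    ]
    # AI_NUMERICHOST keeps the demo offline: numeric addresses parse, while
    # the hostname fails. A real relay would pass flags=0 to do DNS lookups.
    live = start_relay(cluster, flags=socket.AI_NUMERICHOST)
    print([s.host for s in live])  # ['127.0.0.1', '192.0.2.10']
```

Clearing `addrs` on a failed lookup follows the maintainer's leaning towards not keeping previously resolved addresses, so dropping a DNS entry stops traffic to that node. Keeping the last known addresses instead would favour availability over following DNS.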