opentelemetry-collector-contrib: New component: Failover Connector

The purpose and use-cases of the new component

A connector that routes data based on the current health status of a downstream component, typically an exporter.

I have heard several users ask for the ability to send data to a backup exporter if a primary exporter fails. I believe this could be implemented as a routing connector.

The user would specify at least one pipeline to which data would normally be routed. Additionally, the user must specify one or more backup pipelines to be used when an error is encountered.

Initially, I think the trigger for routing to a backup pipeline could be based on backpropagated errors, though this is not yet very robust (see https://github.com/open-telemetry/opentelemetry-collector/issues/7460). At a later time, I imagine this could be based on the health status of an exporter (see https://github.com/open-telemetry/opentelemetry-collector/issues/6344).
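To make the error-based trigger concrete, here is a minimal Go sketch of the idea, not the connector's actual implementation. The `Consumer` type, the `Failover` struct, and `errUnavailable` are hypothetical names invented for illustration; the real collector uses its own consumer interfaces. The sketch assumes a pipeline signals failure by returning an error to its caller.

```go
package main

import (
	"errors"
	"fmt"
)

// Consumer is a hypothetical stand-in for a downstream pipeline.
// It is NOT the collector's real consumer interface.
type Consumer func(batch []string) error

// errUnavailable is a hypothetical error used to simulate a failing exporter.
var errUnavailable = errors.New("pipeline unavailable")

// Failover sends each batch to primary and falls back to secondary only
// when primary returns an error (the "backpropagated error" trigger).
type Failover struct {
	primary, secondary Consumer
}

// Consume returns nil if either pipeline accepted the batch. If both fail,
// the two errors are joined so the caller can see each cause.
func (f Failover) Consume(batch []string) error {
	errPrimary := f.primary(batch)
	if errPrimary == nil {
		return nil
	}
	if errSecondary := f.secondary(batch); errSecondary != nil {
		return errors.Join(errPrimary, errSecondary)
	}
	return nil
}

func main() {
	var delivered []string
	failing := func([]string) error { return errUnavailable }
	backup := func(b []string) error {
		delivered = append(delivered, b...)
		return nil
	}

	f := Failover{primary: failing, secondary: backup}
	err := f.Consume([]string{"log line 1"})
	fmt.Println("error:", err, "delivered to backup:", delivered)
}
```

Note that this sketch retries the primary on every batch. A health-status trigger, as discussed in the second linked issue, would let the connector skip a known-bad primary instead.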

Example configuration for the component

receivers:
  foo:

exporters:
  bar/main:
  bar/backup:

connectors:
  failover:
    primary: logs/main
    secondary: logs/backup

service:
  pipelines:
    logs/in:
      receivers: [foo]
      exporters: [failover]
    logs/main:
      receivers: [failover]
      exporters: [bar/main]
    logs/backup:
      receivers: [failover]
      exporters: [bar/backup]

Telemetry data types supported

traces->traces metrics->metrics logs->logs

Is this a vendor-specific component?

  • This is a vendor-specific component
  • If this is a vendor-specific component, I am proposing to contribute this as a representative of the vendor.

Sponsor (optional)

No response

Additional context

No response

About this issue

  • State: open
  • Created a year ago
  • Reactions: 5
  • Comments: 15 (12 by maintainers)

Most upvoted comments

^ I was able to begin looking into this recently and will open a first pass PR for this shortly.

@djaglowski sounds good. Can I please be assigned this issue?

That’s exciting @akats7. We’ve been maintaining an internal fork of resiliency features added to the Splunk HEC Exporter and would love to get these features somewhere into the mainline collector. Your PR for a connector will be great to see, and hopefully we can help with it. Cheers!

@cparkins, when there are multiple pipelines at the same priority level, it would fan out data to those pipelines.

@sethallen, I like the idea of allowing a priority list, but I think we should leave room for other parameters as well. I also think we need to allow multiple pipelines per “level”.

connectors:
  failover:
    priority:
      - [logs/main]
      - [logs/backup, logs/backup2]
      - [logs/backup/3]
    min_failover_interval: 2m # Possibly would add this in future
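The priority-list behavior described above (try levels in order, fan out within a level) can be sketched in Go as follows. This is an illustrative sketch under stated assumptions, not the proposed implementation: `Consumer`, `Level`, `PriorityFailover`, `errUnavailable`, and `errNoLevels` are hypothetical names, and the rule that a level counts as failed when any of its pipelines returns an error is an assumption the thread does not settle. The possible `min_failover_interval` setting is not modeled.

```go
package main

import (
	"errors"
	"fmt"
)

// Consumer is a hypothetical stand-in for a downstream pipeline.
type Consumer func(batch []string) error

// Hypothetical errors used only in this sketch.
var (
	errUnavailable = errors.New("pipeline unavailable")
	errNoLevels    = errors.New("no priority levels configured")
)

// Level is a set of pipelines that share one priority.
type Level []Consumer

// consume fans the batch out to every pipeline in the level and joins any
// errors. errors.Join returns nil when no pipeline failed.
func (l Level) consume(batch []string) error {
	var errs []error
	for _, c := range l {
		if err := c(batch); err != nil {
			errs = append(errs, err)
		}
	}
	return errors.Join(errs...)
}

// PriorityFailover holds levels ordered from highest to lowest priority.
type PriorityFailover struct {
	levels []Level
}

// Consume tries each level in order and stops at the first level whose
// pipelines all accept the batch.
// Assumption: a level has failed if ANY of its pipelines returns an error.
func (p PriorityFailover) Consume(batch []string) error {
	if len(p.levels) == 0 {
		return errNoLevels
	}
	var errs []error
	for _, level := range p.levels {
		err := level.consume(batch)
		if err == nil {
			return nil
		}
		errs = append(errs, err)
	}
	return errors.Join(errs...)
}

func main() {
	received := map[string]int{}
	accept := func(name string) Consumer {
		return func([]string) error { received[name]++; return nil }
	}
	failing := func([]string) error { return errUnavailable }

	// Mirrors the priority list above, with logs/main simulated as down.
	p := PriorityFailover{levels: []Level{
		{failing},
		{accept("logs/backup"), accept("logs/backup2")},
		{accept("logs/backup/3")},
	}}
	err := p.Consume([]string{"log line 1"})
	fmt.Println("error:", err, "received:", received)
}
```

One design caveat under this assumption: if only some pipelines in a level fail, the ones that succeeded have already received the batch, so moving to the next level means the same data is delivered more than once.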

I’d like to sponsor this.

I’m glad you made this @djaglowski! I was just chatting with @atoulme about adding failover and circuit breaker support for exporters a couple days ago. The connector seems like a great method to add broad failover support.

How about tweaking this slightly to support 1…N entries as a YAML flow sequence? It would reduce complexity in the failover connector by removing the need for keys (primary, secondary, etc.) to choose the next pipeline to fail over to.

Example:

receivers:
  foo:

exporters:
  bar/main:
  bar/backup:
  bar/backup2:

connectors:
  failover: [logs/main, logs/backup, logs/backup2] # .. n entries
#    primary: logs/main
#    secondary: logs/backup

service:
  pipelines:
    logs/in:
      receivers: [foo]
      exporters: [failover]
    logs/main:
      receivers: [failover]
      exporters: [bar/main]
    logs/backup:
      receivers: [failover]
      exporters: [bar/backup]
    logs/backup2:
      receivers: [failover]
      exporters: [bar/backup2]