opentelemetry-collector-contrib: New component: Failover Connector
The purpose and use-cases of the new component
A connector that routes data based on the current health status of a downstream component, typically an exporter.
I have heard several users ask for the ability to send data to a backup exporter if a primary exporter fails. I believe this could be implemented as a routing connector.
The user would specify at least one pipeline to which data would normally be routed. Additionally, the user must specify at least one backup pipeline, which would be used when an error is encountered.
Initially, I think the trigger for routing to a backup pipeline could be based on backpropagated errors, though this is not yet very robust (see https://github.com/open-telemetry/opentelemetry-collector/issues/7460). At a later time, I imagine this could be based on the health status of an exporter (see https://github.com/open-telemetry/opentelemetry-collector/issues/6344).
Example configuration for the component
receivers:
  foo:
exporters:
  bar/main:
  bar/backup:
connectors:
  failover:
    primary: logs/main
    secondary: logs/backup
service:
  pipelines:
    logs/in:
      receivers: [foo]
      exporters: [failover]
    logs/main:
      receivers: [failover]
      exporters: [bar/main]
    logs/backup:
      receivers: [failover]
      exporters: [bar/backup]
Telemetry data types supported
traces->traces, metrics->metrics, logs->logs
Is this a vendor-specific component?
- This is a vendor-specific component
- If this is a vendor-specific component, I am proposing to contribute this as a representative of the vendor.
Sponsor (optional)
No response
Additional context
No response
About this issue
- State: open
- Created a year ago
- Reactions: 5
- Comments: 15 (12 by maintainers)
Commits related to this issue
- First PR - Failover Connector skeleton (#28818) This is the Part 1 PR for the Failover Connector (split according to the CONTRIBUTING.md doc) Link to tracking Issue: #20766 Testing: Added fac... — committed to open-telemetry/opentelemetry-collector-contrib by akats7 8 months ago
- First PR - Failover Connector skeleton (#28818) This is the Part 1 PR for the Failover Connector (split according to the CONTRIBUTING.md doc) Link to tracking Issue: #20766 Testing: Added fac... — committed to ClickHouse/opentelemetry-collector-contrib by akats7 8 months ago
- Failover Connector PR2 - core failover functionality (#29557) This is the 2nd PR for the failover connector that implements the core failover functionality. It is currently in place for Traces and o... — committed to open-telemetry/opentelemetry-collector-contrib by akats7 7 months ago
^ I was able to begin looking into this recently and will open a first pass PR for this shortly.
@djaglowski sounds good, can I please be assigned this issue?
That’s exciting @akats7. We’ve been maintaining an internal fork of resiliency features added to the Splunk HEC Exporter and would love to get these features somewhere into the mainline collector. Your PR for a Connector will be great to see and hopefully help with. Cheers!
@cparkins, when there are multiple pipelines at the same priority level, it would fan out data to those pipelines.
@sethallen, I like the idea of allowing a priority list, but I think we should leave room for other parameters as well. I also think we need to allow multiple pipelines per “level”.
I’d like to sponsor this.
I’m glad you made this @djaglowski! I was just chatting with @atoulme about adding failover and circuit breaker support for exporters a couple days ago. The connector seems like a great method to add broad failover support.
How about tweaking this slightly to support 1…N entries as a YAML flow sequence? It would reduce complexity in the failover connector by removing the need for keys (primary, secondary, etc.) to choose the next pipeline to fail over to.
Example: