vector: Memory leak with kafka source, http sink and `429 code` responses
A note for the community
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Problem
We have a configuration with a kafka source, an http sink, and acknowledgements enabled. It works fine while the http sink receives only successful responses, but once the sink starts receiving 429 responses, memory keeps growing until Vector is killed by the OOM killer.
I also tried turning on allocation tracing, and the resulting graph suggests the memory leak is in the kafka source.
Some graphs (captions only, images not reproduced here):
- Memory usage from the vector node
- Internal allocation tracing, all components
- Internal allocation tracing, only the kafka source
- Rate of http sink requests by response code
Configuration
data_dir: /tmp/vector-data-dir
api:
  enabled: true
  address: "127.0.0.1:8686"
  playground: true
log_schema:
  host_key: host
  message_key: message
  source_type_key: source_type
  timestamp_key: timestamp
sources:
  main_input_kafka_src:
    type: kafka
    bootstrap_servers: "bootstrap.brokers.kafka:443"
    group_id: vector_test_local
    auto_offset_reset: earliest
    topics:
      - test-topic
    librdkafka_options:
      fetch.message.max.bytes: "10485760" # 10 MB
      fetch.max.bytes: "104857600" # 100 MB
sinks:
  test_sink:
    type: "http"
    inputs:
      - main_input_kafka_src
    uri: "http://localhost:8000"
    method: "post"
    acknowledgements:
      enabled: true
    buffer:
      type: "memory"
      max_events: 100000
      when_full: "block"
    batch:
      max_bytes: 20971520
      timeout_secs: 10
    encoding:
      codec: "json"
    request:
      concurrency: 250
Version
0.29.1
Debug Output
2023-05-02T05:33:20.934810Z WARN sink{component_kind="sink" component_id=test_sink component_type=http component_name=test_sink}:request{request_id=817}: vector::sinks::util::retries: Retrying after response. reason=too many requests internal_log_rate_limit=true
2023-05-02T05:33:21.419138Z WARN sink{component_kind="sink" component_id=test_sink component_type=http component_name=test_sink}:request{request_id=636}: vector::sinks::util::retries: Internal log [Retrying after response.] is being rate limited.
2023-05-02T05:33:31.622858Z WARN sink{component_kind="sink" component_id=test_sink component_type=http component_name=test_sink}:request{request_id=284}: vector::sinks::util::retries: Internal log [Retrying after response.] has been rate limited 33 times.
2023-05-02T05:33:31.622918Z WARN sink{component_kind="sink" component_id=test_sink component_type=http component_name=test_sink}:request{request_id=284}: vector::sinks::util::retries: Retrying after response. reason=too many requests internal_log_rate_limit=true
Example Data
No response
Additional Context
For an MRE: I created an HTTP server for the http sink with simple logic: while there are fewer than 20 concurrent requests, it sleeps for 40 seconds and then returns success; otherwise it returns 429 immediately. The content of the kafka topic is not important, there just needs to be enough of it. Vector is running in Kubernetes.
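To make the reproduction logic concrete, here is a minimal sketch of such a test server, assuming Python 3 and only the standard library. The port 8000 and the thresholds (20 concurrent requests, 40-second sleep) come from the description above; everything else is illustrative rather than the exact server used in the report.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

MAX_SLOW_REQUESTS = 20   # requests allowed to sleep concurrently
SLOW_RESPONSE_SECS = 40  # how long a "successful" request is held open

_lock = threading.Lock()
_in_flight = 0


class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        global _in_flight

        # Drain the request body so the connection stays usable.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)

        with _lock:
            accepted = _in_flight < MAX_SLOW_REQUESTS
            if accepted:
                _in_flight += 1

        if not accepted:
            # Over the concurrency budget: reject immediately with 429.
            self.send_response(429)
            self.end_headers()
            return

        try:
            # Under the budget: hold the request open, then succeed.
            time.sleep(SLOW_RESPONSE_SECS)
            self.send_response(200)
            self.end_headers()
        finally:
            with _lock:
                _in_flight -= 1

    def log_message(self, fmt, *args):
        # Silence per-request access logs to keep the output readable.
        pass


if __name__ == "__main__":
    ThreadingHTTPServer(("0.0.0.0", 8000), Handler).serve_forever()
```

Run alongside the configuration above (so that `uri: "http://localhost:8000"` points at it), this keeps roughly 20 requests hanging in slow responses while every additional request from the sink is rejected with 429.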
References
No response
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 24 (20 by maintainers)
Hi @Ilmarii,
I believe this should be resolved in the latest version (v0.34.0). We fixed a memory leak in the http sink (https://github.com/vectordotdev/vector/pull/18637) that would be triggered when the downstream service returns 429 and also refactored the Kafka source to better handle acknowledgements (https://github.com/vectordotdev/vector/pull/17497). Let us know if you still experience issues after upgrading.
Well, with acknowledgements disabled there is no memory leak 😃
@spencergilbert I built and tested vector from `spencer/improve-selects` and the memory leak still exists.
Fairly certain I've ~~solved this~~ improved this - confirming locally, and should have a PR to close it later today if testing goes well.
@Ilmarii would you be able to run a nightly version to see if my changes help enough in your environment?