kubernetes: Fluentd-scaler causing fluentd pod deletions and messing with ds-controller
Forking from https://github.com/kubernetes/kubernetes/issues/60500#issuecomment-373121164:
To summarize, here’s what we observed:
- PATCH daemonset calls are coming every minute from both fluentd-scaler and addon-manager (verified by turning each of them on/off individually). Things we need to understand here (see the debugging sketch after this list):
  - Is the daemonset object continuously toggling between two states? (We do know its RV is increasing continuously.)
  - If yes, which field(s) in the object are changing? IIRC the value of some label/annotation (I think ‘UpdatedPodsScheduled’) is changing (probably related to the second point below).
  - Also, why should the fluentd-scaler send any API request at all if the resources are already set to the right value?
- Fluentd pods are getting deleted and recreated by the daemonset-controller when the scaler is enabled (as was also seen in https://github.com/kubernetes/kubernetes/issues/60500#issuecomment-373001797). Why is this happening? One thing to note: all those delete calls are preceded by PUT pod-status calls from the respective kubelets (but maybe that’s expected).
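
For the first two questions, a minimal debugging sketch along these lines should show whether the object is really churning and which fields flip. The daemonset name and namespace below are assumptions for a default fluentd-gcp setup, not taken from the issue:

```sh
# Assumed names -- substitute whatever `kubectl get ds -n kube-system` actually shows.
DS=fluentd-gcp-v3.0.0
NS=kube-system

# 1. Confirm the object is being rewritten: the resourceVersion should keep climbing.
for i in $(seq 6); do
  kubectl get ds "$DS" -n "$NS" -o jsonpath='{.metadata.resourceVersion}{"\n"}'
  sleep 10
done

# 2. See which field(s) toggle: snapshot the full object a minute apart and diff.
kubectl get ds "$DS" -n "$NS" -o yaml > /tmp/ds-before.yaml
sleep 60
kubectl get ds "$DS" -n "$NS" -o yaml > /tmp/ds-after.yaml
diff /tmp/ds-before.yaml /tmp/ds-after.yaml
```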
cc @kubernetes/sig-instrumentation-bugs @crassirostris @liggitt

/priority critical-urgent
/assign @x13n
About this issue
- State: closed
- Created 6 years ago
- Comments: 37 (36 by maintainers)
Commits related to this issue
- Bump fluentd-gcp-scaler version Fixes #61190. This version verifies on its own whether resources should be updated or not, instead of relying on `kubectl set resources`. — committed to x13n/kubernetes by x13n 6 years ago
- Merge pull request #61225 from x13n/fluentd-gcp-scaler Automatic merge from submit-queue (batch tested with PRs 60888, 61225). If you want to cherry-pick this change to another branch, please follow ... — committed to kubernetes/kubernetes by deleted user 6 years ago
- Merge pull request #61472 from shyamjvs/disable-fluentd-scaler Automatic merge from submit-queue. If you want to cherry-pick this change to another branch, please follow the instructions ... — committed to kubernetes/kubernetes by deleted user 6 years ago
- Merge pull request #61714 from shyamjvs/revert-fluentd-rolling-upgrade-change Automatic merge from submit-queue (batch tested with PRs 60519, 61099, 61218, 61166, 61714). If you want to cherry-pick t... — committed to kubernetes/kubernetes by deleted user 6 years ago
- Merge pull request #61715 from shyamjvs/increase-density-cm-threshold Automatic merge from submit-queue (batch tested with PRs 60499, 61715, 61688, 61300, 58787). If you want to cherry-pick this chan... — committed to kubernetes/kubernetes by deleted user 6 years ago
- Bump fluentd-gcp-scaler version Fixes #61190. This version verifies on its own whether resources should be updated or not, instead of relying on `kubectl set resources`. — committed to prameshj/kubernetes by x13n 6 years ago
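
The gist of the scaler fix referenced in the commits above is to read the live object first and only update when something actually differs, instead of calling `kubectl set resources` unconditionally every cycle. A rough shell sketch of that idea follows; the names, container, and target values are illustrative assumptions, and the real scaler implements this check in its own code:

```sh
# Illustrative "check before update" sketch; names and values are assumptions.
DS=fluentd-gcp-v3.0.0
NS=kube-system
WANT_CPU=100m
WANT_MEM=200Mi

CUR_CPU=$(kubectl get ds "$DS" -n "$NS" \
  -o jsonpath='{.spec.template.spec.containers[0].resources.requests.cpu}')
CUR_MEM=$(kubectl get ds "$DS" -n "$NS" \
  -o jsonpath='{.spec.template.spec.containers[0].resources.requests.memory}')

if [ "$CUR_CPU" != "$WANT_CPU" ] || [ "$CUR_MEM" != "$WANT_MEM" ]; then
  # Touch the object only when the desired values actually differ.
  # (Container name is also an assumption.)
  kubectl set resources ds "$DS" -n "$NS" -c fluentd-gcp \
    --requests=cpu="$WANT_CPU",memory="$WANT_MEM"
else
  echo "requests already at desired values; nothing to do"
fi
```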
I spoke offline with @x13n and suggested that we should increase maxUnavailable for the fluentd daemonset to a large enough value so that we’re not bottlenecked by it. My reasoning is:
I’m going to make that change and test it against my PR (thanks @x13n for pointing out that we can change maxUnavailable directly in the ds config).
@wojtek-t @jdumars Feel free to override me if you have a good reason 😃
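
For completeness, a sketch of what the maxUnavailable bump could look like. The 10% value and object name are assumptions rather than the values from the PR, and the actual change belongs in the fluentd-gcp daemonset config rather than a live patch:

```sh
# Sketch only: value and name are assumptions. In the ds manifest this corresponds to:
#   spec:
#     updateStrategy:
#       type: RollingUpdate
#       rollingUpdate:
#         maxUnavailable: 10%
kubectl patch ds fluentd-gcp-v3.0.0 -n kube-system -p \
  '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"maxUnavailable":"10%"}}}}'
```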