node-problem-detector: Add option flag to taint node for permanent problems
Currently, when a permanent problem is triggered, node problem detector (NPD) updates the corresponding node condition status and emits an event. This is a limitation when using NPD together with descheduler, since the latter cannot evict pods based on node conditions. The gap could be closed if NPD exposed an option flag (or, even better, a per-rule option in the plugin config) to also add a node taint along with the node condition when a permanent problem is triggered. A sample config could look like:
{
  "plugin": "kmsg",
  "logPath": "/dev/kmsg",
  "lookback": "5m",
  "bufferSize": 10,
  "source": "kernel-monitor",
  "conditions": [
    {
      "type": "ReadonlyFilesystem",
      "reason": "FilesystemIsNotReadOnly",
      "message": "Filesystem is not read-only"
    }
  ],
  "rules": [
    {
      "type": "permanent",
      "condition": "ReadonlyFilesystem",
      "reason": "FilesystemIsReadOnly",
      "pattern": "Remounting filesystem read-only",
      "taintNode": true ===> will taint the node with ReadonlyFilesystem=:NoSchedule
    }
  ]
}
The taint should be removed if and when the permanent problem recovers.
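To make the requested behavior concrete, here is a minimal Go sketch of the add-on-trigger, remove-on-recovery logic. It is not NPD code: the `Taint` and `Rule` types and the `taintFor` and `reconcile` helpers are hypothetical names invented for illustration, `taintNode` is the option proposed in this issue rather than an existing one, and the step that would actually patch the node through the Kubernetes API is left out.

```go
package main

import "fmt"

// Taint is a local stand-in for the key/value/effect triple of a node taint.
// It is NOT the k8s.io/api type; it exists only to keep the sketch self-contained.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

// Rule is a hypothetical subset of an NPD rule, extended with the
// proposed (not yet existing) taintNode option.
type Rule struct {
	Type      string
	Condition string
	TaintNode bool
}

// taintFor returns the taint described in the proposal for a rule,
// i.e. <condition>=:NoSchedule. ok is false when the rule is not a
// permanent rule or has not opted in to tainting.
func taintFor(r Rule) (t Taint, ok bool) {
	if r.Type != "permanent" || !r.TaintNode {
		return Taint{}, false
	}
	return Taint{Key: r.Condition, Value: "", Effect: "NoSchedule"}, true
}

// reconcile returns the taint list the node should carry. Any existing taint
// with the same key and effect is dropped, and the wanted taint is appended
// only while the problem is active. The input slice is not modified, and
// calling reconcile repeatedly with the same arguments is idempotent.
func reconcile(current []Taint, want Taint, active bool) []Taint {
	out := make([]Taint, 0, len(current)+1)
	for _, t := range current {
		if t.Key == want.Key && t.Effect == want.Effect {
			continue
		}
		out = append(out, t)
	}
	if active {
		out = append(out, want)
	}
	return out
}

func main() {
	rule := Rule{Type: "permanent", Condition: "ReadonlyFilesystem", TaintNode: true}
	taint, ok := taintFor(rule)
	if !ok {
		return
	}

	var nodeTaints []Taint
	nodeTaints = reconcile(nodeTaints, taint, true) // problem triggered
	fmt.Println("after trigger:", nodeTaints)

	nodeTaints = reconcile(nodeTaints, taint, false) // problem recovered
	fmt.Println("after recovery:", nodeTaints)
}
```

Matching on key and effect rather than on the whole struct means a taint with the same key but a different effect, set by an operator or another controller, is left untouched. Whether NPD should own the taint that strictly is a design choice the issue does not settle.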
About this issue
- State: open
- Created 4 years ago
- Reactions: 11
- Comments: 24 (1 by maintainers)
Commits related to this issue
- fixes #457: tainting logic implemented via configuration (#1) — committed to bilalcaliskan/node-problem-detector by bilalcaliskan 3 years ago
- fixes #457: tainting logic implemented via configuration — committed to bilalcaliskan/node-problem-detector by bilalcaliskan 3 years ago
- fixes #457: tainting logic implemented via configuration (#2) — committed to bilalcaliskan/node-problem-detector by bilalcaliskan 3 years ago
Sorry to say but IMHO without this feature to taint a problematic node, where is the value of this project?