node-problem-detector: Add option flag to taint node for permanent problems

Currently, when a permanent problem is triggered, node-problem-detector (NPD) updates the corresponding node condition and emits an event. This is a limitation when NPD is used together with the descheduler, because the descheduler cannot evict pods based on node conditions. The gap could be closed if NPD exposed an option flag (or, even better, a per-rule option in the plugin config) to add a node taint along with the node condition when a permanent problem is triggered. A sample config could look like:

{
  "plugin": "kmsg",
  "logPath": "/dev/kmsg",
  "lookback": "5m",
  "bufferSize": 10,
  "source": "kernel-monitor",
  "conditions": [
      {
          "type": "ReadonlyFilesystem",
          "reason": "FilesystemIsNotReadOnly",
          "message": "Filesystem is not read-only"
      }
  ],
  "rules": [
      {
          "type": "permanent",
          "condition": "ReadonlyFilesystem",
          "reason": "FilesystemIsReadOnly",
          "pattern": "Remounting filesystem read-only",
          "taintNode": true
      }
  ]
}

With "taintNode": true, NPD would taint the node with ReadonlyFilesystem=:NoSchedule. The taint should be removed if and when the permanent problem recovers.
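To make the requested behaviour concrete, here is a minimal Go sketch of the add-on-trigger, remove-on-recovery logic. It is only an illustration under stated assumptions, not NPD's actual code: the `Taint` and `Rule` types are local stand-ins (not the `k8s.io/api/core/v1` types), `TaintNode` models the proposed `taintNode` option, and `reconcileTaints` is a hypothetical helper name. It only computes the desired taint list; patching the node through the API server is left out.

```go
package main

import "fmt"

// Taint is a local stand-in with the key/value/effect shape of a node taint.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

// Rule carries only the rule fields relevant here.
// TaintNode models the proposed (hypothetical) "taintNode" option.
type Rule struct {
	Type      string
	Condition string
	TaintNode bool
}

// reconcileTaints returns the taints the node should carry given whether
// the rule's condition is currently active. Rules that are not permanent
// or do not opt in leave the taint list untouched.
func reconcileTaints(current []Taint, rule Rule, conditionActive bool) []Taint {
	if rule.Type != "permanent" || !rule.TaintNode {
		return current
	}
	// Assumption: key is the condition name, value is empty, effect is NoSchedule,
	// matching the ReadonlyFilesystem=:NoSchedule example in the issue.
	want := Taint{Key: rule.Condition, Effect: "NoSchedule"}

	out := make([]Taint, 0, len(current)+1)
	for _, t := range current {
		if t.Key == want.Key && t.Effect == want.Effect {
			continue // drop our taint; it is re-added below if still needed
		}
		out = append(out, t)
	}
	if conditionActive {
		out = append(out, want)
	}
	return out
}

func main() {
	rule := Rule{Type: "permanent", Condition: "ReadonlyFilesystem", TaintNode: true}

	taints := reconcileTaints(nil, rule, true) // problem triggered
	fmt.Println("after trigger:", taints)

	taints = reconcileTaints(taints, rule, false) // problem recovered
	fmt.Println("after recovery:", taints)
}
```

Filtering out the existing taint before re-adding it keeps the function idempotent, so calling it repeatedly while the condition stays active never produces duplicate taints.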

About this issue

State: open
Created 4 years ago
Reactions: 11
Comments: 24 (1 by maintainers)

Commits related to this issue

fixes #457: tainting logic implemented via configuration (#1) — committed to bilalcaliskan/node-problem-detector by bilalcaliskan 3 years ago
fixes #457: tainting logic implemented via configuration — committed to bilalcaliskan/node-problem-detector by bilalcaliskan 3 years ago
fixes #457: tainting logic implemented via configuration (#2) — committed to bilalcaliskan/node-problem-detector by bilalcaliskan 3 years ago

Most upvoted comments

Sorry to say but IMHO without this feature to taint a problematic node, where is the value of this project?

universam1 on Jun 8, 2022