node-problem-detector: Add option flag to taint node for permanent problems

Currently, when a permanent problem is triggered, node problem detector (NPD) updates the corresponding node condition status and emits an event. This is a limitation when using NPD together with descheduler, since the latter cannot evict pods based on node conditions. The gap could be closed if NPD exposed an option flag (or, even better, a per-rule option in the plugin config) to also add a node taint alongside the node condition when a permanent problem is triggered. A sample config could look like:

{
  "plugin": "kmsg",
  "logPath": "/dev/kmsg",
  "lookback": "5m",
  "bufferSize": 10,
  "source": "kernel-monitor",
  "conditions": [
      {
          "type": "ReadonlyFilesystem",
          "reason": "FilesystemIsNotReadOnly",
          "message": "Filesystem is not read-only"
      }
  ],
  "rules": [
      {
          "type": "permanent",
          "condition": "ReadonlyFilesystem",
          "reason": "FilesystemIsReadOnly",
          "pattern": "Remounting filesystem read-only",
          "taintNode": true
      }
  ]
}

With "taintNode": true, NPD would taint the node with ReadonlyFilesystem=:NoSchedule. The taint should be removed if and when the permanent problem recovers.
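
To make the requested behavior concrete, here is a minimal Python sketch of the add-and-remove logic. It is only an illustration under stated assumptions: "taintNode" is the option proposed in this issue and not an existing NPD field, the helper names taint_for_rule and reconcile_taints are hypothetical, and taints are modeled as plain dictionaries rather than real Kubernetes API objects.

```python
import json

# Hypothetical rule config: "taintNode" is the option proposed in this issue,
# not an existing NPD field.
CONFIG = json.loads("""
{
  "rules": [
    {
      "type": "permanent",
      "condition": "ReadonlyFilesystem",
      "reason": "FilesystemIsReadOnly",
      "pattern": "Remounting filesystem read-only",
      "taintNode": true
    }
  ]
}
""")


def taint_for_rule(rule):
    """Return the taint a rule asks for, or None if the rule does not opt in."""
    if rule.get("type") != "permanent" or not rule.get("taintNode", False):
        return None
    # Assumption: the taint key is the condition name, with an empty value.
    return {"key": rule["condition"], "value": "", "effect": "NoSchedule"}


def reconcile_taints(taints, taint, problem_active):
    """Return a new taint list that contains `taint` only while the problem is active."""
    others = [
        t for t in taints
        if (t["key"], t["effect"]) != (taint["key"], taint["effect"])
    ]
    return others + [taint] if problem_active else others


taint = taint_for_rule(CONFIG["rules"][0])
tainted = reconcile_taints([], taint, problem_active=True)        # problem triggered
recovered = reconcile_taints(tainted, taint, problem_active=False)  # problem recovered
```

Reconciling against the current problem state, instead of issuing separate add and remove calls, keeps the operation idempotent: repeated matches of the same pattern would not add duplicate taints, and taints set by other components are left untouched.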

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 11
  • Comments: 24 (1 by maintainers)

Most upvoted comments

Sorry to say but IMHO without this feature to taint a problematic node, where is the value of this project?