descheduler: Proposal: Long-lived deployment: Triggering Descheduler with Events (SharedIndexInformer rules, endpoints, etc)

Is your feature request related to a problem? Please describe.

When I think about a problem I have that requires to take action when Descheduler does something my first target is one of the events that it triggers as follows: ^1

  • New Node Joined
  • Service Removed
  • Deployment Delete
  • HPA Update
  • Trigger manually by calling and endpoint
  • etc

(Just randomly filled these as an example, not meant to represent the use-cases.)

Describe the solution you’d like

To stay informed about when these events get triggered, we can use a primitive exposed by Kubernetes and the client-go called NewSharedIndexInformer, inside the cache package.

informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
    AddFunc:    func(obj interface{}) { trigger() },
    DeleteFunc: func(interface{}) { trigger() },
})
go informer.Run(...)
// onAdd is the function executed when the kubernetes informer notified the
// presence of a new kubernetes node in the cluster
func onAdd(obj interface{}) {
    node := obj.(*corev1.Node)
    ...
}

Describe alternatives you’ve considered

In current architecture, node.go already handle this concern when we schedule the Descheduler to run as cron. But the motivation here, consider if we set the Descheduler to run every hour or once a week. If we delete the service, added new node, etc. there will be a time-window between event started - last time descheduler ran. Which means we have to wait hours or days to next run.

What version of descheduler are you using?

descheduler version: v0.22.1

Additional context -

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Comments: 21 (20 by maintainers)

Most upvoted comments

So sorry for the delay… I had actually started working on a response to this and got completely distracted!

To answer your questions:

You mentioned runMode takes 2 options: Default and Informed. In this requirement, I thought we can pass something like Mixed option: Descheduler should start as Deployment for periodic runs. Additional to this, it should also watch Kubernetes events using SharedIndexInformer as well.

@Dentrax this seems like a fine option for me. I’m not entirely sure why I originally made the run mode design mutually exclusive. I think this may bring up some challenges such as a timed strategy running at the same time as an informed strategy, but this could probably be handled with good multithreading practices.

In my opinion, the strategys can be seprate with two types, one is related to pod, another is related to node. so is it only need to care for the podinformer and the nodeinformer will be better? since the HPA or deployment informer actually we only care for the pod and the node. like below

@JaneLiuL this is the general idea, yeah. And I think that we should definitely initially start with a basic list of informers (like just pod and node, as you mentioned). However, the implementation details are where this gets tricky. Not all strategies will need all the informers, and in the future new strategies might need different informers.

Then we can create one interface for all strategys, and they can have run() method for default mode, and infomered method for informerd mode.

That’s kind of how I was originally trying to address this need, by wrapping the existing default mode functions within a new interface. That interface can be defined with the generic informer functions that will be needed by all strategies, but the actual implementation of the interface can vary for each strategy by only calling upon the relevant informers.

One another alternative way to trigger descheduler would be using custom endpoints as such: [GET] /api/v1/trigger

@Dentrax this is an interesting idea, but not one that I think we should immediately pursue. I think it is safer to keep the descheduler reactive to objects and events that are known to the API server. allowing direct requests like this could bypass the need to know the “state” of the cluster and end up with conflicting, or unpredictable results.

Ultimately I’d like to revisit my implementation, or if there are alternatives/cherry-picks that others would like to work on please feel free. I think we are converging on a similar basic idea for the design that was started there

That’s a good point, I think a configurable batch-delay would help the informed strategies run more efficiently and effectively for use cases regarding high frequency periods of updates.

Regarding each strategy, I think this will still need to consider each strategy on a case-by-case basis. For simplicity and safety, I would prefer to roll this out in just one or two strategies to start (with the intent of adding more/all later) to give the design soak time. If we do that, then we won’t need a policy-level runMode field until a significant portion of the strategies have implemented an informed option (at that time it can be easily added).

This would set us up for a more backwards-friendly development pattern (adding new fields is always backward-compatible, but if at some point we decide to change or remove the policy-level runMode this could break existing configs). wdyt?

Hey @damemi, thanks for clarifying. One point that I might miss is that I want to pass Informed mode to entire DeschedulerPolicy. Additionally, configuring each DeschedulerStrategy to set StrategyRunMode individually would be great. Eventually, here is an example config that covers the idea:

  • runMode: both represents policy will run on a periodic loop and handle informers to react immediately.
  • if we pass runMode: default to any strategy, it overrides the policy’s, and will not get triggered in case of informer events. (would it be overengineering?)
  • if we pass runMode: informed to any strategy, it overrides the policy’s, and will not get triggered on periodic loops.
  • if we didn’t specify a runMode to any strategy, strategy will use the policy’s run mode.
  • Since #489 implements the runMode just for RemovePodsViolatingNodeAffinity and RemovePodsViolatingNodeTaints strategies, I showed only two strategies. Should we want to add this run mode field for each strategy?
  • concern: if we plan new runModes to add in the near future, using both type would be shortsighted?
apiVersion: "descheduler/v1alpha1"
    kind: "DeschedulerPolicy"
	runMode: both
    strategies:
      "RemovePodsViolatingNodeAffinity":
         enabled: true
		 runMode: default
      "RemovePodsViolatingNodeTaints":
         enabled: true
		 runMode: informed

At the end of the day, as @necatican pointed, we expect descheduler will run each strategy in case a new node joined/removed/updated, just like a cronjob.

@Dentrax thanks for bumping this. This is definitely one of the features we had in mind with the framework refactor.

You’re right that this will be an execution-level change rather than a “plugin”. But, it will still need to be extendable/modifiable in similar ways to the plugins (for example, writing a new strategy plugin that needs to be triggered on custom events). In the old codebase there wasn’t a good mechanism for wiring this kind of information through to the strategies that need it.

For now, we should focus on migrating the existing code into the framework model so we don’t overload ourselves. I think the big blocker for this will be migrating the internal code to a plugin-registry (right now the strategies are migrated to “plugins”, but the internals are still basically just wrapping them in the old code). With that, we will be able to start working on an event registry that can be mapped to corresponding plugins.

If you want to put together a preliminary design for how this could work in the new codebase, please do! Having this discussion now, while not a primary focus, will still help inform design decisions for the rest of the framework migration.

Hey @damemi @ingvagabund,

Although we couldn’t do a contribution to the Descheduler Framework migration, we followed closely what was done. Months passed since the initial proposal and internal code base has changed. We’re running Deschedulers on high scale environments and still looking forward to this feature. Also, we want to get into it. In the discussion we had above, I mostly clarified my concerns about what and how we’re going to do. The missing question is the where we are going to put logic in the current codebase.

Plugins directory is currently brand-new. I’m not sure if this is proposal counts as a plugin. I think it is something should be placed in execution level rather than plugin level. I’m a bit confused and lost here.

Can you please enlighten us about where should we write the actual logic in current codebase? Thanks!

cc’ing @eminaktas for awareness

Thank you so much for informing us! I’m definitely interested and want to take a look at that proposal.


On the other hand, we’d like to send a PoC PR for this proposal if you want. It’s obvious that there will be needs we haven’t covered/proposed here for sure. At least, we can continue to work on this idea and get feedback/review after submitting a PR.