kubernetes-client: watch should handle etcd old version exception

I am running Spark on Kubernetes. The full issue description is at https://issues.apache.org/jira/browse/SPARK-24266

I think the exception "too old resource version: 21648111 (21653211)" should be handled inside kubernetes-client rather than simply thrown to the caller, because the resource version is cached by kubernetes-client, not by the caller. See https://github.com/fabric8io/kubernetes-client/blob/5b1a57b64c7dcc7ebeba3a7024e8615c91afaedb/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/internal/WatchConnectionManager.java#L259-L266

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 31 (11 by maintainers)

Most upvoted comments

@manusa one big difference is that with a watcher we can watch a single pod. This is what spark-submit does when watching the driver; with a SharedInformer I am watching all the pods. Unless there is a way to watch a single pod? Otherwise I guess this will use more resources than needed, unless I am mistaken and the overhead is negligible?

We implemented SharedInformers (#1384) a while back to mimic client-go’s behavior and provide an extra level of abstraction for Watch operations (see "Kubernetes client-go: watch.Interface vs. cache.NewInformer vs. cache.NewSharedIndexInformer?" and "Writing Controllers/SharedInformers").

Our implementation of SharedInformers already takes care of the HTTP_GONE scenario.

If you are looking for this reconnect behavior, I would encourage using SharedInformers instead of Watch, or else using Watch with your own reconnect implementation. I think providing this behavior for Watch too would duplicate a feature that’s already available in Informers.
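For anyone taking the "own reconnect implementation" route, here is a minimal sketch of the idea in plain Java. It assumes a hypothetical `WatchSource` interface and `WatchClosedException` type invented for illustration; neither is part of the fabric8 API, and a real implementation would plug in the client's own watch and list calls. On HTTP 410 (Gone) the loop fetches a fresh resource version and opens a new watch, up to a caller-chosen limit; any other error still reaches the caller.

```java
import java.util.function.Consumer;

// Sketch only: every type here is a hypothetical stand-in, not fabric8 API.
final class RewatchSketch {
    static final int HTTP_GONE = 410;

    /** Hypothetical exception carrying the HTTP status that closed the watch. */
    static final class WatchClosedException extends RuntimeException {
        final int code;

        WatchClosedException(int code, String message) {
            super(message);
            this.code = code;
        }
    }

    /** Hypothetical minimal client: one call to read the current version, one blocking watch. */
    interface WatchSource {
        /** Returns the latest resourceVersion, e.g. taken from a fresh list call. */
        String currentResourceVersion();

        /** Streams events until the watch ends; throws WatchClosedException on an HTTP error. */
        void watch(String resourceVersion, Consumer<String> onEvent);
    }

    /**
     * Runs a watch and transparently re-watches when the server answers 410 Gone.
     * Returns the number of re-watches that were needed.
     */
    static int watchWithRewatch(WatchSource source, Consumer<String> onEvent, int maxRewatches) {
        int rewatches = 0;
        String version = source.currentResourceVersion();
        while (true) {
            try {
                source.watch(version, onEvent);
                return rewatches; // watch ended normally
            } catch (WatchClosedException e) {
                // Only "too old resource version" is recoverable here; rethrow everything else.
                if (e.code != HTTP_GONE || rewatches >= maxRewatches) {
                    throw e;
                }
                rewatches++;
                // The cached version is stale, so start over from the current one.
                version = source.currentResourceVersion();
            }
        }
    }
}
```

Note that restarting from a fresh resource version can skip events that happened in between, so a caller that needs a complete view should re-list the resource and reconcile its state before resuming. This is the part SharedInformers already do for you.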

@rohanKanojia maybe we can use this issue to provide some additional examples and documentation on different use-cases for SharedInformers. I think it’s unclear that they should be the default approach to watch resources.

@stijndehaes I took a look at #1800. Would it be better to add a boolean flag that controls whether the client re-watches automatically when the resource version is too old? That way we won’t break the existing contract of sending HTTP_GONE when the resource version is old, and it also makes things easier for people who don’t care about the problem.

@manusa found it! You can do it like this I think:

// Scope the informer to one pod through the operation context
val podInformer = informers.sharedIndexInformerFor(
  classOf[Pod],
  classOf[PodList],
  new OperationContext().withNamespace(NAMESPACE).withName(PODNAME),
  60000) // resync period in milliseconds

@yujiantao For a simple fix, you can try commenting out these lines: https://github.com/fabric8io/kubernetes-client/blob/v4.0.5/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/internal/WatchConnectionManager.java#L141-L143 We’ve been running with that change for a long time and everything is fine.

@chenchun Is this something we could maybe put into the client, so that for some watches you can opt out of caring about version problems?