kubernetes-client: watch should handle etcd old version exception
I am running Spark on Kubernetes. The full issue description is at https://issues.apache.org/jira/browse/SPARK-24266
I think the exception `too old resource version: 21648111 (21653211)` should be handled inside kubernetes-client instead of simply being thrown to the caller, because the resource version is cached by kubernetes-client, not by the caller. See https://github.com/fabric8io/kubernetes-client/blob/5b1a57b64c7dcc7ebeba3a7024e8615c91afaedb/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/dsl/internal/WatchConnectionManager.java#L259-L266
About this issue
- State: closed
- Created 6 years ago
- Comments: 31 (11 by maintainers)
Commits related to this issue
- [SPARK-24266][K8S] Restart the watcher when we receive a version changed from k8s ### What changes were proposed in this pull request? Restart the watcher when it failed with a HTTP_GONE code from t... — committed to apache/spark by stijndehaes 4 years ago
- [SPARK-24266][K8S] Restart the watcher when we receive a version changed from k8s ### What changes were proposed in this pull request? Restart the watcher when it failed with a HTTP_GONE code from t... — committed to jkleckner/spark by stijndehaes 4 years ago
- [SPARK-24266][K8S] Restart the watcher when we receive a version changed from k8s Restart the watcher when it failed with a HTTP_GONE code from the kubernetes api. Which means a resource version has ... — committed to jkleckner/spark by stijndehaes 4 years ago
- [SPARK-24266][K8S] Restart the watcher when we receive a version changed from k8s Restart the watcher when it failed with a HTTP_GONE code from the kubernetes api. Which means a resource version has ... — committed to jkleckner/spark by stijndehaes 4 years ago
- [SPARK-24266][K8S] Restart the watcher when we receive a version changed from k8s Restart the watcher when it failed with a HTTP_GONE code from the kubernetes api. Which means a resource version has ... — committed to jkleckner/spark by stijndehaes 4 years ago
- Update our decommissioning logic to the current upstream. (#673) [SPARK-21040][CORE] Speculate tasks which are running on decommission executors This PR adds functionality to consider the running ... — committed to holdenk/spark by holdenk 4 years ago
- [SPARK-24266][K8S] Restart the watcher when we receive a version changed from k8s Restart the watcher when it failed with a HTTP_GONE code from the kubernetes api. Which means a resource version has ... — committed to jkleckner/spark by stijndehaes 4 years ago
- [SPARK-24266][K8S][3.0] Restart the watcher when we receive a version changed from k8s ### What changes were proposed in this pull request? This is a straight application of #28423 onto branch-3.0 ... — committed to apache/spark by stijndehaes 4 years ago
@manusa one big difference is that with a watcher we can watch one single pod. This is what spark-submit does when watching the driver; with a SharedInformer I am watching all the pods. Unless there is a way to watch a single pod? Anyway, I guess this will use more resources than needed, unless I am mistaken and this is negligible?
We implemented SharedInformers (#1384) a while back to mimic client-go’s behavior and provide an extra level of abstraction for Watch operations (see "Kubernetes client-go: watch.Interface vs. cache.NewInformer vs. cache.NewSharedIndexInformer?" and "Writing Controllers/SharedInformers").
Our implementation of SharedInformers already takes care of the HTTP_GONE scenario.
If you are looking for this reconnect behavior, I would encourage using SharedInformers instead of Watch, or else using Watch with your own reconnect implementation. I think providing this behavior for Watch too would duplicate a feature that’s already available in Informers.
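For readers who go the "own reconnect implementation" route, here is a minimal sketch of the idea in plain Java. It does not use the kubernetes-client API: `PodApi`, `WatchClosedException`, `FakeApi` and `watchWithRestart` are hypothetical stand-ins invented for illustration, and the only assumption carried over from this thread is that a stale resource version surfaces as an HTTP 410 (HTTP_GONE) failure. On that failure the loop fetches a fresh resource version and opens a new watch; any other failure is passed on to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class RewatchSketch {
    public static final int HTTP_GONE = 410;

    /** Hypothetical stand-in for a watch failure that carries an HTTP status code. */
    public static class WatchClosedException extends RuntimeException {
        public final int code;
        public WatchClosedException(int code, String message) {
            super(message);
            this.code = code;
        }
    }

    /** Hypothetical stand-in for the API server; not the kubernetes-client interface. */
    public interface PodApi {
        String currentResourceVersion();
        /** Delivers events starting at resourceVersion; throws WatchClosedException on failure. */
        void watch(String resourceVersion, Consumer<String> onEvent);
    }

    /** Restarts the watch with a fresh resource version whenever it fails with HTTP_GONE. */
    public static int watchWithRestart(PodApi api, Consumer<String> onEvent, int maxRestarts) {
        int restarts = 0;
        String version = api.currentResourceVersion();
        while (true) {
            try {
                api.watch(version, onEvent);
                return restarts; // the stream ended normally
            } catch (WatchClosedException e) {
                // Only a stale resource version is recoverable here; rethrow everything else.
                if (e.code != HTTP_GONE || restarts >= maxRestarts) {
                    throw e;
                }
                restarts++;
                version = api.currentResourceVersion(); // drop the stale cached version
            }
        }
    }

    /** Fake server: rejects the first (stale) version, then accepts the fresh one. */
    public static class FakeApi implements PodApi {
        private int watchCalls = 0;

        public String currentResourceVersion() {
            return watchCalls == 0 ? "21648111" : "21653211";
        }

        public void watch(String resourceVersion, Consumer<String> onEvent) {
            watchCalls++;
            if (resourceVersion.equals("21648111")) {
                throw new WatchClosedException(HTTP_GONE,
                        "too old resource version: 21648111 (21653211)");
            }
            onEvent.accept("MODIFIED@" + resourceVersion);
        }
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        int restarts = watchWithRestart(new FakeApi(), seen::add, 3);
        // prints "1 restart(s), events: [MODIFIED@21653211]"
        System.out.println(restarts + " restart(s), events: " + seen);
    }
}
```

Note that events emitted between the stale version and the fresh one are not replayed in this sketch, so a real implementation would probably need to re-read the current state of the resource before resuming.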
@rohanKanojia maybe we can use this issue to provide some additional examples and documentation on different use-cases for SharedInformers. I think it’s unclear that they should be the default approach to watch resources.
@stijndehaes I took a look at #1800. Would it be better to add a boolean flag that controls whether to re-watch automatically when a version change is received? That way we won’t break the contract of sending HTTP_GONE when the resource version is too old, and it also makes things easier for people who don’t care about the problem.
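To make the flag idea concrete, here is a rough sketch of the decision it would control. Everything in it is hypothetical: `WatchOptions`, `rewatchOnGone`, `Action` and `onWatchFailure` are names made up for this example and are not existing kubernetes-client settings. With the flag off (the assumed default), HTTP_GONE still reaches the caller, so the current contract is unchanged.

```java
public class RewatchFlagSketch {
    public static final int HTTP_GONE = 410;

    /** What the client would do after a watch fails with a given HTTP status. */
    public enum Action { REWATCH, NOTIFY_CALLER }

    /** Hypothetical watch options; rewatchOnGone is not an existing client setting. */
    public static final class WatchOptions {
        public final boolean rewatchOnGone;

        public WatchOptions(boolean rewatchOnGone) {
            this.rewatchOnGone = rewatchOnGone;
        }
    }

    /** Re-watch only when the caller opted in and the failure is HTTP_GONE. */
    public static Action onWatchFailure(int httpCode, WatchOptions options) {
        if (httpCode == HTTP_GONE && options.rewatchOnGone) {
            return Action.REWATCH;
        }
        return Action.NOTIFY_CALLER;
    }

    public static void main(String[] args) {
        System.out.println(onWatchFailure(410, new WatchOptions(false))); // NOTIFY_CALLER
        System.out.println(onWatchFailure(410, new WatchOptions(true)));  // REWATCH
        System.out.println(onWatchFailure(500, new WatchOptions(true)));  // NOTIFY_CALLER
    }
}
```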
@manusa found it! You can do it like this I think:
@chenchun Is this something we could maybe put into the client? For some watches you don’t care about version problems.