java-operator-sdk: InformerEventSource throwing NPE if the watched resource is deleted
Bug Report
What did you do?
- Configured the event source:
@Override
public List<EventSource> prepareEventSources(EventSourceContext<Keycloak> context) {
    SharedIndexInformer<Deployment> deploymentInformer =
        client.apps().deployments().inAnyNamespace()
            .withLabels(Constants.DEFAULT_LABELS)
            .runnableInformer(0);
    return List.of(new InformerEventSource<>(
        deploymentInformer, d -> {
            var ownerReferences = d.getMetadata().getOwnerReferences();
            if (!ownerReferences.isEmpty()) {
                return Set.of(new ResourceID(ownerReferences.get(0).getName(),
                    d.getMetadata().getNamespace()));
            } else {
                return Collections.emptySet();
            }
        }));
}
- Created the secondary resource (Deployment) in a reconciliation loop.
- Deleted the Deployment manually to simulate a user unintentionally deleting it.
What did you expect to see?
No NPE. 😉
What did you see instead? Under which circumstances?
2022-01-13 18:16:00,804 ERROR [io.fab.kub.cli.inf.cac.SharedProcessor] (OkHttp https://127.0.0.1:52740/...) Failed invoking io.javaoperatorsdk.operator.processing.event.source.informer.InformerEventSource$1@18998b60 event handler: null: java.lang.NullPointerException
at io.javaoperatorsdk.operator.processing.event.source.informer.InformerEventSource$1.onUpdate(InformerEventSource.java:88)
at io.javaoperatorsdk.operator.processing.event.source.informer.InformerEventSource$1.onUpdate(InformerEventSource.java:78)
at io.fabric8.kubernetes.client.informers.cache.ProcessorListener$UpdateNotification.handle(ProcessorListener.java:85)
at io.fabric8.kubernetes.client.informers.cache.ProcessorListener.add(ProcessorListener.java:47)
at io.fabric8.kubernetes.client.informers.cache.SharedProcessor.lambda$distribute$0(SharedProcessor.java:79)
at io.fabric8.kubernetes.client.informers.cache.SharedProcessor.lambda$distribute$1(SharedProcessor.java:101)
at io.fabric8.kubernetes.client.utils.SerialExecutor.lambda$execute$0(SerialExecutor.java:40)
at io.fabric8.kubernetes.client.utils.SerialExecutor.scheduleNext(SerialExecutor.java:52)
at io.fabric8.kubernetes.client.utils.SerialExecutor.execute(SerialExecutor.java:46)
at io.fabric8.kubernetes.client.informers.cache.SharedProcessor.distribute(SharedProcessor.java:98)
at io.fabric8.kubernetes.client.informers.cache.SharedProcessor.distribute(SharedProcessor.java:79)
at io.fabric8.kubernetes.client.informers.cache.ProcessorStore.update(ProcessorStore.java:48)
at io.fabric8.kubernetes.client.informers.cache.ProcessorStore.update(ProcessorStore.java:29)
at io.fabric8.kubernetes.client.informers.cache.Reflector$ReflectorWatcher.eventReceived(Reflector.java:134)
at io.fabric8.kubernetes.client.informers.cache.Reflector$ReflectorWatcher.eventReceived(Reflector.java:114)
at io.fabric8.kubernetes.client.utils.WatcherToggle.eventReceived(WatcherToggle.java:49)
at io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager.eventReceived(AbstractWatchManager.java:203)
at io.fabric8.kubernetes.client.dsl.internal.AbstractWatchManager.onMessage(AbstractWatchManager.java:269)
at io.fabric8.kubernetes.client.dsl.internal.WatcherWebSocketListener.onMessage(WatcherWebSocketListener.java:68)
at io.fabric8.kubernetes.client.okhttp.OkHttpWebSocketImpl$BuilderImpl$1.onMessage(OkHttpWebSocketImpl.java:92)
at okhttp3.internal.ws.RealWebSocket.onReadMessage(RealWebSocket.java:322)
at okhttp3.internal.ws.WebSocketReader.readMessageFrame(WebSocketReader.java:219)
at okhttp3.internal.ws.WebSocketReader.processNextFrame(WebSocketReader.java:105)
at okhttp3.internal.ws.RealWebSocket.loopReader(RealWebSocket.java:273)
at okhttp3.internal.ws.RealWebSocket$1.onResponse(RealWebSocket.java:209)
at okhttp3.RealCall$AsyncCall.execute(RealCall.java:174)
at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Environment
Kubernetes cluster type:
vanilla
$ Mention java-operator-sdk version from pom.xml file
2.0.0-RC1 (commit: fd6e493)
$ java -version
openjdk version "11.0.12" 2021-07-20
OpenJDK Runtime Environment Homebrew (build 11.0.12+0)
OpenJDK 64-Bit Server VM Homebrew (build 11.0.12+0, mixed mode)
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:33:37Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.3", GitCommit:"c92036820499fedefec0f847e2054d824aea6cd1", GitTreeState:"clean", BuildDate:"2021-10-27T18:35:25Z", GoVersion:"go1.16.9", Compiler:"gc", Platform:"linux/amd64"}
Possible Solution
Additional context
About this issue
- State: closed
- Created 2 years ago
- Comments: 16 (3 by maintainers)
So this is happening due to modifications to the cached resource.
It might not even be the operator logic making the modification. This was touched on in #3078 but not really addressed. Essentially, the logic in createOrReplace, replace, or patch is allowed to modify the passed-in object - in particular the resourceVersion. For example, using an item directly from the cache in createOrReplace will set the resourceVersion to null - https://github.com/fabric8io/kubernetes-client/blob/32c6a88f029ba64b1f4225c5f122f2247b1b74d7/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/utils/CreateOrReplaceHelper.java#L46
I’m thinking we need fabric8 to not do this. To be safe we’ll need to clone those objects before modifying: https://github.com/fabric8io/kubernetes-client/issues/3756
It has also been discussed whether the objects obtained from the cache should already be cloned or write protected to prevent any future modifications - only cloning would be easy to implement. The only downside would be the general performance overhead - it’s also possible to consider adding methods that would differentiate get vs getDirect.
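To make the get vs getDirect idea concrete, here is a minimal Java sketch. Resource, ResourceCache and replaceLikeCall are stand-ins invented for illustration, not fabric8 classes; the sketch only assumes, as described above, that a client call may clear the resourceVersion on the object it is passed.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in types for illustration only; these are NOT fabric8 classes.
public class CacheCopyDemo {

  /** Minimal mutable resource with just the fields the discussion touches. */
  static final class Resource {
    String name;
    String resourceVersion;

    Resource(String name, String resourceVersion) {
      this.name = name;
      this.resourceVersion = resourceVersion;
    }

    Resource copy() {
      return new Resource(name, resourceVersion);
    }
  }

  /** Hypothetical cache offering a copying read and a direct read. */
  static final class ResourceCache {
    private final Map<String, Resource> items = new ConcurrentHashMap<>();

    void put(Resource r) {
      items.put(r.name, r);
    }

    /** Returns a copy, so callers cannot alter the cached state. */
    Resource get(String name) {
      Resource r = items.get(name);
      return r == null ? null : r.copy();
    }

    /** Returns the live cached instance; cheaper, but mutations leak into the cache. */
    Resource getDirect(String name) {
      return items.get(name);
    }
  }

  /** Mimics a client call that clears resourceVersion on the object it is given. */
  static void replaceLikeCall(Resource r) {
    r.resourceVersion = null;
  }

  public static void main(String[] args) {
    ResourceCache cache = new ResourceCache();
    cache.put(new Resource("keycloak", "42"));

    replaceLikeCall(cache.get("keycloak"));       // operates on a copy
    System.out.println(cache.getDirect("keycloak").resourceVersion); // prints 42

    replaceLikeCall(cache.getDirect("keycloak")); // operates on the cached instance
    System.out.println(cache.getDirect("keycloak").resourceVersion); // prints null
  }
}
```

The copying read costs one allocation per lookup, which is the performance overhead mentioned above; the direct read avoids it but leaves the cache exposed to whatever the caller does with the object.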
It’s not generally possible for the newObject to be null. The incoming event is processed by the reflector. There is an explicit null check there: https://github.com/fabric8io/kubernetes-client/blob/9e8ae39711143648158f555757296c613175c819/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/informers/cache/Reflector.java#L122
That same object reference will be passed all the way to your event handler with the stacktrace you are showing.
The resourceVersion is read by the Watcher as well, so we know that the same access did not throw an NPE at that point: https://github.com/fabric8io/kubernetes-client/blob/9e8ae39711143648158f555757296c613175c819/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/informers/cache/Reflector.java#L140
Since the resource is added to the store prior to the handler being called, it is possible that other logic is modifying the object first - is it possible that the operator framework is setting the metadata to null? Or are you sure that the new object is null - that would be very surprising.
From the watcher perspective, yes. But I’m not sure whether the Informer might be merging events. The only explanation that occurs to me (without checking the informer code) is that the resource is deleted from the cache before the update event is emitted, i.e. the informer’s watcher receives a modified event followed by a delete event, and by the time the modified event is composed and emitted, the resource has already been removed because of the delete event.
This seems to be an issue with the informer in the fabric8 client; will implement a quick fix on our side, and address it there.
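The actual fix is not shown in this thread. Purely as an illustration of what a defensive guard in an update handler could look like, here is a hypothetical Java sketch. Meta, Res, ownerKeys and onUpdate are invented stand-ins, not the SDK's or fabric8's types, and the "namespace/owner" string key is only a simplification of the ResourceID used in the report.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Stand-in types for illustration only; not the SDK's or fabric8's classes.
public class NullSafeUpdateDemo {

  static final class Meta {
    final String namespace;
    final List<String> ownerNames;

    Meta(String namespace, List<String> ownerNames) {
      this.namespace = namespace;
      this.ownerNames = ownerNames;
    }
  }

  static final class Res {
    Meta metadata; // mutable on purpose: other code may set it to null

    Res(Meta metadata) {
      this.metadata = metadata;
    }
  }

  /** Maps a secondary resource to "namespace/owner" keys, tolerating missing data. */
  static Set<String> ownerKeys(Res r) {
    if (r == null || r.metadata == null || r.metadata.ownerNames.isEmpty()) {
      return Collections.emptySet();
    }
    return Set.of(r.metadata.namespace + "/" + r.metadata.ownerNames.get(0));
  }

  /** Update handler that skips the event instead of throwing when the new object is unusable. */
  static Set<String> onUpdate(Res oldRes, Res newRes) {
    return ownerKeys(newRes);
  }

  public static void main(String[] args) {
    Res res = new Res(new Meta("default", List.of("example-keycloak")));
    System.out.println(onUpdate(res, res));  // prints [default/example-keycloak]
    System.out.println(onUpdate(res, null)); // prints []

    res.metadata = null; // simulates other logic modifying the shared object
    System.out.println(onUpdate(res, res));  // prints []
  }
}
```

A guard like this only hides the symptom: the event for the affected resource is dropped, so the underlying ordering or mutation problem would still need to be addressed in the informer.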