dotnet-operator-sdk: ArgumentNullException in cache timer

Describe the bug

I’m running an operator against a forked repo with the v6.5.3 tag checked out, so debugging is easier.

After the operator runs for several minutes, it crashes with an unhandled ArgumentNullException thrown from a timer that does something with your resource cache. It’s failing here:

// ResourceCache{TEntity}.cs: 104

private bool Exists(TEntity resource) => _cache.ContainsKey(resource.Metadata.Uid);

I hacked around this by adding a string.IsNullOrEmpty() check:

private bool Exists(TEntity resource) => !string.IsNullOrEmpty(resource.Metadata.Uid) && _cache.ContainsKey(resource.Metadata.Uid);
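To illustrate the failure mode: ConcurrentDictionary<TKey,TValue>.ContainsKey throws ArgumentNullException when the key is null, which matches the "(Parameter 'key')" message in the stack trace. The sketch below is a minimal stand-alone repro of that behavior, assuming a plain ConcurrentDictionary<string, object> keyed by uid as a stand-in for the operator's cache (the real ResourceCache{TEntity} internals may differ). The local Exists helper is hypothetical and only mirrors the string.IsNullOrEmpty() workaround.

```csharp
#nullable enable
using System;
using System.Collections.Concurrent;

// Stand-in for the resource cache (assumption): entries keyed by Metadata.Uid.
var cache = new ConcurrentDictionary<string, object>();
cache.TryAdd("example-uid", new object());

string? uid = null; // simulates a resource whose Metadata.Uid is not populated

// Unguarded lookup, like the original Exists(): ConcurrentDictionary rejects null keys.
string? thrownParam = null;
try
{
    cache.ContainsKey(uid!);
}
catch (ArgumentNullException ex)
{
    thrownParam = ex.ParamName;
}
Console.WriteLine($"Unguarded lookup threw for parameter: {thrownParam}"); // key

// Hypothetical guarded lookup mirroring the workaround: a missing uid counts as "not cached".
bool Exists(string? key) => !string.IsNullOrEmpty(key) && cache.ContainsKey(key);

Console.WriteLine($"Guarded lookup for null uid: {Exists(uid)}");              // False
Console.WriteLine($"Guarded lookup for cached uid: {Exists("example-uid")}");  // True
```

The guard prevents the crash, but a resource without a uid is then simply treated as absent from the cache; it does not explain why the uid is missing in the first place.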

To Reproduce

This isn’t reliably reproducible, but it seems to happen after the controller’s StatusModifiedAsync() method is called. I created 20 custom resources that have a status and patched the status as each resource was reconciled. Sometimes the ArgumentNullException is thrown.

Expected behavior

The operator should not crash.


Additional context

Here’s the stack trace:

Unhandled exception. System.ArgumentNullException: Value cannot be null. (Parameter 'key')
   at System.ThrowHelper.ThrowArgumentNullException(String name)
   at System.Collections.Concurrent.ConcurrentDictionary`2.ContainsKey(TKey key)
   at KubeOps.Operator.Caching.ResourceCache`1.Exists(TEntity resource) in C:\src\neonKUBE\Lib-forked\KubeOps\src\KubeOps\Operator\Caching\ResourceCache{TEntity}.cs:line 104
   at KubeOps.Operator.Caching.ResourceCache`1.CompareCache(TEntity resource) in C:\src\neonKUBE\Lib-forked\KubeOps\src\KubeOps\Operator\Caching\ResourceCache{TEntity}.cs:line 79
   at KubeOps.Operator.Caching.ResourceCache`1.Upsert(TEntity resource, CacheComparisonResult& result) in C:\src\neonKUBE\Lib-forked\KubeOps\src\KubeOps\Operator\Caching\ResourceCache{TEntity}.cs:line 36
   at KubeOps.Operator.Controller.EventQueue`1.<>c__DisplayClass19_0.<<UpdateResourceData>b__0>d.MoveNext() in C:\src\neonKUBE\Lib-forked\KubeOps\src\KubeOps\Operator\Controller\EventQueue.cs:line 223
--- End of stack trace from previous location ---
   at System.Reactive.PlatformServices.ExceptionServicesImpl.Rethrow(Exception exception) in /_/Rx.NET/Source/src/System.Reactive/Internal/ExceptionServicesImpl.cs:line 19
   at System.Reactive.ExceptionHelpers.Throw(Exception exception) in /_/Rx.NET/Source/src/System.Reactive/Internal/ExceptionServices.cs:line 16
   at System.Reactive.Stubs.<>c.<.cctor>b__2_1(Exception ex) in /_/Rx.NET/Source/src/System.Reactive/Internal/Stubs.cs:line 16
   at System.Reactive.AnonymousObserver`1.OnErrorCore(Exception error) in /_/Rx.NET/Source/src/System.Reactive/AnonymousObserver.cs:line 73
   at System.Reactive.ObserverBase`1.OnError(Exception error) in /_/Rx.NET/Source/src/System.Reactive/ObserverBase.cs:line 59
   at System.Reactive.Linq.ObservableImpl.ConcatMany`1.ConcatManyOuterObserver.Drain() in /_/Rx.NET/Source/src/System.Reactive/Linq/Observable/ConcatMany.cs:line 149
   at System.Reactive.Linq.ObservableImpl.ConcatMany`1.ConcatManyOuterObserver.InnerComplete() in /_/Rx.NET/Source/src/System.Reactive/Linq/Observable/ConcatMany.cs:line 119
   at System.Reactive.Linq.ObservableImpl.ConcatMany`1.ConcatManyOuterObserver.InnerObserver.OnCompleted() in /_/Rx.NET/Source/src/System.Reactive/Linq/Observable/ConcatMany.cs:line 214
   at System.Reactive.Sink`1.ForwardOnCompleted() in /_/Rx.NET/Source/src/System.Reactive/Internal/Sink.cs:line 54
   at System.Reactive.Sink`2.OnCompleted() in /_/Rx.NET/Source/src/System.Reactive/Internal/Sink.cs:line 96
   at System.Reactive.Threading.Tasks.TaskObservableExtensions.EmitTaskResult(Task task, IObserver`1 subject) in /_/Rx.NET/Source/src/System.Reactive/Threading/Tasks/TaskObservableExtensions.cs:line 161
   at System.Reactive.Threading.Tasks.TaskObservableExtensions.SlowTaskObservable.<>c.<Subscribe>b__3_2(ValueTuple`2 tuple2) in /_/Rx.NET/Source/src/System.Reactive/Threading/Tasks/TaskObservableExtensions.cs:line 49
   at System.Reactive.Concurrency.Scheduler.<>c__75`1.<ScheduleAction>b__75_0(IScheduler _, ValueTuple`2 tuple) in /_/Rx.NET/Source/src/System.Reactive/Concurrency/Scheduler.Simple.cs:line 65
   at System.Reactive.Concurrency.UserWorkItem`1.Run() in /_/Rx.NET/Source/src/System.Reactive/Concurrency/UserWorkItem.cs:line 29
   at System.Reactive.Concurrency.ThreadPoolScheduler.<>c__5`1.<Schedule>b__5_0(Object closureWorkItem) in /_/Rx.NET/Source/src/System.Reactive/Concurrency/ThreadPoolScheduler.cs:line 47
   at System.Threading.QueueUserWorkItemCallbackDefaultContext.Execute()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading.PortableThreadPool.WorkerThread.WorkerThreadStart()
   at System.Threading.Thread.StartCallback()

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 21 (10 by maintainers)

Most upvoted comments

WOOHOO!!! I finally replicated this problem in a stand-alone test.

The problem happens when an exception is thrown in a controller’s StatusModifiedAsync() method. You can replicate this by:

  1. Pointing your current Kubernetes config at a running cluster
  2. Cloning my repo: https://github.com/nforgeio/support
  3. Opening the solution at: $/KubeOps/22-06-12/TestKubeOps.sln
  4. Setting this command in your launch profile: CreateModifyStatusException
  5. Configuring the debugger to break when System.ArgumentNullException is thrown
  6. Running the program

The program creates a new resource, modifies its status when the resource is reconciled, and then throws an exception when StatusModifiedAsync() is called. After the program runs for a while, the ContainsKey() call throws the ArgumentNullException shown in the stack trace above.

NOTE: The test is referencing: KubeOps v7.0.0-preview.2

FYI: I’ve been working on a simple repro without success so far. I’m going to investigate some more.