FASTER: How to use Upsert in async environment? (It blocks and causes spin, 100% CPU after log memory is full, or after checkpoint)

I’m using FasterKV<SpanByte, SpanByte> (1.7.1) with relatively small log memory (SegmentSizeBits = 1GB, PageSizeBits = 128kB, MemorySizeBits = 4MB, CheckpointType.Snapshot), only for test purposes, otherwise I use larger memory settings.

ReadAsync()+Complete()+WaitForCommitAsync() works fine, but since Upsert() doesn’t have an async counterpart, I’m calling it synchronously inside an async function. (I’m aware that there was an UpsertAsync() implementation that has been removed, not sure why.)

I have a timer that calls the checkpoint function periodically and creates a full snapshot like this:

await kv.TakeFullCheckpointAsync(checkpointSettings.CheckPointType);
await kv.CompleteCheckpointAsync();

Checkpointing also works fine, but with the above settings I can insert ~30k entries (~35 byte key and ~1kB value) before Upsert() starts blocking. At that point this while loop in AllocatorBase.cs maxes out all of my CPU cores, and even the next full checkpoint does not let the upserts continue.

The program just enters the Upsert method and never leaves, and all my async tasks get blocked (many concurrent tasks are calling the ReadAsync() and Upsert() methods).

I tried everything I could find. I tried calling CompletePending() and CompletePendingAsync() before and after, in every way I could come up with. I also tried calling them while taking the snapshot (sometimes session.GetPendingRequests().Count() is greater than 0 when the checkpoint is taken, but completing the pending requests didn’t help then either).

How do I use Upsert() properly in this environment, or how do I create checkpoint/commit properly, to avoid blocking on Upsert()? I see there is RMWAsync(), is it possible to use it for inserting and updating? (I tried it and it created an entry but the value was missing, I may check it out again.)

Any help will be appreciated!

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 17 (11 by maintainers)

Most upvoted comments

Just a little follow-up: using version 1.7.2 and creating a strict work dispatcher (a configurable number of dedicated threads, with sessions assigned to each thread and used only inside that thread’s loop) fixed all issues.

In the end, when I was calling the dispatcher (which is actually blocking and synchronous) from asynchronous tasks, the program deadlocked after a certain number of concurrent calls on kv.CompleteCheckpointAsync().GetAwaiter().GetResult(). Even though that method is called synchronously, the task scheduler still had to schedule its work, and it couldn’t, because all the other tasks in the stress test (more than 10000) were blocked by the dispatcher.

The solution was to move away from BlockingCollection<T> and port the dispatcher to BufferBlock<T>, because it can wait for items asynchronously. This let me create a completely awaitable async API for the dispatcher without any blocking calls. Since then I haven’t had any deadlocks, and all functions work without issue, stress-tested with thousands of concurrent tasks.
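The dispatcher code itself was not posted, so the following is only a minimal sketch of the idea, assuming the work items are plain delegates. AsyncWorkDispatcher, EnqueueAsync and ShutdownAsync are hypothetical names, and TState stands in for whatever each worker owns (for example one FASTER session). Only BufferBlock<T> and its OutputAvailableAsync/TryReceive/Post methods come from System.Threading.Tasks.Dataflow. Note that each state object is used by one worker at a time, but this sketch does not pin a worker to a particular OS thread.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical helper (not part of FASTER): callers await EnqueueAsync,
// a fixed set of workers drains the queue, and each worker owns one state object.
public sealed class AsyncWorkDispatcher<TState>
{
    private readonly BufferBlock<Action<TState>> queue = new BufferBlock<Action<TState>>();
    private readonly Task[] workers;

    public AsyncWorkDispatcher(int workerCount, Func<int, TState> createState)
    {
        workers = new Task[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            TState state = createState(i); // one state object per worker, never shared
            workers[i] = Task.Run(() => WorkerLoopAsync(state));
        }
    }

    private async Task WorkerLoopAsync(TState state)
    {
        // OutputAvailableAsync waits without blocking a thread. It yields false
        // once Complete() has been called and the queue has been drained.
        while (await queue.OutputAvailableAsync().ConfigureAwait(false))
        {
            // Another worker may have taken the item already, hence TryReceive.
            while (queue.TryReceive(out Action<TState> work))
            {
                work(state);
            }
        }
    }

    // Queues a synchronous operation and returns a task the caller can await.
    public Task<TResult> EnqueueAsync<TResult>(Func<TState, TResult> work)
    {
        var tcs = new TaskCompletionSource<TResult>(
            TaskCreationOptions.RunContinuationsAsynchronously);
        Action<TState> item = state =>
        {
            try { tcs.SetResult(work(state)); }
            catch (Exception ex) { tcs.SetException(ex); }
        };
        if (!queue.Post(item))
        {
            tcs.SetException(new InvalidOperationException("Dispatcher is shut down."));
        }
        return tcs.Task;
    }

    // Stops accepting work and waits for the workers to finish what is queued.
    public async Task ShutdownAsync()
    {
        queue.Complete();
        await Task.WhenAll(workers).ConfigureAwait(false);
    }
}
```

A caller would then write something like `await dispatcher.EnqueueAsync(session => ...)`, putting the synchronous store call inside the delegate, so the calling task never blocks while it waits for a worker.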

Bottom line: a session should be called from only one thread, maintenance methods (recovery, checkpointing, compaction) should have their own dedicated thread, and as always, never block in async calls, because checkpointing will then result in a deadlock.
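As an illustration of the dedicated maintenance thread, here is a minimal sketch that runs one delegate periodically on its own thread, away from the thread pool that serves the async tasks. MaintenanceThread and the thread name are hypothetical. The delegate is where the checkpoint and compaction calls would go, and those calls are deliberately not shown here.

```csharp
using System;
using System.Threading;

// Hypothetical helper: runs a maintenance delegate periodically on a dedicated thread.
public sealed class MaintenanceThread : IDisposable
{
    private readonly Thread thread;
    private readonly ManualResetEventSlim stop = new ManualResetEventSlim(false);

    public MaintenanceThread(Action maintenance, TimeSpan interval)
    {
        thread = new Thread(() =>
        {
            // Wait returns true once stop is set; false means the interval elapsed,
            // so the maintenance work runs once per interval until disposal.
            while (!stop.Wait(interval))
            {
                maintenance();
            }
        })
        { IsBackground = true, Name = "store-maintenance" };
        thread.Start();
    }

    public void Dispose()
    {
        stop.Set();     // ask the loop to exit
        thread.Join();  // wait for any in-flight maintenance run to finish
        stop.Dispose();
    }
}
```

Because the loop owns its thread, a blocking call inside the delegate stalls only the maintenance work and does not tie up a thread-pool thread.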

Latest release 1.7.2 includes memory sample. See here as well: https://github.com/microsoft/FASTER/blob/master/cs/samples/StoreVarLenTypes/Program.cs

SpanByte will be slightly more efficient overall, though.

Closing this issue as resolved, please reopen if you find any issue with compaction.

So… I have now created a dedicated maintenance thread that calls checkpointing and compaction and searches for expired entries to delete (the last part is specific to my implementation and not important here), and I cleaned up the ReadAsync/Upsert/RMWAsync/Delete methods as you suggested. I’m having some success, thank you!

Instead of only ~30k entries as before, I can now insert/update all 1 million entries that I’m using for this test, which is great, and I’m no longer running into CPU spinning loops. Everything works fine except that when compaction begins at around 600k entries, the program hangs at 100% CPU load and never returns from the kv.Log.Compact(kv.Log.SafeReadOnlyAddress, true) call.

Compaction runs in the dedicated thread.

The session pool works great, with no session shared between tasks/threads. I will try to dig deeper into why compaction causes this hang.