orleans: InconsistentStateException with ADO.NET SQL Server grain storage

We’ve been getting this exception for a while in various grains, particularly one that is activated in a StartupTask and should never be deactivated. If I’ve understood correctly, this can happen when there are duplicate activations of a grain (perhaps during deployments), so the question is: what do you do when that happens? Do you have to catch it and call DeactivateOnIdle? Do you call ReadStateAsync and lose that data? We had a grain that could never write to state again for the rest of the activation’s lifetime, so I’m wondering how you recover from it. I couldn’t find info on this in the docs; maybe I missed it.

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 1
  • Comments: 37 (30 by maintainers)

Most upvoted comments

> I would think stream messages are queued like normal

They are. Timers are the only interleaving bug that has become a feature that some people now depend on.

Thanks for your suggestions. I’ll keep monitoring the silo closely. So far, using my own store to keep state and making all grains stateless has done the trick, so I’ll keep it that way in the meantime.

Thanks again for your time and help. Have an excellent day.

On Feb 9, 2021, at 5:56 PM, Veikko Eeva notifications@github.com wrote:

@eramosr16 No problem, mine either. It’s that I try to see if there’s something we could improve.

Maybe one strategy you could try is combining timers and reminders to decrease load on the database, like at https://dotnet.github.io/orleans/docs/grains/timers_and_reminders.html#combining-timers-and-reminders. Further, consider adding indexes on the values that are being read, e.g. https://github.com/dotnet/orleans/blob/master/src/AdoNet/Orleans.Reminders.AdoNet/SQLServer-Reminders.sql#L103. These actions should reduce load on the database and make queries faster. Avoid operations that introduce etag mismatches (e.g. don’t use persistence with stateless grains).

For the record, I think the DB should be indexed by default. I should finally investigate this if no one else will. At least as a temporary measure, I think any index will help; one can think later about refactoring those tables and introducing purpose-built indexes.

Interesting, that definitely solves one of my issues, thanks. I would think stream messages are queued like normal; if not, I have some refactoring to do.

Timers execute interleaved. Not sure about streams.

I also wanna add that this issue has been discussed before in #2565. A possible workaround via self-invocation for timers was posted here: #2574
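For anyone wondering what "self-invocation" buys you, here is a rough, language-neutral model in Python (not Orleans code, and not the code from #2574). It assumes the idea is that the timer callback, instead of touching state directly, sends a message to its own grain so the work goes through the normal one-at-a-time queue. `ToyGrain`, `mailbox`, `write_state` and `demo` are all made-up names for illustration.

```python
import asyncio


class ToyGrain:
    """Toy model of a grain: a mailbox whose items are run one at a time."""

    def __init__(self):
        self.mailbox = asyncio.Queue()
        self.in_flight = 0       # writes currently in progress
        self.max_in_flight = 0   # highest overlap observed

    async def run(self):
        # Single consumer: the next item starts only after the previous one ends.
        while True:
            work = await self.mailbox.get()
            if work is None:     # sentinel: stop processing
                break
            await work()

    async def write_state(self):
        self.in_flight += 1
        self.max_in_flight = max(self.max_in_flight, self.in_flight)
        await asyncio.sleep(0.01)  # simulated storage round trip
        self.in_flight -= 1


async def demo(self_invoke):
    grain = ToyGrain()
    consumer = asyncio.create_task(grain.run())

    await grain.mailbox.put(grain.write_state)  # an ordinary request
    await asyncio.sleep(0)                      # let the request start its write

    if self_invoke:
        # Timer tick queues the work behind the request (self-invocation).
        await grain.mailbox.put(grain.write_state)
    else:
        # Timer tick runs immediately and interleaves with the request.
        await grain.write_state()

    await grain.mailbox.put(None)
    await consumer
    return grain.max_in_flight


if __name__ == "__main__":
    print("direct timer callback, max overlapping writes:", asyncio.run(demo(False)))
    print("self-invoking timer,   max overlapping writes:", asyncio.run(demo(True)))
```

In this model the direct callback overlaps with the in-progress request, while the self-invoked one waits its turn. Whether that maps exactly onto your Orleans version is something to verify against the linked issue.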

Recently I had to deal with this issue as well. For me the trouble came from interleaving code in stateful grains. In particular, Orleans timers can be a source of this problem.

As soon as two calls are inside WriteStateAsync() at the same time, there is a good chance of this error occurring.

In my opinion, ETags should prevent different activations from writing to the same state, but the current design doesn’t even allow the same grain activation to issue overlapping writes.
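To make the failure mode concrete, here is a minimal sketch in Python of how I understand it. It is a toy model, not Orleans code: `EtagStore`, `Activation` and `InconsistentStateError` are hypothetical stand-ins, and the assumption is that each write sends the etag the activation last saw and the store rejects anything stale.

```python
import asyncio


class InconsistentStateError(Exception):
    """Hypothetical stand-in for Orleans' InconsistentStateException."""


class EtagStore:
    """Toy storage: a write succeeds only if the caller's etag is current."""

    def __init__(self):
        self.value = None
        self.etag = 0

    async def write(self, value, expected_etag):
        await asyncio.sleep(0.01)  # simulated database round trip
        if expected_etag != self.etag:
            raise InconsistentStateError(
                f"expected etag {expected_etag}, store has {self.etag}")
        self.value = value
        self.etag += 1
        return self.etag


class Activation:
    """Toy grain activation that remembers the etag of its last write."""

    def __init__(self, store):
        self.store = store
        self.etag = store.etag

    async def write_state(self, value):
        sent = self.etag  # overlapping calls all capture the same etag here
        self.etag = await self.store.write(value, sent)


async def interleaved_writes():
    grain = Activation(EtagStore())
    # Two writes in flight at once, e.g. a timer callback interleaving a request.
    return await asyncio.gather(
        grain.write_state("from request"),
        grain.write_state("from timer"),
        return_exceptions=True,
    )


async def sequential_writes():
    grain = Activation(EtagStore())
    await grain.write_state("from request")
    await grain.write_state("from timer")
    return grain.store.etag


if __name__ == "__main__":
    print(asyncio.run(interleaved_writes()))
    print(asyncio.run(sequential_writes()))
```

In this model one of the two overlapping writes is rejected even though both come from the same activation, while the same two writes issued one after the other both succeed.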