IdGen: Debugging InvalidSystemClockExceptions
We’re running into sporadic InvalidSystemClockExceptions that end up requiring restarts, and we’re running out of ideas trying to find where it goes wrong.
one error example is:
IdGen.InvalidSystemClockException: Clock moved backwards or wrapped around. Refusing to generate id for 4294446821 ticks
from reading the code, this implies the new timestamp is behind the last generated one by 50 days, which seems too big of a difference even if there was a clock sync issue (we’ve checked and our k8s nodes are running chrony and the recorded sync errors are nowhere near that much of a difference).
The issue seems to happen once a month or so, and forces us to manually restart pods as it blocks all writes, and waiting the ticks for it to restart would take days.
Our timesource implementation
public class ClockTimeSource : ITimeSource
{
private long previousTimestamp = 0;
public DateTimeOffset Epoch { get; }
public TimeSpan TickDuration => TimeSpan.FromMilliseconds(1);
public ClockTimeSource() { }
public ClockTimeSource(DateTimeOffset epoch) => Epoch = epoch;
public long GetTicks()
{
var timestamp = (long)(DateTimeOffset.UtcNow - Epoch).TotalMilliseconds;
if (timestamp > previousTimestamp)
{
previousTimestamp = timestamp;
}
return previousTimestamp;
}
}
Would appreciate any pointers on where to start debugging this.
About this issue
- Original URL
- State: closed
- Created 9 months ago
- Comments: 15 (7 by maintainers)
Yes, but is IdGen the only thing accessing your
ClockTimeSource? Or are you using theClockTimeSourcefor other purposes as well?@hugh-maaskant great spot! 👍
Hi, just chiming in here …
If
GetTicks()is called from multiple threads, I believe there also are two problems in its implementation:timestamp > previousTimestampand the assignment of the new timestamp value need to be atomic. In the really unlikely case that the thread checking is scheduled out just after the check and then it takes a long time to get rescheduled (more than the tick time - so as I said unlikely) than it could reassign a lowertimestampvalue back to thepreviousTimestampthat was meanwhile update twice by other threadsBoth of these are easily (and fairly cheaply) avoided by using a
lock()statement: https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/statements/lock.Good luck 😃
If you find out the problem, please let me know? I’m curious to what it was.
Also out of curiosity: your epoch is
2022-10-22 23:04:27 +00:00. This is intentional? (Just checking because it’s a bit of a weird date/time to be an epoch…).