azure-functions-durable-python: `context.current_utc_datetime` intermittently evaluates to `None`
To be honest, I’m not too sure how to replicate this reliably, so at this stage I’m mostly looking for some help on how to nail this down.
I’m currently facing an issue where, in deployed durable functions (I don’t really see this locally), `context.current_utc_datetime` would sometimes evaluate to `None`. I have many places in the orchestrator where I record timestamps to a database entry, so this is something I use often. The failure can happen at any one of the many evaluations of `context.current_utc_datetime` throughout the run, and I can’t find any rhyme or reason as to what causes it.
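Roughly, the pattern looks like the sketch below. The activity name and payload are made up for illustration; the real point is that each timestamp comes from `context.current_utc_datetime` and is handed to an activity that writes it to the database entry:

```python
import azure.durable_functions as df


def orchestrator_function(context: df.DurableOrchestrationContext):
    # The orchestration clock; this is the value that intermittently
    # evaluates to None on the deployed app.
    started_at = context.current_utc_datetime

    # "RecordTimestamp" is a hypothetical activity that persists the
    # timestamp to the database entry.
    yield context.call_activity("RecordTimestamp", started_at.isoformat())

    # ...several more reads of context.current_utc_datetime during the run...


main = df.Orchestrator.create(orchestrator_function)
```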
I’ve noticed that repeatedly evaluating `context.current_utc_datetime` in a loop would eventually return a valid timestamp, so I’ve monkey-patched this hack into the orchestrator:
```python
from datetime import datetime
from time import sleep

# Defined inside the orchestrator function so that `context` is in scope.
def get_current_utc_datetime() -> datetime:
    curr_time = context.current_utc_datetime
    while not curr_time:  # keep polling until it stops evaluating to None
        sleep(0.05)
        curr_time = context.current_utc_datetime
    return curr_time
```
However, today I’m seeing several orchestrator runs that just block on this hacky loop, seemingly without end, for over 30 minutes, which is much longer than I thought an orchestrator is allowed to run. I’ve had to manually stop the entire deployed Function App in order to try again with another orchestrator instance, but so far every attempt has hit the same problem.
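Not a fix, just a more defensive sketch of the same workaround: here the context is passed in explicitly and there is an arbitrary 10-second deadline, so the helper raises instead of blocking the orchestrator run indefinitely.

```python
from datetime import datetime
from time import monotonic, sleep

import azure.durable_functions as df


def get_current_utc_datetime(
    context: df.DurableOrchestrationContext, timeout_s: float = 10.0
) -> datetime:
    # Same polling workaround as above, but give up after `timeout_s`
    # seconds (an arbitrary deadline) instead of blocking forever.
    deadline = monotonic() + timeout_s
    curr_time = context.current_utc_datetime
    while not curr_time:
        if monotonic() > deadline:
            raise RuntimeError(
                f"current_utc_datetime still None after {timeout_s:.1f}s"
            )
        sleep(0.05)
        curr_time = context.current_utc_datetime
    return curr_time
```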
This is all with beta 11, within the last couple of weeks.
About this issue
- State: closed
- Created 4 years ago
- Comments: 15
@timtylin,
Our whole team is fairly active on our GitHub repos, so your feedback about @davidmrdavid’s work on this issue is noted 🥇.
It sounds like we should close this issue, but I would recommend opening a separate issue for those weird gaps you are seeing, and we can take a look at those. We should have our internal telemetry all wired up now, so if you give us a timestamp, and ideally the orchestration instance ID, for those weird gaps (along with as much information about the orchestration as you feel comfortable sharing publicly), that would help us diagnose that issue and see if there are some easy tweaks in the meantime.
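For reference, both the instance ID and the orchestration clock are available on the context object, so a guarded log line along these lines is usually enough for us to find the run in telemetry (the activity name and message format below are just illustrative):

```python
import logging

import azure.durable_functions as df


def orchestrator_function(context: df.DurableOrchestrationContext):
    # is_replaying keeps the line from being logged again on every replay;
    # instance_id and current_utc_datetime are properties of the context.
    if not context.is_replaying:
        logging.info(
            "orchestration %s at %s: about to write to CosmosDB",
            context.instance_id,
            context.current_utc_datetime,
        )
    # "WriteToCosmos" stands in for the activity that does the single write.
    yield context.call_activity("WriteToCosmos", {"example": "payload"})


main = df.Orchestrator.create(orchestrator_function)
```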
Hi @davidmrdavid
I’ve been stress-testing over the weekend and so far I haven’t seen it return `None`, so I’m happy to say that this no longer happens. I do wonder if it has uncovered some other underlying issue, as I’m still seeing some unexplained long gaps (>100s) between successive timestamps, and the only thing in between is an Activity that does a single CosmosDB write. At first I thought it was a concurrency issue (I’ve set max Activity concurrency to 1), but this is the only orchestrator running at that time, so I’m still a bit puzzled as to how these delays happen.
Thank you very much for resolving the original issue with such a quick turnaround. I just wish there were some way for me to leave you a great internal review 👏