azure-functions-host: Chance of getting 409 while renewing the primary host lock
While investigating another issue I stumbled upon a lot of these 409s coming from renewing the primary host lock. I did some investigation and I believe that, because we do a ‘fire and forget’ with the host disposal, it’s possible that a new host comes up and acquires the lock just before the old host releases it. That means that the next attempt to renew fails, as the lock has been released.
The error looks like: Singleton lock renewal failed for blob 'locks/AppName/host' with error code 409: LeaseIdMismatchWithLeaseOperation. The last successful renewal completed at 2017-09-02T18:38:31.526Z (111006 milliseconds ago) with a duration of 57508 milliseconds. The lease period was 15000 milliseconds.
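For anyone triaging a pile of these, here is a minimal sketch that pulls the numbers out of the message and flags whether the time since the last successful renewal already exceeded the lease period. The regular expression is only inferred from the sample message above, and `summarize` and its field names are hypothetical, not part of the host or SDK.

```python
import re

# Pattern inferred from the sample log line in this issue (an assumption,
# not a documented format).
PATTERN = re.compile(
    r"error code (?P<status>\d+): (?P<code>\w+)\. "
    r"The last successful renewal completed at (?P<last>\S+) "
    r"\((?P<elapsed>-?\d+) milliseconds ago\) "
    r"with a duration of (?P<duration>\d+) milliseconds\. "
    r"The lease period was (?P<period>\d+) milliseconds\."
)


def summarize(message):
    """Return the parsed fields, or None if the message does not match."""
    match = PATTERN.search(message)
    if match is None:
        return None
    elapsed = int(match["elapsed"])
    period = int(match["period"])
    return {
        "status": int(match["status"]),
        "code": match["code"],
        "last_renewal": match["last"],
        "elapsed_ms": elapsed,
        "period_ms": period,
        # True when more time passed since the last good renewal than the
        # lease lasts, so the lease could already have lapsed by then.
        "elapsed_exceeds_period": elapsed > period,
    }


SAMPLE = (
    "Singleton lock renewal failed for blob 'locks/AppName/host' with error "
    "code 409: LeaseIdMismatchWithLeaseOperation. The last successful renewal "
    "completed at 2017-09-02T18:38:31.526Z (111006 milliseconds ago) with a "
    "duration of 57508 milliseconds. The lease period was 15000 milliseconds."
)
```

For the sample above, the elapsed time (111006 ms) is far longer than the 15000 ms lease period.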
I think it happens through a sequence like the one below.
Note 1 – this is all on a single instance, and references to hosts are about the hosts being cycled via the ScriptHostManager.
Note 2 – All of the host locks use the current machine id as the proposed lease id, which allows multiple hosts on the same machine to modify the lease.
Host1 -> acquire lock
Host1 -> renew lock
(something triggers a host recycle and creates a new host while disposing of the old host)
Host2 -> acquire lock
Host1 -> release lock
Host2 -> renew lock -> throw
It’s actually possible for several other combinations to happen (I’ve seen a race where Host1 renews right as it releases, also causing a throw).
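To make the ordering concrete, here is a minimal sketch that replays the sequence above against a toy in-memory lease. It is not the host's code and does not call Azure Storage. `BlobLease`, `LeaseError`, and `replay_race` are hypothetical names, lease expiry is ignored, and the error code returned for a renew after release is simply copied from the log message in this issue.

```python
from dataclasses import dataclass
from typing import Optional


class LeaseError(Exception):
    """Stands in for an HTTP 409 response from the lease endpoint."""

    def __init__(self, code: str):
        super().__init__(f"409: {code}")
        self.code = code


@dataclass
class BlobLease:
    """Toy model of a blob lease (no expiry, no networking)."""

    lease_id: Optional[str] = None  # None means the blob is not leased

    def acquire(self, proposed_id: str) -> str:
        # Acquiring with the ID that already holds the lease succeeds, which
        # is what lets Host2 "acquire" while Host1 still holds the lock.
        if self.lease_id is not None and self.lease_id != proposed_id:
            raise LeaseError("LeaseAlreadyPresent")
        self.lease_id = proposed_id
        return proposed_id

    def renew(self, lease_id: str) -> None:
        # Error code assumed from the log message quoted in this issue.
        if self.lease_id != lease_id:
            raise LeaseError("LeaseIdMismatchWithLeaseOperation")

    def release(self, lease_id: str) -> None:
        if self.lease_id != lease_id:
            raise LeaseError("LeaseIdMismatchWithLeaseOperation")
        self.lease_id = None


def replay_race(machine_id: str = "machine-1") -> str:
    """Replay the Host1/Host2 ordering; both hosts share one lease ID."""
    lock = BlobLease()
    lock.acquire(machine_id)  # Host1 -> acquire lock
    lock.renew(machine_id)    # Host1 -> renew lock
    lock.acquire(machine_id)  # Host2 -> acquire lock (same proposed ID)
    lock.release(machine_id)  # Host1 -> release lock (late disposal)
    try:
        lock.renew(machine_id)  # Host2 -> renew lock
    except LeaseError as err:
        return err.code
    return "ok"
```

In this model the late release from Host1 clears the lease that Host2 believes it holds, so the next renew from Host2 fails even though Host2 did nothing wrong.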
Ultimately, these errors seem benign as they’ll eventually recover and all will be well, but it causes a lot of errors in the logs (I see 100k+ 409 logs that involve the host lock over the last week).
About this issue
- State: open
- Created 7 years ago
- Reactions: 11
- Comments: 51 (12 by maintainers)
Commits related to this issue
- Chance of getting 409 while renewing the primary host lock. Fixes #1864 — committed to alrod/azure-webjobs-sdk-script by alrod 7 years ago
- Chance of getting 409 while renewing the primary host lock. Fixes #1864 — committed to alrod/azure-webjobs-sdk-script by alrod 6 years ago
- Chance of getting 409 while renewing the primary host lock. Fixes #1864 — committed to Azure/azure-functions-host by alrod 6 years ago
- Revert "Chance of getting 409 while renewing the primary host lock. Fixes #1864" This reverts commit f6f5fb36915619140a2e32c18aee358549be0b23. — committed to Azure/azure-functions-host by mathewc 6 years ago
I am also getting this error in a v2 C# function app.
Singleton lock renewal failed for blob 'xxx/host' with error code 409: LeaseIdMismatchWithLeaseOperation. The last successful renewal completed at 2019-09-14T04:18:13.801Z (40380 milliseconds ago) with a duration of 7 milliseconds. The lease period was 15000 milliseconds.
Is there something we should be doing to prevent this? Or should we ignore this error?
I’m also seeing this error. This is the most recent instance:
I’m running on Functions v2 and all of the functions inside this app are HTTP triggered and short living.
I agree with the other people in this thread. If this isn’t an issue, it shouldn’t be logged as an error. Errors are something that we need to look at and not simply ignore (IMO).
I’ll be following this, since we also see this problem with a consumption plan Azure Function App
I am also seeing this error, although I am using a single Storage Account for a Function App. The issue appears when scaling out to multiple instances. I have just one BlobTrigger function and two TimerTrigger functions. Is there a way to avoid this error or at least make it not log as an exception in App Insights?
I can confirm that this happens in environments with individual storage accounts for each function app as well. I always create a new storage account for every new function app I create. Sharing storage accounts introduces other problems, which is why I wouldn’t ever recommend sharing storage accounts between function apps 👍
We’ve been seeing this error logged in our function apps (C# v2) fairly regularly. It doesn’t appear to cause any issues, but assuming there is nothing to worry about it’s just unhelpful noise. Are there any plans to ‘fix’ this or change the logging level so that it doesn’t appear as an error?
This was affecting my Azure Functions V4 apps with .NET 6 with deployment slots.
What I ended up discovering was that one of my swap slots was using the same Storage Account connection string as my production slot. Once I changed this, the lease problem stopped appearing in Application Insights.
I started seeing this after updating function apps (version ~4 / .NET 6) and updating all of the dependencies. There are only timer triggers (multiple) and Service Bus triggers (multiple), and there is one storage account. On .NET 3.1 and some older library versions I hadn’t seen this logging. Perhaps it’s just noise; I haven’t yet spotted whether it affects function execution in any way.
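If you want to check for this situation yourself, here is a minimal sketch that compares the `AccountName` segment of each slot's storage connection string. It assumes you have already exported each slot's `AzureWebJobsStorage` value into a dictionary by hand. `storage_account_name` and `slots_sharing_storage` are hypothetical helpers, not part of any Azure tooling.

```python
def storage_account_name(connection_string):
    """Return the AccountName segment of a storage connection string."""
    parts = {}
    for segment in connection_string.split(";"):
        if "=" in segment:
            # Split on the first '=' only: account keys end in '=' padding.
            key, value = segment.split("=", 1)
            parts[key.strip().lower()] = value.strip()
    return parts.get("accountname")


def slots_sharing_storage(slot_settings):
    """Map each storage account name to the slots that share it."""
    by_account = {}
    for slot, connection_string in slot_settings.items():
        name = storage_account_name(connection_string)
        by_account.setdefault(name, []).append(slot)
    # Keep only accounts referenced by more than one slot.
    return {
        account: slots
        for account, slots in by_account.items()
        if account and len(slots) > 1
    }


# Placeholder values for illustration only.
EXAMPLE_SLOTS = {
    "production": "DefaultEndpointsProtocol=https;AccountName=prodstore;"
                  "AccountKey=abc==;EndpointSuffix=core.windows.net",
    "staging": "DefaultEndpointsProtocol=https;AccountName=prodstore;"
               "AccountKey=abc==;EndpointSuffix=core.windows.net",
    "dev": "DefaultEndpointsProtocol=https;AccountName=devstore;"
           "AccountKey=xyz==;EndpointSuffix=core.windows.net",
}
```

Any account reported by `slots_sharing_storage(EXAMPLE_SLOTS)` is used by more than one slot, which is the setup described above.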
Request [xxxxxxxxxxx] PUT https://name.blob.core.windows.net/azure-webjobs-hosts/locks/fname/host?comp=lease
x-ms-lease-action:acquire
x-ms-lease-duration:15
x-ms-proposed-lease-id:xxxxxxxxxxxx
x-ms-version:2020-08-04
Accept:application/xml
Error response [xxxxxxxxxxxx] 409 There is already a lease present. (00.0s)
Server:Windows-Azure-Blob/1.0,Microsoft-HTTPAPI/2.0
x-ms-version:2020-08-04
x-ms-error-code:LeaseAlreadyPresent
These are logged with severity “Warning” in Application Insights.
Adding to the issues people are seeing: I am still on v1 and just had a function crash because of this. The issue is not benign on v1.
Any lease renewals that end with /host are likely nothing to worry about. Instances can swap between “primary” hosts without affecting the running functions. Let me know if you see anything that may be leading to failures and I can investigate.
Also seeing this error (409 with ‘LeaseIdMismatchWithLeaseOperation’) on one of our function apps (Runtime version ~4). It doesn’t happen every run though, just once in a while. Still annoying to have this pop up in our exceptions dashboard.
Is there no easy way to silence these errors if they don’t really matter?
I also started seeing this after updating function apps (version ~4 / .NET 6) and updating all of the dependencies.
A single Service Bus trigger: one function, one storage account.
I didn’t see this before the update to Functions v4 and .NET 6.
@brettsam We are seeing the same error when using ServiceBus triggered AZ functions (v2) in the consumption tier (EAST US).
Not sure if this is of significance, but we see some signs that it occurs after the functions are done processing the messages on the queue.
I’ve pinged you on the other thread since your issue is not with the primary host lock. Let’s continue there so we don’t confuse this one.
Same issue in a v2 Java function app.
Singleton lock renewal failed for blob ‘xxx/host’ with error code 409: LeaseIdMismatchWithLeaseOperation. The last successful renewal completed at 0001-01-01T00:00:00Z (-2147483648 milliseconds ago) with a duration of 0 milliseconds. The lease period was 15000 milliseconds.
Runtime version: 2.0.12507.0 (beta)