telegraf: SQLServer Plugin: Throwing Errors when HADR DB Not Readable
Relevant telegraf.conf:
[[inputs.sqlserver]]
servers = ["Server=xxxxxxxxxxx;Port=1433",]
exclude_query = []
database_type = "SQLServer"
System info:
All Windows versions Telegraf version 1.18.0
Steps to reproduce:
Setup the sqlserver input plugin to collect data against a SQL Server instance that is hosting secondary databases in an AlwaysOn Availability group. These databases aren’t configured for read-only access.
No queries are excluded from the sqlserver input plugin.
Expected behavior:
No errors to be thrown
Actual behavior:
The below errors are thrown as the database can’t be queried:
[inputs.sqlserver] Error in plugin: Script SQLServerRequests failed: mssql: The target database, ‘xxxxxxxxxx’, is participating in an availability group and is currently not accessible for queries. Either data movement is suspended or the availability replica is not enabled for read access. To allow read-only access to this and other databases in the availability group, enable read access to one or more secondary availability replicas in the group. For more information, see the ALTER AVAILABILITY GROUP statement in SQL Server Books Online.
[inputs.sqlserver] Error in plugin: mssql: The target database, ‘xxxxxxxxxx’, is participating in an availability group and is currently not accessible for queries. Either data movement is suspended or the availability replica is not enabled for read access. To allow read-only access to this and other databases in the availability group, enable read access to one or more secondary availability replicas in the group. For more information, see the ALTER AVAILABILITY GROUP statement in SQL Server Books Online.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 19 (12 by maintainers)
Today I did some testing and it is a bug inside SQL Server itself.
Repro is just as easy as running SELECT * FROM sys.dm_exec_requests via stress tool against secondary replica (I used SQLQueryStress in this case):
Full error msg:
The target database, '', is participating in an availability group and is currently not accessible for queries. Either data movement is suspended or the availability replica is not enabled for read access. To allow read-only access to this and other databases in the availability group, enable read access to one or more secondary availability replicas in the group.
Changing application intent to read-only is not helping either.
As it is not a bug inside telegraf sqlserver plugin I suggest implementing a try catch for this specific error (with or without retry logic). This is a sample code:
@Trovalo / @powersj - what do you think? If you agree I can implement it and create a PR.
I should probably explain it better. The query which is failing is a select to a DMV sys.dm_exec_requests. It normally returns a set of running queries. This DMV should never return this error as it is never part of AOAG (see point 1). It is true even for contained availability groups, which are new in SQL Server 2022.
Another explanation could be. When the error is:
The target database, '', is participating in an availability group and is currently not accessible for queries
If there is a database which is not accessible for queries - no queries from that database should be shown (= we should not get the error) If there is a database which is accessible for queries - a query should be shown (and we should not get this error as the database must be queryable).
It is just a guess, but what is most likely happening in the background of this DMV, is that there are some internal checks for additional details about some internal process (I am getting this error on standard edition too - despite the fact, that it does not support readable replicas at all) and this check is not handled properly for AOAG nonreadable databases.
The solution which @srebhan suggests is ok for me, but I don’t think that count of failures for individual queries is persisted at this moment. It needs code changes inside “plugin” logic. Try catch is handled inside SQL Server query only and has no impact on other queries.