terraform-provider-grafana: TF run produces incorrect diff and fails to apply
Terraform Version
- Terraform: 1.3.3
- Terraform Grafana Provider: 1.30.0
- Grafana: Cloud
Affected Resource(s)
- grafana_rule_group
Steps to Reproduce
For a clean setup, create a new folder (provisioned or manually).
1. Provision a group “default” with a single alert named “Foobar test”. Use a minimal definition that works for your environment; the exact definition does not matter, it just has to apply successfully.
2. Add another (identical) minimalistic alert named “Foobar”.
3. Create a TF run and look at the output.
4. Try to apply the run and observe the error.
Expected Behavior
On steps #3 and #4, a new alert rule is added.
Actual Behavior
Apply fails with the error:

```
│ Error: status: 500, body: {"message":"failed to insert alert rules: a conflicting alert rule is found: rule title under the same organisation and folder should be unique","traceID":"19006e8578a22df0"}
│
│   with grafana_rule_group.default["group"],
│   on resources.rule_group.tf line 1, in resource "grafana_rule_group" "default":
│    1: resource "grafana_rule_group" "default" {
```
More information
Apparently the issue comes from incorrect matching of existing vs. provisioned alert rules. Instead of simply adding a rule “Foobar”, the provider tries to add a new “Foobar test” and rename the existing “Foobar test” to “Foobar”. This is confirmed by the run output from step 3:
```
Terraform will perform the following actions:

  # grafana_rule_group.default["group"] will be updated in-place
  ~ resource "grafana_rule_group" "default" {
        id   = "bk_R1GDVz;default"
        name = "default"
        # (3 unchanged attributes hidden)

      ~ rule {
          ~ name = "Foobar test" -> "Foobar"
            # (7 unchanged attributes hidden)
            # (2 unchanged blocks hidden)
        }
      + rule {
          + condition      = "B"
          + exec_err_state = "Alerting"
          + for            = "0s"
          + labels         = {
              + "type" = "no_data_test"
            }
          + name           = "Foobar test"
          + no_data_state  = "Alerting"

          + data {
              # (hidden as irrelevant)
            }
        }
    }

Plan: 0 to add, 1 to change, 0 to destroy.
```
Note the renaming of the existing rule in the `~ rule` block above.
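The rename-plus-add pattern in the plan is exactly what falls out of matching rules by position rather than by name. As a minimal illustration (hypothetical pseudologic, not the provider's actual code), a position-based diff of the two rule lists reproduces the plan:

```python
# Hypothetical sketch: diff two rule lists by position, as the plan
# output above suggests happens. Rules are represented by name only.

def positional_diff(state_rules, config_rules):
    """Pair rules by index; extra config rules become additions."""
    actions = []
    for i, new in enumerate(config_rules):
        if i < len(state_rules):
            old = state_rules[i]
            if old != new:
                actions.append(("rename", old, new))
        else:
            actions.append(("add", new))
    return actions

# State holds one rule; the config lists the new rule first.
state = ["Foobar test"]
config = ["Foobar", "Foobar test"]
print(positional_diff(state, config))
# [('rename', 'Foobar test', 'Foobar'), ('add', 'Foobar test')]
```

This yields precisely the "rename the old rule, then add a duplicate of its old name" behavior seen in the plan, which the Grafana API then rejects for violating name uniqueness.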
Possible solution
The problem is that the Terraform resource is group-centric, while the provisioning API is rule-centric. From TF’s perspective this looks like a change to a single grafana_rule_group item, but I assume that under the hood the diff above gets converted into one POST /api/v1/provisioning/alert-rules and one PUT /api/v1/provisioning/alert-rules/{UID}, with the problem happening on the former.

Thus I see two alternatives:
- Create a grafana_alert_rule resource and match infrastructure vs. TF state by rule name. Grafana maintains unique rule names within a folder, so there should be no conflicts.
- Fix the conversion of the TF run diff into API calls. Again, the rule name should be used to achieve proper matching.
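The second alternative amounts to keying the diff on rule name instead of position. A minimal sketch, assuming (as Grafana enforces) that rule names are unique within a folder:

```python
# Hypothetical sketch of name-based matching: names present on both
# sides become updates (PUT), names only in the config become
# creations (POST). Not the provider's actual implementation.

def name_based_diff(state_rules, config_rules):
    """Match rules by name instead of by position."""
    state_names = set(state_rules)
    actions = []
    for name in config_rules:
        if name in state_names:
            actions.append(("update", name))  # PUT  .../alert-rules/{UID}
        else:
            actions.append(("create", name))  # POST .../alert-rules
    return actions

print(name_based_diff(["Foobar test"], ["Foobar", "Foobar test"]))
# [('create', 'Foobar'), ('update', 'Foobar test')]
```

With this matching, adding “Foobar” produces a single create and leaves “Foobar test” untouched, so no uniqueness conflict can arise.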
Important Factoids
Is there anything atypical about your accounts that we should know about?
No, everything is standard and trivial.
Example definitions
First run & apply:
```hcl
resource "grafana_rule_group" "default" {
  name             = "default"
  folder_id        = "abcdefg"
  org_id           = 1
  interval_seconds = 60

  rule {
    condition = "B"
    for       = "0s"
    labels    = {}
    name      = "Foobar test"
    data { ... }
  }
}
```
Second run:
```hcl
resource "grafana_rule_group" "default" {
  name             = "default"
  folder_id        = "abcdefg"
  org_id           = 1
  interval_seconds = 60

  rule {
    condition = "B"
    for       = "0s"
    labels    = {}
    name      = "Foobar test"
    data { ... }
  }

  rule {
    condition = "B"
    for       = "0s"
    labels    = {}
    name      = "Foobar"
    data { ... }
  }
}
```
About this issue
- State: closed
- Created 2 years ago
- Reactions: 8
- Comments: 17 (4 by maintainers)
The root cause issue mentioned above has been fixed with https://github.com/grafana/grafana/pull/67868. It should be available in Grafana v10.1.X.
Here is the exact root cause of this in grafana/grafana: https://github.com/grafana/grafana/issues/66158
I spent some time trying to create a workaround to this in the provider today. Unfortunately, it only exposed more issues - I found at least 3 different ways this bug can manifest, and several of my workaround approaches had undesirable effects on existing plans. I think closing ^ is probably the fastest path forward to resolving this bug. Thanks for your patience everyone.
Found the root cause: it looks like a re-ordering issue in the Grafana API, in some logic that manages rule groups. The rules in your state are in a different order from what’s specified in your file; TF tries to fix this ordering on apply (which is the right thing to do), but it hits the re-ordering issue and the request is rejected.
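That ordering dependence can be made concrete with a small check (an illustrative sketch, not provider code): a re-order is only attempted when the rules common to state and config appear in a different relative order.

```python
# Hypothetical check for whether an apply would need to re-order
# existing rules (the case that trips the Grafana API bug above).

def requires_reorder(state_order, config_order):
    """True when rules present in both lists appear in a different
    relative order in state vs. config."""
    common = set(state_order) & set(config_order)
    in_state = [r for r in state_order if r in common]
    in_config = [r for r in config_order if r in common]
    return in_state != in_config

# Same config, but only the first state ordering triggers a re-order:
print(requires_reorder(["Foobar test", "Foobar"], ["Foobar", "Foobar test"]))  # True
print(requires_reorder(["Foobar", "Foobar test"], ["Foobar", "Foobar test"]))  # False
```

This is consistent with the reproduction difficulty described below: whether the bug fires depends on the order already recorded in local state, not on the request body alone.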
This is also why I failed to reproduce it earlier: my local state had the rules in a different order, so the provider sent the same request but did not attempt to re-order the rules. I’ll file an issue in the grafana/grafana repo.

@hilbert-ralf My workaround is to just destroy and apply again.