terraform-provider-grafana: TF run produces incorrect diff and fails to apply
Terraform Version
- Terraform: 1.3.3
- Terraform Grafana Provider: 1.30.0
- Grafana: Cloud
Affected Resource(s)
- grafana_rule_group
Steps to Reproduce
For a clean setup, create a new folder (provisioned or manually).
1. Provision a group “default” with a single alert named “Foobar test”. Use a minimal definition that works for your environment; the exact definition does not matter, it just has to apply successfully.
2. Add another (identical) minimalistic alert named “Foobar”.
3. Create a TF run and look at the output.
4. Try to apply the run and observe the error.
Expected Behavior
On steps #3 and #4, a new alert rule is added.
Actual Behavior
Apply fails with the error:

```
│ Error: status: 500, body: {"message":"failed to insert alert rules: a conflicting alert rule is found: rule title under the same organisation and folder should be unique","traceID":"19006e8578a22df0"}
│
│   with grafana_rule_group.default["group"],
│   on resources.rule_group.tf line 1, in resource "grafana_rule_group" "default":
│    1: resource "grafana_rule_group" "default" {
```
More information
Apparently the issue comes from incorrect matching of existing vs. provisioned alert rules. Instead of simply adding a rule “Foobar”, the provider tries to add a new “Foobar test” and rename the existing “Foobar test” to “Foobar”. This is confirmed by the run output from step 3:
```
Terraform will perform the following actions:

  # grafana_rule_group.default["group"] will be updated in-place
  ~ resource "grafana_rule_group" "default" {
        id   = "bk_R1GDVz;default"
        name = "default"
        # (3 unchanged attributes hidden)

      ~ rule {
          ~ name = "Foobar test" -> "Foobar"
            # (7 unchanged attributes hidden)
            # (2 unchanged blocks hidden)
        }
      + rule {
          + condition      = "B"
          + exec_err_state = "Alerting"
          + for            = "0s"
          + labels         = {
              + "type" = "no_data_test"
            }
          + name           = "Foobar test"
          + no_data_state  = "Alerting"

          + data {
              # (hidden as irrelevant)
            }
        }
    }

Plan: 0 to add, 1 to change, 0 to destroy.
```
Note the renaming of the existing rule in the `~ rule` block above.
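The rename-plus-add pattern in the plan is exactly what falls out of matching rules by position rather than by name. As a minimal illustration (hypothetical pseudologic, not the provider's actual code), a position-based diff of the two rule lists reproduces the plan:

```python
# Hypothetical sketch: diff two rule lists by position, as the plan
# output above suggests happens. Rules are represented by name only.

def positional_diff(state_rules, config_rules):
    """Pair rules by index; extra config rules become additions."""
    actions = []
    for i, new in enumerate(config_rules):
        if i < len(state_rules):
            old = state_rules[i]
            if old != new:
                actions.append(("rename", old, new))
        else:
            actions.append(("add", new))
    return actions

# State holds one rule; the config lists the new rule first.
state = ["Foobar test"]
config = ["Foobar", "Foobar test"]
print(positional_diff(state, config))
# [('rename', 'Foobar test', 'Foobar'), ('add', 'Foobar test')]
```

This yields precisely the "rename the old rule, then add a duplicate of its old name" behavior seen in the plan, which the Grafana API then rejects for violating name uniqueness.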
Possible solution
The problem is that the Terraform resource is group-centric, while the provisioning API is rule-centric. From TF’s perspective this looks like a change to a single grafana_rule_group item, but I assume that under the hood the diff above gets converted into one POST /api/v1/provisioning/alert-rules and one PUT /api/v1/provisioning/alert-rules/{UID}, with the problem happening on the former.

Thus I see two alternatives:
- Create a grafana_alert_rule resource and match infrastructure vs. TF state by rule name. Grafana maintains unique rule names within a folder, so there should be no conflicts.
- Fix the conversion of the TF run diff into API calls. Again, the rule name should be used to achieve proper matching.
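The second alternative amounts to keying the diff on rule name instead of position. A minimal sketch, assuming (as Grafana enforces) that rule names are unique within a folder:

```python
# Hypothetical sketch of name-based matching: names present on both
# sides become updates (PUT), names only in the config become
# creations (POST). Not the provider's actual implementation.

def name_based_diff(state_rules, config_rules):
    """Match rules by name instead of by position."""
    state_names = set(state_rules)
    actions = []
    for name in config_rules:
        if name in state_names:
            actions.append(("update", name))  # PUT  .../alert-rules/{UID}
        else:
            actions.append(("create", name))  # POST .../alert-rules
    return actions

print(name_based_diff(["Foobar test"], ["Foobar", "Foobar test"]))
# [('create', 'Foobar'), ('update', 'Foobar test')]
```

With this matching, adding “Foobar” produces a single create and leaves “Foobar test” untouched, so no uniqueness conflict can arise.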
Important Factoids
Is there anything atypical about your accounts that we should know about?
No, everything is standard and trivial.
Example definitions
First run & apply:
```hcl
resource "grafana_rule_group" "default" {
  name             = "default"
  folder_id        = "abcdefg"
  org_id           = 1
  interval_seconds = 60

  rule {
    condition = "B"
    for       = "0s"
    labels    = {}
    name      = "Foobar test"
    data { ... }
  }
}
```
Second run:
```hcl
resource "grafana_rule_group" "default" {
  name             = "default"
  folder_id        = "abcdefg"
  org_id           = 1
  interval_seconds = 60

  rule {
    condition = "B"
    for       = "0s"
    labels    = {}
    name      = "Foobar test"
    data { ... }
  }

  rule {
    condition = "B"
    for       = "0s"
    labels    = {}
    name      = "Foobar"
    data { ... }
  }
}
```
About this issue
- State: closed
- Created 2 years ago
- Reactions: 8
- Comments: 17 (4 by maintainers)
The root cause issue mentioned above has been fixed with https://github.com/grafana/grafana/pull/67868. It should be available in Grafana v10.1.X.
Here is the exact root cause of this in grafana/grafana: https://github.com/grafana/grafana/issues/66158
I spent some time trying to create a workaround to this in the provider today. Unfortunately, it only exposed more issues - I found at least 3 different ways this bug can manifest, and several of my workaround approaches had undesirable effects on existing plans. I think closing ^ is probably the fastest path forward to resolving this bug. Thanks for your patience everyone.
Found the root cause: it looks like a re-ordering issue in the Grafana API, in some logic that manages rule groups. The rules in your state are in a different order from what’s specified in your file; TF tries to fix this ordering on apply (which is the right thing to do), but it hits the re-ordering issue and the request is rejected.
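That ordering dependence can be made concrete with a small check (an illustrative sketch, not provider code): a re-order is only attempted when the rules common to state and config appear in a different relative order.

```python
# Hypothetical check for whether an apply would need to re-order
# existing rules (the case that trips the Grafana API bug above).

def requires_reorder(state_order, config_order):
    """True when rules present in both lists appear in a different
    relative order in state vs. config."""
    common = set(state_order) & set(config_order)
    in_state = [r for r in state_order if r in common]
    in_config = [r for r in config_order if r in common]
    return in_state != in_config

# Same config, but only the first state ordering triggers a re-order:
print(requires_reorder(["Foobar test", "Foobar"], ["Foobar", "Foobar test"]))  # True
print(requires_reorder(["Foobar", "Foobar test"], ["Foobar", "Foobar test"]))  # False
```

This is consistent with the reproduction difficulty described below: whether the bug fires depends on the order already recorded in local state, not on the request body alone.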
This is also why I failed to reproduce it earlier: my local state had the rules in a different order, so the provider sent the same request but did not attempt to re-order the rules. I’ll file an issue in the grafana/grafana repo.

@hilbert-ralf My workaround is to just destroy and apply again.