cortex: Cortex can read rules but doesn't activate them

Description

I’m running Cortex 1.4.0 using the binary from GitHub, and I have the ruler configured to send alerts to my own Alertmanager cluster.

For a moment I saw the alerts in my Alertmanager Web UI, but shortly after they disappeared.

Config

My ruler section of the config looks like this:

ruler:
  external_url: 'https://alerts.example.org/'
  alertmanager_url: 'http://localhost:9093/'
  enable_alertmanager_v2: true
  rule_path: '/var/tmp/cortex/rules'
  enable_api: true
  storage:
    type: local
    local:
      directory: '/etc/cortex/rules'

My rules are located in /etc/cortex/rules/fake since I use auth_enabled: false.
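
To double-check the layout described above, here is a minimal Python sketch that lists the files under the per-tenant subdirectory of the local rule storage directory. The function name list_tenant_rule_files is hypothetical, and the default tenant name "fake" is taken from this report (it is the value used in the X-Scope-OrgID header when auth_enabled is false); it is not an official Cortex tool.

```python
from pathlib import Path
from typing import List


def list_tenant_rule_files(storage_dir: str, tenant: str = "fake") -> List[str]:
    """Return the sorted names of regular files under <storage_dir>/<tenant>.

    An empty list is returned when the tenant directory does not exist,
    which is what a misspelled tenant directory name would look like.
    """
    tenant_dir = Path(storage_dir) / tenant
    if not tenant_dir.is_dir():
        return []
    # Only direct children are considered; subdirectories are ignored.
    return sorted(p.name for p in tenant_dir.iterdir() if p.is_file())
```

With the config above, list_tenant_rule_files("/etc/cortex/rules") would be expected to include instance.yml if the file sits in the tenant directory; an empty result suggests the directory name does not match the tenant ID.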

Debugging

I can see the rules are located in the right place because I can look them up using the /api/v1/rules call:

 > curl -s 'http://localhost:9092/api/v1/rules' | head
instance.yml:
    - name: instance
      rules:
        - alert: InstanceDown
          expr: up == 0
          for: 5m
          annotations:
            current_value: '{{ $value }}'
            description: '{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes.'
            summary: Instance {{ $labels.instance }} down

But when I query the /prometheus/api/v1/rules path, I get no rule groups:

 > curl -s 'http://localhost:9092/prometheus/api/v1/rules' -H 'X-Scope-OrgID: fake' | jq .
{
  "status": "success",
  "data": {
    "groups": []
  },
  "errorType": "",
  "error": ""
}

This is despite the fact that just minutes earlier the rules were displayed here, along with the alerts they generated. Now the alerts endpoint is empty too:

 > curl -s 'http://localhost:9092/prometheus/api/v1/alerts' -H 'X-Scope-OrgID: fake' | jq .
{
  "status": "success",
  "data": {
    "alerts": []
  },
  "errorType": "",
  "error": ""
}

I’m confused as to what caused them to disappear. Restarting Cortex nodes doesn’t fix the issue.
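
To make the empty responses above easier to spot when re-checking after a restart, here is a small Python sketch that counts the entries in a Prometheus-style API response body. The helper name count_items is hypothetical; the only assumption about the response shape is what the curl output above shows: a top-level "status" field and a "data" object holding a "groups" or "alerts" list.

```python
import json


def count_items(body: str, key: str) -> int:
    """Count the entries under data[key] in a Prometheus-style API response.

    `key` is "groups" for /prometheus/api/v1/rules and "alerts" for
    /prometheus/api/v1/alerts. Raises ValueError when the response
    status is anything other than "success".
    """
    payload = json.loads(body)
    if payload.get("status") != "success":
        raise ValueError(f"API error: {payload.get('error')}")
    # A missing or null list is treated as empty.
    return len((payload.get("data") or {}).get(key) or [])
```

Feeding it the saved output of either curl call above would return 0, which matches the symptom: the ruler API answers successfully but reports no loaded rule groups and no alerts.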

Questions

  • My understanding is that ruler.rule_path is the place where Cortex checks for rule files. Correct?
  • My understanding is that ruler.storage.local.directory configures a temporary location for rule files. Correct?
  • Why can the rules be loaded from ruler.rule_path but are not available via /prometheus/api/v1/rules?

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 28 (16 by maintainers)

Most upvoted comments

For others who are also stumbling on this problem: I gave up on the Cortex Alertmanager and reverted to the Prometheus Operator Alertmanager, which worked the way I expected. To me the Cortex Alertmanager does not compute, which is a shame, because I would have preferred to use the Cortex one; it logically makes more sense to do alerting at the Cortex level.

@pracucci I think we can categorise this issue as a doc improvement, as everything I’ve said before can be summarised into a “getting started with rule evaluation in Cortex” guide.

Just to add to this, I’ve spent the last two days trying to wrap my head around how to actually use the Ruler, Alertmanager and configsdb and how they interact with each other. My Prometheus is firing alerts but I don’t see any way to pick these up in Cortex itself. I also don’t understand how you are supposed to use the Ruler. The API is experimental, so does that mean you end up writing YAML files and putting them in a ConfigMap or S3 bucket so the Ruler can pick them up?

If the latter is true, it would be nice if it could work with the Prometheus Kubernetes operator, so that you can use CRDs to define a PrometheusRule that the Ruler can then pick up. I guess this is not currently supported?

Lots of questions, and more rambling than anything else, but as mentioned I’m trying to wrap my head around how it actually works. The documentation on this topic does need improvement 😄