prometheus-operator: Patch prometheus container args with custom args storage.tsdb.max-block-duration / storage.tsdb.min-block-duration

What is missing?

I want to change storage.tsdb.max-block-duration / storage.tsdb.min-block-duration to reduce Prometheus memory usage.

spec:
  containers:
  - args:
    - --storage.tsdb.min-block-duration=30m
    - --storage.tsdb.max-block-duration=30m
    name: prometheus

Currently this replaces all args of the prometheus container.

Why do we need it?

To set storage.tsdb.min-block-duration / storage.tsdb.max-block-duration in Prometheus without rewriting all args, similar to the solution discussed in https://github.com/prometheus-operator/prometheus-operator/issues/3848.

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 3
  • Comments: 24 (11 by maintainers)

Most upvoted comments

With the latest version of the operator (v0.59.0), it should be possible to pass additional arguments via spec.additionalArgs.

kind: Prometheus
apiVersion: monitoring.coreos.com/v1 
...
spec:
  ...
  additionalArgs:
  - name: storage.tsdb.min-block-duration
    value: 30m
  - name: storage.tsdb.max-block-duration
    value: 24h
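
For deployments managed through the kube-prometheus-stack Helm chart, the same field can be set under prometheus.prometheusSpec. A minimal sketch, assuming a chart version recent enough to pass additionalArgs through to the generated Prometheus resource (the values are examples only):

prometheus:
  prometheusSpec:
    additionalArgs:
    - name: storage.tsdb.min-block-duration
      value: 30m
    - name: storage.tsdb.max-block-duration
      value: 24h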

Battled this for a long time and this setup helped make progress.

The reason we all want to configure these is that when you autoscale at a massive level, you can easily end up with Prometheus pods using 30-40 GB of memory.

You have to disable compaction for this to work. If you don't, then from what I can tell it just ignores these arguments and keeps the default 2h in place.

prometheus:
  prometheusSpec:
    containers:
      - name: prometheus
        args:
        - "--web.console.templates=/etc/prometheus/consoles"
        - "--web.console.libraries=/etc/prometheus/console_libraries"
        - "--config.file=/etc/prometheus/config_out/prometheus.env.yaml"
        - "--storage.tsdb.path=/prometheus"
        - "--web.enable-lifecycle"
        - "--storage.tsdb.no-lockfile"
        - "--web.enable-admin-api"
        - "--web.external-url=https://prometheus.fop.bar.com/"
        - "--web.route-prefix=/"
        - "--storage.tsdb.min-block-duration=10m"
        - "--storage.tsdb.max-block-duration=15m"
        - "--storage.tsdb.retention.size=800MB"
        - "--storage.tsdb.retention.time=1h"

    walCompression: false

I'm keeping only 800 MB, which in my testing is around 20-25 minutes of metrics in WAL files before they get checkpointed and deleted. I'm also holding steady at around 7 GB of memory used. For the first time in a long while, Prometheus is actually reclaiming memory.
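
Putting the two points above together: the custom block durations only take effect once compaction is disabled, and on operator v0.59.0+ that can be expressed on the Prometheus resource itself instead of overriding the whole container args list. A minimal sketch, assuming the disableCompaction and additionalArgs fields of the Prometheus CRD (the name, namespace and values below are examples only):

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s              # example name
  namespace: monitoring  # example namespace
spec:
  disableCompaction: true   # without this the block-duration flags appear to be ignored
  additionalArgs:
  - name: storage.tsdb.min-block-duration
    value: 30m
  - name: storage.tsdb.max-block-duration
    value: 30m
  retention: 1h
  retentionSize: 800MB
  walCompression: false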

IMHO it is a reasonable request for setups that use Thanos as the centralized source of truth, since the memory needs and the duration parameters can differ for each prometheus-operator user depending on their scale.

I have the same problem. I use Prometheus remote-write to send metrics from different clusters to an observer cluster running thanos-receive. Since receive runs into problems with big block durations when a WAL replay is needed, e.g. after pod restarts, I need to decrease the block durations.
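
For context, the remote-write side of that setup is just the standard remoteWrite field on each sending cluster's Prometheus resource; the receive endpoint below is a placeholder (19291 is thanos-receive's default remote-write port):

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s   # example name
spec:
  remoteWrite:
  - url: http://thanos-receive.example.com:19291/api/v1/receive   # placeholder endpoint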

Historically we did not want to expose those flags because their usage is discouraged by upstream (Prometheus). Take a look at a similar request in https://github.com/prometheus-operator/prometheus-operator/issues/3724.

Right now we are closely following upstream prometheus-agent work and its potential integration in prometheus-operator. This should greatly simplify the workflow you described. IMHO it would be much better to focus on that integration than on exposing internal Prometheus switches.