prometheus-operator: Updating a persistent volume size (GKE 1.11) does nothing

What did you do?

I had a volume claim set up in my Prometheus resource:

    storage:
      volumeClaimTemplate:
        metadata:
          labels:
            prometheus: k8s
          name: prometheus-storage
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 150Gi
          storageClassName: prometheus-ssd

I tried updating the storage.volumeClaimTemplate.spec.resources.requests.storage from 150Gi to 250Gi. The operator restarted my Prometheus statefulset pods, but the associated persistent volume claim stayed at 150Gi.

What did you expect to see?

I have defined my storage class with allowVolumeExpansion: true, which is a Kubernetes 1.11 feature. I expected the PVC managed by prometheus-operator to be updated to the new value of 250Gi, and the pods to then restart and re-mount the expanded volumes.

Environment

  • Prometheus Operator version: 0.26.0

  • Kubernetes version information: v1.11.6-gke.3

  • Kubernetes cluster kind: GKE

Manifests

Storage Class:

    allowVolumeExpansion: true
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: prometheus-ssd
    parameters:
      type: pd-ssd
    provisioner: kubernetes.io/gce-pd
    reclaimPolicy: Delete
    volumeBindingMode: Immediate
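
Before resizing anything, it may be worth confirming that the cluster really reports the storage class as expandable. The following is a minimal sketch, not part of the original report: the function name `storageclass_allows_expansion` and the `KUBECTL` override variable are hypothetical and exist only for illustration.

```shell
# Hypothetical helper: succeeds (exit status 0) only when the StorageClass
# reports allowVolumeExpansion: true.
# KUBECTL may be set to another command name for dry runs (an assumption of
# this sketch, not a kubectl feature).
storageclass_allows_expansion() {
  sc_name="$1"
  sc_value=$("${KUBECTL:-kubectl}" get storageclass "$sc_name" \
    -o jsonpath='{.allowVolumeExpansion}')
  [ "$sc_value" = "true" ]
}
```

For the storage class above, this would be called as `storageclass_allows_expansion prometheus-ssd && echo "expansion allowed"`.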

About this issue

  • State: open
  • Created 5 years ago
  • Reactions: 32
  • Comments: 43 (19 by maintainers)

Most upvoted comments

Hello! I just want to bump this issue, because it is really annoying to edit all PVCs and restart all pods manually when we have the operator.

Hello, I just ran into this exact issue. I had to manually edit each Prometheus PVC to adjust its size. Resizing was fully transparent. I then updated the prometheus-operator Helm release. The operator killed all Prometheus pods at once, instead of doing the usual rollout.

I also caught this message in the operator logs:

    level=info ts=2020-03-25T08:39:25.321349082Z caller=operator.go:1180 component=prometheusoperator msg="resolving illegal update of Prometheus StatefulSet" details="&StatusDetails{Name:prometheus-prometheus-operator-prometheus,Group:apps,Kind:StatefulSet,Causes:[]StatusCause{StatusCause{Type:FieldValueForbidden,Message:Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden,Field:spec,},},RetryAfterSeconds:0,UID:,}"

I think this is more of a missing Kubernetes feature than a prometheus-operator related one. StatefulSets don’t yet fully support expanding volumes. Related enhancement proposal: https://github.com/kubernetes/enhancements/pull/660

FYI how to manually resize volumes is now documented at https://prometheus-operator.dev/docs/operator/storage/#resizing-volumes.

We had the same problem and followed these steps:

  1. Update the Prometheus field spec.storage.volumeClaimTemplate.spec.resources.requests.storage to NEW-SIZE

  2. Patch every PVC with the following command:

    kubectl patch pvc/prometheus-pvc-X --patch '{"spec": { "resources": { "requests": { "storage": "NEW-SIZE" } } } }'

  3. Delete the STS to update its definition; the Prometheus operator recreates it right away:

    kubectl delete sts/prometheus-kube-prometheus-stack-prometheus --cascade=orphan
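
Steps 2 and 3 above could be wrapped in a small script along these lines. This is only a sketch under stated assumptions: the function names, the `KUBECTL` override variable, and the idea of selecting PVCs by label are illustrative, and the selector must match whatever labels your PVCs actually carry. Step 1 (updating the Prometheus resource) still has to be done first.

```shell
# Build the JSON patch that raises a PVC's storage request to the given size.
pvc_resize_patch() {
  printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}' "$1"
}

# Hypothetical wrapper around steps 2 and 3.
# Arguments: namespace, PVC label selector, StatefulSet name, new size.
# KUBECTL may be set to another command name for dry runs (an assumption of
# this sketch, not a kubectl feature).
resize_prometheus_pvcs() {
  rp_ns="$1"; rp_selector="$2"; rp_sts="$3"; rp_size="$4"
  # Step 2: patch every PVC matched by the selector.
  for rp_pvc in $("${KUBECTL:-kubectl}" -n "$rp_ns" get pvc -l "$rp_selector" -o name); do
    "${KUBECTL:-kubectl}" -n "$rp_ns" patch "$rp_pvc" \
      --patch "$(pvc_resize_patch "$rp_size")" || return 1
  done
  # Step 3: delete only the StatefulSet object; --cascade=orphan leaves the
  # pods in place so the operator can recreate the StatefulSet around them.
  "${KUBECTL:-kubectl}" -n "$rp_ns" delete "sts/$rp_sts" --cascade=orphan
}
```

A call might look like `resize_prometheus_pvcs monitoring prometheus=k8s prometheus-kube-prometheus-stack-prometheus 250Gi`, where the namespace `monitoring` and the selector `prometheus=k8s` are placeholders.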

A fix directly in the prometheus operator would be a great addition 👍

We ended up deleting the PVC and therefore losing the historic data…

I am actually encountering the same issue now with GKE 1.17, quay.io/prometheus-operator/prometheus-operator:v0.43.2 and quay.io/prometheus/prometheus:v2.22.1.

If I update the capacity inside the Prometheus custom resource, the statefulset and pods restart, but the PVC size does not change at all.

Only when I run kubectl edit pvc PVC1 and increase the capacity does the PVC size change, after less than 1 minute, and the filesystem in the pod is then expanded automatically (with no pod restart).
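
To check whether such a resize has actually completed, one option is to compare the size requested in the PVC spec with the capacity reported in its status. This is a minimal sketch: the function name `pvc_resize_complete` and the `KUBECTL` override variable are hypothetical, and the plain string comparison assumes both values are reported in the same unit (for example both as `250Gi`).

```shell
# Hypothetical helper: prints the requested and the actual size of a PVC and
# succeeds (exit status 0) once they match.
# Arguments: namespace, PVC name.
pvc_resize_complete() {
  pvc_ns="$1"
  pvc_name="$2"
  pvc_requested=$("${KUBECTL:-kubectl}" -n "$pvc_ns" get pvc "$pvc_name" \
    -o jsonpath='{.spec.resources.requests.storage}')
  pvc_actual=$("${KUBECTL:-kubectl}" -n "$pvc_ns" get pvc "$pvc_name" \
    -o jsonpath='{.status.capacity.storage}')
  echo "requested=$pvc_requested actual=$pvc_actual"
  # Naive string comparison; see the unit assumption in the lead-in.
  [ -n "$pvc_actual" ] && [ "$pvc_requested" = "$pvc_actual" ]
}
```

It could be polled with something like `until pvc_resize_complete monitoring PVC1; do sleep 5; done`, where `monitoring` is a placeholder namespace.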

So is there a way to do this from the Prometheus custom resource? And what is the best practice after changing the PVC size: should I also update the capacity in the custom resource (and take the pod restart)?

@jalev @aiman-alsari

@jalev your last request seems to be related to https://github.com/prometheus-operator/prometheus-operator/issues/2753, which is also something that would need to be addressed upstream.

I needed to expand the PVC from 500Gi to 1000Gi and came across a similar issue on AWS, using storageClass: wait-consumer-gp2 with allowVolumeExpansion: true. There are two issues:

  1. The statefulSet resource definition was successfully updated from 500Gi to 1000Gi but the PVC remained at 500Gi. I had to edit the PVC resource manually to kickstart the expansion process.

  2. Lack of proper rollout when modifying the statefulSet. Modifying the statefulSet causes downtime, as all prometheus instances are rotated at the same time.

Regarding (1), should I open a new AWS-specific feature request, or is anyone already working on this? I am not sure whether (2) is part of the operator’s implementation or a Kubernetes issue.

As a complementary comment: to expand the volume I had to manually modify the PVC(s) directly. The resize happens and Prometheus starts healthy. I modified the volumeClaimTemplate definition in the Prometheus spec before modifying the PVC. I don’t know whether this will cause a problem, but I doubt it, as it is exactly the same referenced PVC.