prometheus: 2.6.0: opening storage failed: mkdir data/: read-only file system

Bug Report

What did you do?

Upgrading the Docker image from 2.5.0 to 2.6.0 introduces a fatal error, “opening storage failed: mkdir data/: read-only file system”. Deleting everything and reverting to 2.5.0 with the exact same configuration avoids the problem.

What did you expect to see?

I did not expect any changes to be required for 2.6.0.

Environment

This is running in a vanilla minikube cluster running Kubernetes 1.11.6.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  namespace: monitor
  name: prometheus
spec:
  replicas: 1
  revisionHistoryLimit: 2
  strategy:
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 0
  template:
    metadata:
      name: prometheus
      labels:
        service: prometheus
        apiVersion: v2
        app: prometheus
    spec:
      containers:
      - name: prometheus
        image: quay.io/prometheus/prometheus:v2.5.0
        imagePullPolicy: Always
        ports:
        - name: web
          containerPort: 9090
        livenessProbe:
          tcpSocket:
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 5
        readinessProbe:
          tcpSocket:
            port: 9090
        resources:
          requests:
            cpu: 10m
            memory: 32Mi
          limits:
            memory: 64Mi
        volumeMounts:
        - name: config
          mountPath: /etc/prometheus
        - name: data
          mountPath: /prometheus
      volumes:
      - name: config
        configMap:
          name: prometheus
      - name: data
        persistentVolumeClaim:
          claimName: prometheus

  • System information:

    quay.io/prometheus/prometheus:v2.6.0

  • Prometheus version:

    v2.6.0

  • Logs:

level=info ts=2018-12-26T14:41:12.22879461Z caller=main.go:243 msg="Starting Prometheus" version="(version=2.6.0, branch=HEAD, revision=dbd1d58c894775c0788470944b818cc724f550fb)"
level=info ts=2018-12-26T14:41:12.228845931Z caller=main.go:244 build_context="(go=go1.11.3, user=root@bf5760470f13, date=20181217-15:14:46)"
level=info ts=2018-12-26T14:41:12.228864658Z caller=main.go:245 host_details="(Linux 4.15.0 #1 SMP Fri Dec 21 23:51:58 UTC 2018 x86_64 prometheus-85c56c84d8-chf77 (none))"
level=info ts=2018-12-26T14:41:12.228880573Z caller=main.go:246 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-12-26T14:41:12.228894018Z caller=main.go:247 vm_limits="(soft=unlimited, hard=unlimited)"
level=info ts=2018-12-26T14:41:12.229899898Z caller=main.go:561 msg="Starting TSDB ..."
level=info ts=2018-12-26T14:41:12.229984509Z caller=web.go:429 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2018-12-26T14:41:12.229996811Z caller=main.go:430 msg="Stopping scrape discovery manager..."
level=info ts=2018-12-26T14:41:12.230007845Z caller=main.go:444 msg="Stopping notify discovery manager..."
level=info ts=2018-12-26T14:41:12.23001292Z caller=main.go:466 msg="Stopping scrape manager..."
level=info ts=2018-12-26T14:41:12.230018683Z caller=main.go:440 msg="Notify discovery manager stopped"
level=info ts=2018-12-26T14:41:12.23008139Z caller=main.go:426 msg="Scrape discovery manager stopped"
level=info ts=2018-12-26T14:41:12.23009587Z caller=main.go:460 msg="Scrape manager stopped"
level=info ts=2018-12-26T14:41:12.230107221Z caller=manager.go:664 component="rule manager" msg="Stopping rule manager..."
level=info ts=2018-12-26T14:41:12.230125069Z caller=manager.go:670 component="rule manager" msg="Rule manager stopped"
level=info ts=2018-12-26T14:41:12.230134183Z caller=notifier.go:521 component=notifier msg="Stopping notification manager..."
level=info ts=2018-12-26T14:41:12.230144125Z caller=main.go:615 msg="Notifier manager stopped"
level=error ts=2018-12-26T14:41:12.230506795Z caller=main.go:624 err="opening storage failed: mkdir data/: read-only file system"

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 2
  • Comments: 24 (13 by maintainers)

Most upvoted comments

@Dravere The Dockerfile wasn’t explicitly mentioned in the stability guarantees in https://prometheus.io/blog/2016/07/18/prometheus-1-0-released/#what-does-1-0-mean-for-you, so we don’t have an official rule for that. Most likely we would not have made a breaking change intentionally though.

While rolling back would be another breaking change, you could also see a rollback as a fix for an earlier, unintentional breakage that should never have happened, especially if most users have yet to hit it.

Please take this as constructive criticism, not as a rant about the team owing me or anyone else anything! You’re doing great work!

I’m at a loss to see how the symlink introduced in #4976 improves anything. I was trying to set up and test Prometheus and spent the better part of two days trying to figure out why I was getting this permission error, because searching Google does not turn up this issue. And even once I did find it, without looking at the changes in the commit it’s still nearly impossible to understand what’s going on and why anything mentioned in this thread fixes things. “opening storage failed: mkdir data/: permission denied” is a relative path error and distinctly unhelpful when troubleshooting the issue.

Symlinking your data directory into /etc makes no sense to me and goes against several decades of Unix file system convention. Configuration goes in /etc, and data belongs in /opt or /var; in Docker’s case, /prometheus, /data, or /prometheus/data would be acceptable.

On top of all this, you’ve outdated and broken every Prometheus Docker tutorial on the web with this change. I would urge you to reconsider and possibly revert the symlink, or at least add a section to the main README that addresses this change, the permission denied error it causes, the changes needed to fix it, and a well-thought-out explanation of why you needed to do this. I can only imagine that it’s going to affect your adoption and usage for at least the next year, as folks get frustrated following online Docker tutorials that no longer work.

In closing, please understand that I’m not mad, and I understand that changes happen and are often needed. I don’t have the time to dig into this change and understand why the symlink was needed, so I may in fact be the one who’s wrong!

@SuperQ that makes it a bit clearer, but I’m still not sure you’re going about it the right way. I would humbly recommend that you consider rolling back the change and spinning up a 2.0 branch that fixes the WORKDIR, which seems to be the real problem, instead of changing the image file layout of the existing version. As it stands, you are breaking existing deployments when they upgrade, because their current volume setups trigger these issues. You are also invalidating every tutorial the community has written up to this point.

I understand that a rollback carries its own risks for folks who have already dealt with this, but that’s what README and CHANGELOG files are for. Make it clear what you are doing and why, and keep updating the docs/wiki with information on how to deal with the common issues the revert may cause.

https://semver.org/

Given a version number MAJOR.MINOR.PATCH, increment the:

MAJOR version when you make incompatible API changes, MINOR version when you add functionality in a backwards-compatible manner, and PATCH version when you make backwards-compatible bug fixes.

We also had this issue. It works after changing the data path:

        volumeMounts:
        - name: data
          mountPath: /prometheus-data
        args:
          - '--config.file=/etc/prometheus/prometheus.yaml'
          - '--storage.tsdb.path=/prometheus-data'  # the added line
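Applied to the manifest from the original report, the same idea can keep the existing `data` volume at `/prometheus` instead of renaming the mount. This is a minimal container fragment, not a verified fix, and it assumes two things: that the ConfigMap key is `prometheus.yml` (use whatever key the `prometheus` ConfigMap actually holds), and that the PVC is writable by the user the image runs as, as it presumably was under 2.5.0. The reasoning: `--storage.tsdb.path` defaults to the relative path `data/`, which is resolved against the container’s working directory, and a Kubernetes `args` list replaces the image’s default `CMD`, so any flags the image would normally pass must be repeated once `args` is set.

```yaml
containers:
- name: prometheus
  image: quay.io/prometheus/prometheus:v2.6.0
  args:
  # Assumed ConfigMap key; adjust to match the "prometheus" ConfigMap.
  - '--config.file=/etc/prometheus/prometheus.yml'
  # Absolute path on the existing "data" PVC mount, so the relative
  # default "data/" is never resolved against the working directory.
  - '--storage.tsdb.path=/prometheus'
  volumeMounts:
  - name: config
    mountPath: /etc/prometheus
  - name: data
    mountPath: /prometheus
```

Naming the storage path explicitly makes the manifest independent of the image’s WORKDIR, which is the detail this thread points to as having changed between 2.5.0 and 2.6.0.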