rancher: GlusterFS catalog version won't start anymore

Version: Rancher v0.59.0, Cattle v0.148.0, User Interface v0.90.0, Rancher Compose v0.7.2

Steps:

  1. Create GlusterFS stack
  2. Create Convoy Gluster stack
  3. Create some volumes (a sketch follows this list)
  4. Remove the volumes
  5. Remove the GlusterFS and Convoy Gluster stacks
  6. Create the GlusterFS stack again
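
For steps 3 and 4, a rough sketch of what creating and removing a volume through the Convoy Gluster volume driver can look like from one of the hosts (the driver name convoy-gluster and the volume name my_vol are assumptions based on the catalog stack, not taken verbatim from this report):

  # Create a volume through the Convoy Gluster volume driver
  # (driver name assumed to match the catalog stack, i.e. convoy-gluster).
  docker volume create --driver convoy-gluster --name my_vol

  # Mount it in a throwaway container to confirm it works.
  docker run --rm -v my_vol:/data busybox sh -c 'echo ok > /data/test && cat /data/test'

  # Step 4: remove the volume again.
  docker volume rm my_vol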

Results: The GlusterFS stack stays stuck in a boot loop.

Expected: A running GlusterFS stack

EDIT: Step 7. Create the Convoy Gluster stack again

This results in the following error:


2/22/2016 2:26:48 PMWaiting for metadata.
2/22/2016 2:26:48 PMtime="2016-02-22T13:26:48Z" level=info msg="Execing [/usr/bin/nsenter --mount=/proc/2522/ns/mnt -F -- /var/lib/docker/aufs/mnt/a6ba41d189c1d3adbf3c8cbdb347ada8f0786910277c3184785a21ccc937441e/var/lib/rancher/convoy-agent/share-mnt --stage2 /var/lib/rancher/convoy/convoy-gluster-fc47dbdb-4a9c-4475-84e1-da035f0ede30 -- /launch volume-agent-glusterfs-internal]"
2/22/2016 2:26:48 PMWaiting for metadata
2/22/2016 2:26:48 PMRegistering convoy socket at /var/run/convoy-convoy-gluster.sock
2/22/2016 2:26:48 PMtime="2016-02-22T13:26:48Z" level=info msg="Listening for health checks on 0.0.0.0:10241/healthcheck"
2/22/2016 2:26:48 PMtime="2016-02-22T13:26:48Z" level=info msg="Got: root /var/lib/rancher/convoy/convoy-gluster-fc47dbdb-4a9c-4475-84e1-da035f0ede30"
2/22/2016 2:26:48 PMtime="2016-02-22T13:26:48Z" level=info msg="Got: drivers [glusterfs]"
2/22/2016 2:26:48 PMtime="2016-02-22T13:26:48Z" level=info msg="Got: driver-opts [glusterfs.defaultvolumepool=web_vol glusterfs.servers=glusterfs]"
2/22/2016 2:26:48 PMtime="2016-02-22T13:26:48Z" level=info msg="Launching convoy with args: [--socket=/host/var/run/convoy-convoy-gluster.sock daemon --root=/var/lib/rancher/convoy/convoy-gluster-fc47dbdb-4a9c-4475-84e1-da035f0ede30 --drivers=glusterfs --driver-opts=glusterfs.defaultvolumepool=web_vol --driver-opts=glusterfs.servers=glusterfs]"
2/22/2016 2:26:48 PMtime="2016-02-22T13:26:48Z" level=debug msg="Creating config at /var/lib/rancher/convoy/convoy-gluster-fc47dbdb-4a9c-4475-84e1-da035f0ede30" pkg=daemon
2/22/2016 2:26:48 PMtime="2016-02-22T13:26:48Z" level=debug msg= driver=glusterfs driver_opts=map[glusterfs.servers:glusterfs glusterfs.defaultvolumepool:web_vol] event=init pkg=daemon reason=prepare root="/var/lib/rancher/convoy/convoy-gluster-fc47dbdb-4a9c-4475-84e1-da035f0ede30"
2/22/2016 2:26:48 PMtime="2016-02-22T13:26:48Z" level=debug msg="Volume web_vol is being mounted it to /var/lib/rancher/convoy/convoy-gluster-fc47dbdb-4a9c-4475-84e1-da035f0ede30/glusterfs/mounts/web_vol, with option [-t glusterfs]" pkg=util
2/22/2016 2:26:49 PMtime="2016-02-22T13:26:49Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:26:50 PMtime="2016-02-22T13:26:50Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:26:51 PMtime="2016-02-22T13:26:51Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:26:52 PMtime="2016-02-22T13:26:52Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:26:53 PMtime="2016-02-22T13:26:53Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:26:54 PMtime="2016-02-22T13:26:54Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:26:55 PMtime="2016-02-22T13:26:55Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:26:56 PMtime="2016-02-22T13:26:56Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:26:57 PMtime="2016-02-22T13:26:57Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:26:58 PMtime="2016-02-22T13:26:58Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:26:59 PMtime="2016-02-22T13:26:59Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:27:00 PMtime="2016-02-22T13:27:00Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:27:01 PMtime="2016-02-22T13:27:01Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:27:02 PMtime="2016-02-22T13:27:02Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:27:03 PMtime="2016-02-22T13:27:03Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:27:04 PMtime="2016-02-22T13:27:04Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
2/22/2016 2:27:04 PMtime="2016-02-22T13:27:04Z" level=debug msg="Cleaning up environment..." pkg=daemon
2/22/2016 2:27:04 PMtime="2016-02-22T13:27:04Z" level=error msg="Failed to execute: mount [-t glusterfs glusterfs:/web_vol /var/lib/rancher/convoy/convoy-gluster-fc47dbdb-4a9c-4475-84e1-da035f0ede30/glusterfs/mounts/web_vol], output Mount failed. Please check the log file for more details.\n, error exit status 1"
2/22/2016 2:27:04 PM{
2/22/2016 2:27:04 PM    "Error": "Failed to execute: mount [-t glusterfs glusterfs:/web_vol /var/lib/rancher/convoy/convoy-gluster-fc47dbdb-4a9c-4475-84e1-da035f0ede30/glusterfs/mounts/web_vol], output Mount failed. Please check the log file for more details.\n, error exit status 1"
2/22/2016 2:27:04 PM}
2/22/2016 2:27:04 PMtime="2016-02-22T13:27:04Z" level=info msg="convoy exited with error: exit status 1"
2/22/2016 2:27:04 PMtime="2016-02-22T13:27:04Z" level=info msg=Exiting.
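
The repeated connection refused lines appear to be the agent polling the convoy socket while the daemon is still initializing; the actual failure is the GlusterFS mount at the end. A minimal way to dig further, assuming the container follows Rancher's stack_service_N naming and the usual glusterfs client tools are present in the image:

  # Does the name from glusterfs.servers=glusterfs resolve via Rancher's internal DNS?
  docker exec convoy-gluster_convoy-gluster_1 getent hosts glusterfs

  # Retry the mount that convoy attempted, against a scratch mount point.
  docker exec convoy-gluster_convoy-gluster_1 sh -c \
      'mkdir -p /tmp/vol_test && mount -t glusterfs glusterfs:/web_vol /tmp/vol_test'

  # The gluster fuse client writes the real failure reason to its own log,
  # which is named after the mount point.
  docker exec convoy-gluster_convoy-gluster_1 cat /var/log/glusterfs/tmp-vol_test.log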

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Comments: 29 (7 by maintainers)

Most upvoted comments

We are experiencing the same problem. GlusterFS does not seem to start up correctly (even though it shows up as green), and convoy-gluster fails to start with the messages further below.

We previously used the older GlusterFS versions, removed them completely, and switched to the new version. Maybe that is what triggers the problem?

We are willing to try out anything you suggest and report back.

Info about the hosts: they are in the same data center (two of them even in the same rack). Each machine has 4 cores and 64 GB of RAM, so they are fairly powerful, and the network is fast as well; ping between the servers through the VPN is < 1 ms.

On glusterfs_glusterfs-server_1 through 3 we see the following (they show up as green).

3/17/2016 1:33:00 PMWaiting for all service containers to start...
3/17/2016 1:33:01 PMContainers are starting...
3/17/2016 1:33:01 PMWaiting for Gluster Daemons to come up
3/17/2016 1:38:48 PMWaiting for all service containers to start...
3/17/2016 1:38:49 PMContainers are starting...
3/17/2016 1:38:49 PMWaiting for Gluster Daemons to come up
3/17/2016 1:55:36 PMgluster peer probe 10.42.162.226
3/17/2016 1:55:36 PMConnection failed. Please check if gluster daemon is operational.
3/17/2016 1:56:37 PMWaiting for all service containers to start...
3/17/2016 1:56:38 PMContainers are starting...
3/17/2016 1:56:38 PMWaiting for Gluster Daemons to come up
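
The line "Connection failed. Please check if gluster daemon is operational." suggests glusterd itself never came up properly inside the server containers, even though they are reported green. A quick way to verify that from one of the hosts (container name taken from above):

  # Is glusterd actually running inside the server container?
  docker exec glusterfs_glusterfs-server_1 ps aux | grep '[g]lusterd'

  # Do the peers see each other, and do the expected volumes exist?
  docker exec glusterfs_glusterfs-server_1 gluster peer status
  docker exec glusterfs_glusterfs-server_1 gluster volume info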

Error on the convoy-gluster container convoy-gluster_convoy-gluster_1:

3/17/2016 1:56:28 PMWaiting for metadata.
3/17/2016 1:56:28 PMtime="2016-03-17T12:56:28Z" level=info msg="Execing [/usr/bin/nsenter --mount=/proc/645/ns/mnt -F -- /var/lib/docker/aufs/mnt/d08f1b25cb1d7d119db93d942329f25599f5981b2e6ed65c5b4b7b27f48e424a/var/lib/rancher/convoy-agent/share-mnt --stage2 /var/lib/rancher/convoy/convoy-gluster-75e26d85-7e46-402b-a0ed-ce357900bc54 -- /launch volume-agent-glusterfs-internal]"
3/17/2016 1:56:28 PMWaiting for metadata
3/17/2016 1:56:28 PMRegistering convoy socket at /var/run/convoy-convoy-gluster.sock
3/17/2016 1:56:28 PMtime="2016-03-17T12:56:28Z" level=info msg="Listening for health checks on 0.0.0.0:10241/healthcheck"
3/17/2016 1:56:28 PMtime="2016-03-17T12:56:28Z" level=info msg="Got: root /var/lib/rancher/convoy/convoy-gluster-75e26d85-7e46-402b-a0ed-ce357900bc54"
3/17/2016 1:56:28 PMtime="2016-03-17T12:56:28Z" level=info msg="Got: drivers [glusterfs]"
3/17/2016 1:56:28 PMtime="2016-03-17T12:56:28Z" level=info msg="Got: driver-opts [glusterfs.defaultvolumepool=integral_vol glusterfs.servers=glusterfs]"
3/17/2016 1:56:28 PMtime="2016-03-17T12:56:28Z" level=info msg="Launching convoy with args: [--socket=/host/var/run/convoy-convoy-gluster.sock daemon --root=/var/lib/rancher/convoy/convoy-gluster-75e26d85-7e46-402b-a0ed-ce357900bc54 --drivers=glusterfs --driver-opts=glusterfs.defaultvolumepool=integral_vol --driver-opts=glusterfs.servers=glusterfs]"
3/17/2016 1:56:28 PMtime="2016-03-17T12:56:28Z" level=debug msg="Creating config at /var/lib/rancher/convoy/convoy-gluster-75e26d85-7e46-402b-a0ed-ce357900bc54" pkg=daemon
3/17/2016 1:56:28 PMtime="2016-03-17T12:56:28Z" level=debug msg= driver=glusterfs driver_opts=map[glusterfs.defaultvolumepool:integral_vol glusterfs.servers:glusterfs] event=init pkg=daemon reason=prepare root="/var/lib/rancher/convoy/convoy-gluster-75e26d85-7e46-402b-a0ed-ce357900bc54"
3/17/2016 1:56:28 PMtime="2016-03-17T12:56:28Z" level=debug msg="Volume integral_vol is being mounted it to /var/lib/rancher/convoy/convoy-gluster-75e26d85-7e46-402b-a0ed-ce357900bc54/glusterfs/mounts/integral_vol, with option [-t glusterfs]" pkg=util
3/17/2016 1:56:29 PMtime="2016-03-17T12:56:29Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
3/17/2016 1:56:30 PMtime="2016-03-17T12:56:30Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
3/17/2016 1:56:31 PMtime="2016-03-17T12:56:31Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
3/17/2016 1:56:32 PMtime="2016-03-17T12:56:32Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
3/17/2016 1:56:33 PMtime="2016-03-17T12:56:33Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
3/17/2016 1:56:34 PMtime="2016-03-17T12:56:34Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
3/17/2016 1:56:35 PMtime="2016-03-17T12:56:35Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
3/17/2016 1:56:36 PMtime="2016-03-17T12:56:36Z" level=error msg="Get http:///host/var/run/convoy-convoy-gluster.sock/v1/volumes/list: dial unix /host/var/run/convoy-convoy-gluster.sock: connection refused"
3/17/2016 1:56:37 PMtime="2016-03-17T12:56:37Z" level=debug msg="Cleaning up environment..." pkg=daemon
3/17/2016 1:56:37 PMtime="2016-03-17T12:56:37Z" level=error msg="Failed to execute: mount [-t glusterfs glusterfs:/integral_vol /var/lib/rancher/convoy/convoy-gluster-75e26d85-7e46-402b-a0ed-ce357900bc54/glusterfs/mounts/integral_vol], output Mount failed. Please check the log file for more details.\n, error exit status 1"
3/17/2016 1:56:37 PM{
3/17/2016 1:56:37 PM    "Error": "Failed to execute: mount [-t glusterfs glusterfs:/integral_vol /var/lib/rancher/convoy/convoy-gluster-75e26d85-7e46-402b-a0ed-ce357900bc54/glusterfs/mounts/integral_vol], output Mount failed. Please check the log file for more details.\n, error exit status 1"
3/17/2016 1:56:37 PM}
3/17/2016 1:56:37 PMtime="2016-03-17T12:56:37Z" level=info msg="convoy exited with error: exit status 1"
3/17/2016 1:56:37 PMtime="2016-03-17T12:56:37Z" level=info msg=Exiting.

Some screenshots are attached here as well.
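
One more thing worth checking in this situation: convoy is told to mount the Gluster volume named by glusterfs.defaultvolumepool (integral_vol here), so if that volume was never created, or never started, on the rebuilt GlusterFS stack, the mount will fail exactly as in the log above. A hedged sketch of verifying and, if necessary, recreating it by hand (replica count, server names and brick paths are placeholders, not taken from this setup):

  # Does the default volume pool exist, and is it started?
  docker exec glusterfs_glusterfs-server_1 gluster volume info integral_vol

  # If it is missing, it can be created and started manually.
  docker exec glusterfs_glusterfs-server_1 gluster volume create integral_vol \
      replica 3 server1:/data/brick server2:/data/brick server3:/data/brick
  docker exec glusterfs_glusterfs-server_1 gluster volume start integral_vol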

Please note that we have removed GlusterFS and Convoy Gluster from the catalog. Users expected a robust alternative for persistent Docker volume storage, but due to the lack of active maintenance we cannot recommend this solution going forward.

Instead, we recommend and certify Convoy NFS, which is actively maintained by Rancher. You can get GlusterFS support directly from Red Hat and consume it in Rancher through the NFS plugin.
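
For reference, the basic idea of using GlusterFS behind the NFS plugin is to export a Gluster volume over Gluster's built-in NFS server (which speaks NFSv3) and mount that export like any other NFS share; a minimal sketch with placeholder volume and host names:

  # On a Gluster server: make sure the volume is exported over NFS.
  gluster volume set my_vol nfs.disable off

  # Verify the export is visible.
  showmount -e gluster-server-1

  # On a client: a plain NFSv3 mount of the Gluster volume.
  mount -t nfs -o vers=3 gluster-server-1:/my_vol /mnt/my_vol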

Due to these changes regarding GlusterFS and Convoy Gluster, we will not be addressing this bug for 1.2.0.