velero: Some restic backups time out
What steps did you take and what happened:

I have Velero 1.2 and restic configured to run backups. I recently switched S3 providers (to MinIO) and decided to start over with new backups. I ran `velero backup delete --all` and waited for the backups to be deleted. I removed all the `resticrepositories` and `podvolumebackups`. I changed the `backupstoragelocation` and `cloud-credentials` for my new S3 provider. I then ran the same `velero backup create full` command I usually do.
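For reference, the reset procedure above corresponds roughly to commands like the following. This is only a sketch: the secret name `cloud-credentials`, the credentials file `credentials-minio`, and the storage location name `default` are assumptions based on a default Velero install.

```shell
# Delete all existing backups and the restic-related custom resources
# so Velero starts fresh against the new object store.
velero backup delete --all --confirm
kubectl -n velero delete resticrepositories.velero.io --all
kubectl -n velero delete podvolumebackups.velero.io --all

# Point Velero at the new S3 provider (MinIO). The secret name and
# credentials file shown here are assumptions, not from the report.
kubectl -n velero delete secret cloud-credentials
kubectl -n velero create secret generic cloud-credentials \
  --from-file=cloud=./credentials-minio
kubectl -n velero edit backupstoragelocation default

# Re-run the usual full backup.
velero backup create full
```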
What did you expect to happen:

The cluster and volumes should get backed up. Some of them were backed up (and are on my S3), and others were not (although the restic repos did get pushed to S3).
The output of the following commands will help us better understand what's going on:

- `kubectl logs deployment/velero -n velero`: https://cloud.koehn.com/s/gaXaKigGmJbQYt6
- `velero backup describe <backupname>`:
```
Name:         full
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  <none>
API Version:  velero.io/v1
Kind:         Backup
Metadata:
  Creation Timestamp:  2020-01-28T20:38:15Z
  Generation:          3
  Resource Version:    19641623
  Self Link:           /apis/velero.io/v1/namespaces/velero/backups/full
  UID:                 741c8130-0be3-4b03-9a48-968dd69d928f
Spec:
  Hooks:
  Included Namespaces:
    *
  Storage Location:  default
  Ttl:               720h0m0s
Status:
  Completion Timestamp:  2020-01-28T21:38:28Z
  Errors:                3
  Expiration:            2020-02-27T20:38:15Z
  Phase:                 PartiallyFailed
  Start Timestamp:       2020-01-28T20:38:15Z
  Version:               1
Events:  <none>
```
- `velero backup logs <backupname>`: https://cloud.koehn.com/s/3nCErAGpoKdeRcQ
Anything else you would like to add:

- There's nothing at all in the logs of the restic pods on the nodes where the volumes are mounted. It's as if they're not seeing the `PodVolumeBackup` at all.
- Only some volumes aren't backed up; at least two of them work correctly.
Good podvolumebackup:

```yaml
apiVersion: velero.io/v1
kind: PodVolumeBackup
metadata:
  annotations:
    velero.io/pvc-name: gitlab-do
  creationTimestamp: "2020-01-28T20:38:20Z"
  generateName: full-
  generation: 16
  labels:
    velero.io/backup-name: full
    velero.io/backup-uid: 741c8130-0be3-4b03-9a48-968dd69d928f
    velero.io/pvc-uid: fc0a9771-9de8-4c17-b5ce-7a220c005fea
  name: full-6svfm
  namespace: velero
  ownerReferences:
  - apiVersion: velero.io/v1
    controller: true
    kind: Backup
    name: full
    uid: 741c8130-0be3-4b03-9a48-968dd69d928f
  resourceVersion: "19626082"
  selfLink: /apis/velero.io/v1/namespaces/velero/podvolumebackups/full-6svfm
  uid: 806804fc-a054-4358-91bd-0878713b6969
spec:
  backupStorageLocation: default
  node: k8s-htz-worker-04
  pod:
    kind: Pod
    name: gitlab-596c566875-t7brs
    namespace: gitlab
    uid: 066a2068-7ba7-496a-8d12-a34e194f8f9c
  repoIdentifier: s3:https://some-s3-server/some-bucket/restic/gitlab
  tags:
    backup: full
    backup-uid: 741c8130-0be3-4b03-9a48-968dd69d928f
    ns: gitlab
    pod: gitlab-596c566875-t7brs
    pod-uid: 066a2068-7ba7-496a-8d12-a34e194f8f9c
    pvc-uid: fc0a9771-9de8-4c17-b5ce-7a220c005fea
    volume: gitlab-do
  volume: gitlab-do
status:
  completionTimestamp: "2020-01-28T20:40:39Z"
  path: /host_pods/066a2068-7ba7-496a-8d12-a34e194f8f9c/volumes/kubernetes.io~csi/pvc-fc0a9771-9de8-4c17-b5ce-7a220c005fea/mount
  phase: Completed
  progress:
    bytesDone: 652131419
    totalBytes: 652131419
  snapshotID: 99b42b1d
  startTimestamp: "2020-01-28T20:38:20Z"
```
Bad podvolumebackup:

```yaml
apiVersion: velero.io/v1
kind: PodVolumeBackup
metadata:
  annotations:
    velero.io/pvc-name: rainloop
  creationTimestamp: "2020-01-28T21:12:58Z"
  generateName: full-
  generation: 2
  labels:
    velero.io/backup-name: full
    velero.io/backup-uid: 741c8130-0be3-4b03-9a48-968dd69d928f
    velero.io/pvc-uid: 5418f32e-03cb-40e5-8870-943b9b5bad00
  name: full-2rjr5
  namespace: velero
  ownerReferences:
  - apiVersion: velero.io/v1
    controller: true
    kind: Backup
    name: full
    uid: 741c8130-0be3-4b03-9a48-968dd69d928f
  resourceVersion: "19634604"
  selfLink: /apis/velero.io/v1/namespaces/velero/podvolumebackups/full-2rjr5
  uid: 7c4c9f3c-6226-4e9a-b862-a17edf6df65a
spec:
  backupStorageLocation: default
  node: k8s-htz-worker-03
  pod:
    kind: Pod
    name: rainloop-68fd4ffb69-q5qzs
    namespace: mail
    uid: 03407647-5f4d-4bd1-8f12-b71e1daa2c6c
  repoIdentifier: s3:https://some-s3-server/some-bucket/restic/mail
  tags:
    backup: full
    backup-uid: 741c8130-0be3-4b03-9a48-968dd69d928f
    ns: mail
    pod: rainloop-68fd4ffb69-q5qzs
    pod-uid: 03407647-5f4d-4bd1-8f12-b71e1daa2c6c
    pvc-uid: 5418f32e-03cb-40e5-8870-943b9b5bad00
    volume: rainloop
  volume: rainloop
status:
  phase: InProgress
  progress: {}
  startTimestamp: "2020-01-28T21:12:58Z"
```
Environment:

- Velero version (use `velero version`): v1.2.0
- Velero features (use `velero client config get features`): features: <NOT SET>
- Kubernetes version (use `kubectl version`):
```
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-23T14:21:54Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.1", GitCommit:"d224476cd0730baca2b6e357d144171ed74192d6", GitTreeState:"clean", BuildDate:"2020-01-14T20:56:50Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
```
- Kubernetes installer & version: ???
- Cloud provider or hardware configuration: Running Kubernetes on Hetzner; the cluster was set up with their CLI tool.
- OS (e.g. from `/etc/os-release`):

```
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
```
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 18 (18 by maintainers)
Commits related to this issue
- fix for a bug #2226 I think — committed to koehn/velero by deleted user 4 years ago
- fix for a bug #2226 I think Signed-off-by: Brad Koehn <brad@koehn.com> — committed to koehn/velero by deleted user 4 years ago
FWIW I’ve run several dozen backups with this fix in place and experienced no more errors.
yep, I can see that code would definitely panic if restic’s stdout is empty when it’s called. we’ll work on a fix here.
thanks for including all the detailed info!
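The empty-stdout panic the maintainer describes can be guarded against generically. Here is a minimal sketch in Go (Velero's implementation language) of the kind of check needed before grabbing the last line of restic's output; `lastResticLine` is a hypothetical helper for illustration, not Velero's actual code.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// lastResticLine returns the last non-empty line of restic's stdout,
// which is where restic prints its summary. If restic produced no
// output at all (e.g. it died before writing anything), it returns an
// error instead of indexing into an empty slice, which would panic.
// NOTE: hypothetical helper for illustration, not Velero's real code.
func lastResticLine(stdout string) (string, error) {
	lines := strings.Split(strings.TrimSpace(stdout), "\n")
	last := lines[len(lines)-1]
	if last == "" {
		return "", errors.New("restic produced no output")
	}
	return last, nil
}

func main() {
	// Normal case: the summary is the final line of stdout.
	line, err := lastResticLine("scanning...\nsnapshot 99b42b1d saved\n")
	fmt.Println(line, err) // snapshot 99b42b1d saved <nil>

	// Empty output: an error is returned instead of panicking.
	_, err = lastResticLine("")
	fmt.Println(err) // restic produced no output
}
```

The point of the guard is simply that the code path handling restic's summary must tolerate a restic process that exited without writing anything, which is exactly the situation a timed-out or killed backup produces.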
Update: a subsequent backup attempt generated a panic.
Full log: https://cloud.koehn.com/s/3d7SS56cs5ywFdE