velero: Some restic backups time out

What steps did you take and what happened: I have Velero 1.2 with restic configured to run backups. I recently switched S3 providers (to MinIO) and decided to start over with new backups. I ran velero backup delete --all and waited for the backups to be deleted, then removed all of the resticrepositories and podvolumebackups. I changed the backupstoragelocation and cloud-credentials for my new S3 provider, and then ran the same velero backup create full command I usually do.

What did you expect to happen: The cluster and volumes should get backed up. Some of them were backed up (and are on my S3), but others were not, although the restic repos did get pushed to S3.

The output of the following commands will help us better understand what’s going on:

Name:         full
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  <none>
API Version:  velero.io/v1
Kind:         Backup
Metadata:
  Creation Timestamp:  2020-01-28T20:38:15Z
  Generation:          3
  Resource Version:    19641623
  Self Link:           /apis/velero.io/v1/namespaces/velero/backups/full
  UID:                 741c8130-0be3-4b03-9a48-968dd69d928f
Spec:
  Hooks:
  Included Namespaces:
    *
  Storage Location:  default
  Ttl:               720h0m0s
Status:
  Completion Timestamp:  2020-01-28T21:38:28Z
  Errors:                3
  Expiration:            2020-02-27T20:38:15Z
  Phase:                 PartiallyFailed
  Start Timestamp:       2020-01-28T20:38:15Z
  Version:               1
Events:                  <none>

Anything else you would like to add:

  • There’s nothing at all in the logs of the restic pods on the nodes where the volumes are mounted. It’s like they’re not seeing the PodVolumeBackup at all.
  • Only some volumes aren’t backed up; at least two of them back up correctly.

Good podvolumebackup:

apiVersion: velero.io/v1
kind: PodVolumeBackup
metadata:
  annotations:
    velero.io/pvc-name: gitlab-do
  creationTimestamp: "2020-01-28T20:38:20Z"
  generateName: full-
  generation: 16
  labels:
    velero.io/backup-name: full
    velero.io/backup-uid: 741c8130-0be3-4b03-9a48-968dd69d928f
    velero.io/pvc-uid: fc0a9771-9de8-4c17-b5ce-7a220c005fea
  name: full-6svfm
  namespace: velero
  ownerReferences:
  - apiVersion: velero.io/v1
    controller: true
    kind: Backup
    name: full
    uid: 741c8130-0be3-4b03-9a48-968dd69d928f
  resourceVersion: "19626082"
  selfLink: /apis/velero.io/v1/namespaces/velero/podvolumebackups/full-6svfm
  uid: 806804fc-a054-4358-91bd-0878713b6969
spec:
  backupStorageLocation: default
  node: k8s-htz-worker-04
  pod:
    kind: Pod
    name: gitlab-596c566875-t7brs
    namespace: gitlab
    uid: 066a2068-7ba7-496a-8d12-a34e194f8f9c
  repoIdentifier: s3:https://some-s3-server/some-bucket/restic/gitlab
  tags:
    backup: full
    backup-uid: 741c8130-0be3-4b03-9a48-968dd69d928f
    ns: gitlab
    pod: gitlab-596c566875-t7brs
    pod-uid: 066a2068-7ba7-496a-8d12-a34e194f8f9c
    pvc-uid: fc0a9771-9de8-4c17-b5ce-7a220c005fea
    volume: gitlab-do
  volume: gitlab-do
status:
  completionTimestamp: "2020-01-28T20:40:39Z"
  path: /host_pods/066a2068-7ba7-496a-8d12-a34e194f8f9c/volumes/kubernetes.io~csi/pvc-fc0a9771-9de8-4c17-b5ce-7a220c005fea/mount
  phase: Completed
  progress:
    bytesDone: 652131419
    totalBytes: 652131419
  snapshotID: 99b42b1d
  startTimestamp: "2020-01-28T20:38:20Z"

Bad podvolumebackup:

apiVersion: velero.io/v1
kind: PodVolumeBackup
metadata:
  annotations:
    velero.io/pvc-name: rainloop
  creationTimestamp: "2020-01-28T21:12:58Z"
  generateName: full-
  generation: 2
  labels:
    velero.io/backup-name: full
    velero.io/backup-uid: 741c8130-0be3-4b03-9a48-968dd69d928f
    velero.io/pvc-uid: 5418f32e-03cb-40e5-8870-943b9b5bad00
  name: full-2rjr5
  namespace: velero
  ownerReferences:
  - apiVersion: velero.io/v1
    controller: true
    kind: Backup
    name: full
    uid: 741c8130-0be3-4b03-9a48-968dd69d928f
  resourceVersion: "19634604"
  selfLink: /apis/velero.io/v1/namespaces/velero/podvolumebackups/full-2rjr5
  uid: 7c4c9f3c-6226-4e9a-b862-a17edf6df65a
spec:
  backupStorageLocation: default
  node: k8s-htz-worker-03
  pod:
    kind: Pod
    name: rainloop-68fd4ffb69-q5qzs
    namespace: mail
    uid: 03407647-5f4d-4bd1-8f12-b71e1daa2c6c
  repoIdentifier: s3:https://some-s3-server/some-bucket/restic/mail
  tags:
    backup: full
    backup-uid: 741c8130-0be3-4b03-9a48-968dd69d928f
    ns: mail
    pod: rainloop-68fd4ffb69-q5qzs
    pod-uid: 03407647-5f4d-4bd1-8f12-b71e1daa2c6c
    pvc-uid: 5418f32e-03cb-40e5-8870-943b9b5bad00
    volume: rainloop
  volume: rainloop
status:
  phase: InProgress
  progress: {}
  startTimestamp: "2020-01-28T21:12:58Z"

Environment:

  • Velero version (use velero version): v1.2.0
  • Velero features (use velero client config get features): features: <NOT SET>
  • Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.2", GitCommit:"59603c6e503c87169aea6106f57b9f242f64df89", GitTreeState:"clean", BuildDate:"2020-01-23T14:21:54Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.1", GitCommit:"d224476cd0730baca2b6e357d144171ed74192d6", GitTreeState:"clean", BuildDate:"2020-01-14T20:56:50Z", GoVersion:"go1.13.5", Compiler:"gc", Platform:"linux/amd64"}
  • Kubernetes installer & version: ???
  • Cloud provider or hardware configuration: Running Kubernetes on Hetzner; the cluster was set up with their CLI tool.
  • OS (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="18.04.3 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.3 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 18 (18 by maintainers)

Most upvoted comments

FWIW I’ve run several dozen backups with this fix in place and experienced no more errors.

yep, I can see that code would definitely panic if restic’s stdout is empty when it’s called. we’ll work on a fix here.
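To illustrate the failure mode described above, here is a minimal Go sketch (not Velero's actual code) of the kind of guard needed when extracting restic's JSON summary line: if restic crashes before printing anything, stdout is empty, and indexing into it blindly panics, so the parser has to bail out with an error instead. The resticSummary type and getSummaryLine function are illustrative names, and the JSON field names assume restic was invoked with --json.

package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// resticSummary holds the fields of restic's JSON summary line that matter
// here. The field names are illustrative, not taken from Velero's source.
type resticSummary struct {
	MessageType string `json:"message_type"`
	TotalBytes  int64  `json:"total_bytes_processed"`
	SnapshotID  string `json:"snapshot_id"`
}

// getSummaryLine parses the last line of restic's stdout as the summary.
// The important part is the guard: if restic died before printing anything
// (as in the panic above), stdout is empty and we return an error instead
// of indexing into a zero-length slice.
func getSummaryLine(stdout string) (*resticSummary, error) {
	lines := strings.Split(strings.TrimSpace(stdout), "\n")
	if len(lines) == 0 || lines[len(lines)-1] == "" {
		return nil, errors.New("restic produced no output; unable to find summary")
	}
	var s resticSummary
	if err := json.Unmarshal([]byte(lines[len(lines)-1]), &s); err != nil {
		return nil, fmt.Errorf("parsing restic summary line: %w", err)
	}
	if s.MessageType != "summary" {
		return nil, fmt.Errorf("last line of restic output is %q, not a summary", s.MessageType)
	}
	return &s, nil
}

func main() {
	// Empty output (restic panicked before printing anything): handled gracefully.
	if _, err := getSummaryLine(""); err != nil {
		fmt.Println("empty output:", err)
	}

	// Normal output: the summary line is parsed.
	out := `{"message_type":"status","percent_done":1}
{"message_type":"summary","total_bytes_processed":652131419,"snapshot_id":"99b42b1d"}`
	if s, err := getSummaryLine(out); err == nil {
		fmt.Printf("snapshot %s, %d bytes\n", s.SnapshotID, s.TotalBytes)
	}
}

With a guard like this, an empty or truncated restic run surfaces as a normal error on the PodVolumeBackup rather than a panic while parsing the output.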

thanks for including all the detailed info!

Update: a subsequent backup attempt generated a panic.

time="2020-01-29T05:16:06Z" level=info msg="Skipping snapshot of persistent volume because volume is being backed up with restic." backup=velero/full2 group=v1 logSource="pkg/backup/item_backupper.go:413" name=pvc-5418f32e-03cb-40e5-8870-943b9b5bad00 namespace= persistentVolume=pvc-5418f32e-03cb-40e5-8870-943b9b5bad00 resource=persistentvolumes
time="2020-01-29T05:16:09Z" level=info msg="1 errors encountered backup up item" backup=velero/full2 group=v1 logSource="pkg/backup/resource_backupper.go:284" name=rainloop-68fd4ffb69-q5qzs namespace= resource=pods
time="2020-01-29T05:16:09Z" level=error msg="Error backing up item" backup=velero/full2 error="pod volume backup failed: error running restic backup, stderr=panic: runtime error: invalid memory address or nil pointer dereference\n[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7b82c8]\n\ngoroutine 47 [running]:\ngithub.com/restic/restic/internal/repository.(*Index).ID(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)\n\t/restic/internal/repository/index.go:377 +0x38\ngithub.com/restic/restic/internal/repository.(*Repository).LoadIndex.func5(0x8, 0xe6c288)\n\t/restic/internal/repository/repository.go:467 +0xce\ngolang.org/x/sync/errgroup.(*Group).Go.func1(0xc000554990, 0xc0005c3300)\n\t/restic/vendor/golang.org/x/sync/errgroup/errgroup.go:57 +0x57\ncreated by golang.org/x/sync/errgroup.(*Group).Go\n\t/restic/vendor/golang.org/x/sync/errgroup/errgroup.go:54 +0x66\n: unable to find summary in restic backup command output" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/restic/backupper.go:182" error.function="github.com/vmware-tanzu/velero/pkg/restic.(*backupper).BackupPodVolumes" group=v1 logSource="pkg/backup/resource_backupper.go:288" name=rainloop-68fd4ffb69-q5qzs namespace= resource=pods

Full log: https://cloud.koehn.com/s/3d7SS56cs5ywFdE