velero: Can't restore restic volumes

What steps did you take and what happened: I created a backup that included some PVCs backed up using restic (with the backup.velero.io/backup-volumes annotation). The backup completed successfully, using S3 as the backup storage location. These PVCs are bound to normal EBS PVs, so I know I could use snapshots, but I was trying out restic.
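
For reference, this is roughly how the volumes were opted into restic: the backup.velero.io/backup-volumes annotation goes on the pod and lists the volume names to back up. The pod name below is a placeholder; "data" is the volume name that shows up in the backup output further down.

  # illustrative only: annotate the pod that mounts the volume to back up
  kubectl -n sgs annotate pod/<pod-name> backup.velero.io/backup-volumes=data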

I then deleted the cluster, set up a new kops cluster, installed Velero using the same manifests, and created a restore.

The restore then gets stuck in a pending state and never progresses.
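
Roughly the commands involved (an approximation, since I didn't capture the exact flags I used):

  velero restore create --from-backup bts3
  velero restore get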

What did you expect to happen:

Everything to restore.

The output of the following commands will help us better understand what’s going on:

  • kubectl logs deployment/velero -n velero
<snip>
time="2019-03-14T01:55:43Z" level=info msg="Restoring cluster level resource 'persistentvolumes' from: /tmp/725321603/resources/persistentvolumes/cluster" backup=bts3 logSource="pkg/restore/restore.go:696" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:43Z" level=info msg="Getting client for /v1, Kind=PersistentVolume" backup=bts3 logSource="pkg/restore/restore.go:754" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:43Z" level=info msg="No snapshot found for persistent volume" backup=bts3 logSource="pkg/restore/restore.go:1148" persistentVolume=pvc-50d4544c-4526-11e9-9f1f-0aa34d933732 restore=velero/bts3-20190313185509
time="2019-03-14T01:55:43Z" level=info msg="Attempting to restore PersistentVolume: pvc-50d4544c-4526-11e9-9f1f-0aa34d933732" backup=bts3 logSource="pkg/restore/restore.go:903" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:43Z" level=info msg="error restoring pvc-50d4544c-4526-11e9-9f1f-0aa34d933732: <nil>" backup=bts3 logSource="pkg/restore/restore.go:964" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="successfully restored persistent volume from snapshot" backup=bts3 logSource="pkg/restore/restore.go:1166" persistentVolume=pvc-510b74ed-4526-11e9-9f1f-0aa34d933732 providerSnapshotID=snap-094014b90adb832dc restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="Attempting to restore PersistentVolume: pvc-510b74ed-4526-11e9-9f1f-0aa34d933732" backup=bts3 logSource="pkg/restore/restore.go:903" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="No snapshot found for persistent volume" backup=bts3 logSource="pkg/restore/restore.go:1148" persistentVolume=pvc-53b5ffce-4526-11e9-9f1f-0aa34d933732 restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="Attempting to restore PersistentVolume: pvc-53b5ffce-4526-11e9-9f1f-0aa34d933732" backup=bts3 logSource="pkg/restore/restore.go:903" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="error restoring pvc-53b5ffce-4526-11e9-9f1f-0aa34d933732: <nil>" backup=bts3 logSource="pkg/restore/restore.go:964" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="No snapshot found for persistent volume" backup=bts3 logSource="pkg/restore/restore.go:1148" persistentVolume=pvc-53f0ace1-4526-11e9-9f1f-0aa34d933732 restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="Attempting to restore PersistentVolume: pvc-53f0ace1-4526-11e9-9f1f-0aa34d933732" backup=bts3 logSource="pkg/restore/restore.go:903" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="error restoring pvc-53f0ace1-4526-11e9-9f1f-0aa34d933732: <nil>" backup=bts3 logSource="pkg/restore/restore.go:964" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="No snapshot found for persistent volume" backup=bts3 logSource="pkg/restore/restore.go:1148" persistentVolume=pvc-54284933-4526-11e9-9f1f-0aa34d933732 restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="Attempting to restore PersistentVolume: pvc-54284933-4526-11e9-9f1f-0aa34d933732" backup=bts3 logSource="pkg/restore/restore.go:903" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="error restoring pvc-54284933-4526-11e9-9f1f-0aa34d933732: <nil>" backup=bts3 logSource="pkg/restore/restore.go:964" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="Restoring resource 'persistentvolumeclaims' into namespace 'sgs' from: /tmp/725321603/resources/persistentvolumeclaims/namespaces/sgs" backup=bts3 logSource="pkg/restore/restore.go:694" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="Getting client for /v1, Kind=PersistentVolumeClaim" backup=bts3 logSource="pkg/restore/restore.go:754" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="Attempting to restore PersistentVolumeClaim: etcd-pv-0" backup=bts3 logSource="pkg/restore/restore.go:903" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="Attempting to restore PersistentVolumeClaim: etcd-pv-1" backup=bts3 logSource="pkg/restore/restore.go:903" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="Attempting to restore PersistentVolumeClaim: etcd-pv-2" backup=bts3 logSource="pkg/restore/restore.go:903" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="Attempting to restore PersistentVolumeClaim: postgres-pv-0" backup=bts3 logSource="pkg/restore/restore.go:903" restore=velero/bts3-20190313185509
time="2019-03-14T01:55:44Z" level=info msg="Attempting to restore PersistentVolumeClaim: postgres-pv-1" backup=bts3 logSource="pkg/restore/restore.go:903" restore=velero/bts3-20190313185509
time="2019-03-14T01:56:43Z" level=warning msg="Timeout reached waiting for persistent volume pvc-50d4544c-4526-11e9-9f1f-0aa34d933732 to become ready" backup=bts3 logSource="pkg/restore/restore.go:830" restore=velero/bts3-20190313185509

<snip>
  • velero backup describe <backupname> or kubectl get backup/<backupname> -n velero -o yaml
Name:         bts3
Namespace:    velero
Labels:       velero.io/storage-location=default
Annotations:  <none>

Phase:  Completed

Namespaces:
  Included:  *
  Excluded:  <none>

Resources:
  Included:        *
  Excluded:        <none>
  Cluster-scoped:  auto

Label selector:  <none>

Storage Location:  default

Snapshot PVs:  auto

TTL:  720h0m0s

Hooks:  <none>

Backup Format Version:  1

Started:    2019-03-12 18:48:59 -0700 PDT
Completed:  2019-03-12 18:51:17 -0700 PDT

Expiration:  2019-04-11 18:48:59 -0700 PDT

Validation errors:  <none>

Persistent Volumes:
  pvc-510b74ed-4526-11e9-9f1f-0aa34d933732:
    Snapshot ID:        snap-094014b90adb832dc
    Type:               gp2
    Availability Zone:  us-west-2c
    IOPS:               <N/A>

Restic Backups:
  New:
    sgs/etcd-0-79894c7d66-jv9hz: data, data, data
    sgs/etcd-1-74d4fdd498-9wdxw: data, data, data
    sgs/etcd-2-bbdb44bdd-xlb27: data, data, data
    sgs/postgres-0-fd9f5dfd4-fql6b: data, data
    sgs/postgres-0-fd9f5dfd4-h6mks: data

Anything else you would like to add:

The restore also hangs, so there are no restore logs; the output above is everything I could extract from the velero pod's logs.
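
For completeness, these are the commands I would normally use to pull restore details; while the restore is hung they don't return anything useful (the restore name comes from the log lines above):

  velero restore describe bts3-20190313185509
  velero restore logs bts3-20190313185509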

Also, this is the output of kubectl get pvc:

sgs         etcd-pv-0       Lost     pvc-53b5ffce-4526-11e9-9f1f-0aa34d933732   0                         gp2-us-west-2c   8m
sgs         etcd-pv-1       Lost     pvc-53f0ace1-4526-11e9-9f1f-0aa34d933732   0                         gp2-us-west-2c   8m
sgs         etcd-pv-2       Lost     pvc-54284933-4526-11e9-9f1f-0aa34d933732   0                         gp2-us-west-2c   8m
sgs         postgres-pv-0   Lost     pvc-50d4544c-4526-11e9-9f1f-0aa34d933732   0                         gp2-us-west-2c   8m
sgs         postgres-pv-1   Bound    pvc-510b74ed-4526-11e9-9f1f-0aa34d933732   100Gi      RWO            gp2-us-west-2c   8m

postgres-pv-1 is a volume I forgot to annotate, so it was backed up with a normal EBS snapshot and came back Bound. The other PVCs all had the annotation set, and those are the ones stuck in Lost.
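
If it helps, the restic-related custom resources on the new cluster can be inspected with something like the following (assuming the PodVolumeBackup/PodVolumeRestore CRDs were installed along with the Velero manifests):

  kubectl -n velero get podvolumebackups
  kubectl -n velero get podvolumerestores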

Environment:

  • Velero version (use velero version): 0.11.0
  • Kubernetes version (use kubectl version): 1.11
  • Kubernetes installer & version: KOPS 1.11
  • Cloud provider or hardware configuration: AWS
  • OS (e.g. from /etc/os-release):

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 28 (10 by maintainers)

Most upvoted comments

I got into a similar situation where I was trying to do restores that included restic volumes. All of the restores were getting stuck in STATUS=New. I resolved it by deleting the velero pod so the velero Deployment would recreate it; the newly created velero pod then picked up the restores and their STATUS changed to InProgress.
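
Something like this (the pod name is a placeholder):

  # find the velero pod, then delete it so the Deployment recreates it
  kubectl -n velero get pods
  kubectl -n velero delete pod <velero-pod-name>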

Great, thank you for your help. I'll follow #1151 and wait for a resolution. Restic backups are ideal because they let me restore into a different region (standard AWS to AWS GovCloud), but snapshots will do for now.