influxdb: Backup failure: Download shard n failed copy backup to file

Steps to reproduce: Execute a backup command like this: influxd backup -database mydb -portable -since 2018-01-01T00:00:00Z backup_$(date +%Y%m%d_%H%M%S)

Expected behavior: Get a backup written to files.

Actual behavior: Fails to generate a backup. Console output is as follows.

2020/02/05 14:40:47 backing up db=mydb
2020/02/05 14:40:47 backing up db=mydb rp=my_rp shard=122 to backup_20200205_144047/mydb.my_rp.00122.00 since 2018-01-01T00:00:00Z
2020/02/05 14:40:47 Download shard 122 failed copy backup to file: err=<nil>, n=0.  Waiting 2s and retrying (0)...
2020/02/05 14:40:49 Download shard 122 failed copy backup to file: err=<nil>, n=0.  Waiting 2s and retrying (1)...
2020/02/05 14:40:52 Download shard 122 failed copy backup to file: err=<nil>, n=0.  Waiting 2s and retrying (2)...
2020/02/05 14:40:54 Download shard 122 failed copy backup to file: err=<nil>, n=0.  Waiting 2s and retrying (3)...
2020/02/05 14:40:56 Download shard 122 failed copy backup to file: err=<nil>, n=0.  Waiting 2s and retrying (4)...
2020/02/05 14:40:58 Download shard 122 failed copy backup to file: err=<nil>, n=0.  Waiting 2s and retrying (5)...
2020/02/05 14:41:01 Download shard 122 failed copy backup to file: err=<nil>, n=0.  Waiting 3.01s and retrying (6)...
2020/02/05 14:41:04 Download shard 122 failed copy backup to file: err=<nil>, n=0.  Waiting 11.441s and retrying (7)...
2020/02/05 14:41:15 Download shard 122 failed copy backup to file: err=<nil>, n=0.  Waiting 43.477s and retrying (8)...
2020/02/05 14:41:59 Download shard 122 failed copy backup to file: err=<nil>, n=0.  Waiting 2m45.216s and retrying (9)...
2020/02/05 14:44:44 error (copy backup to file: err=<nil>, n=0) when backing up db: mydb, rp my_rp, shard 122. continuing backup on remaining shards
2020/02/05 14:44:44 backup failed: copy backup to file: err=<nil>, n=0
backup: copy backup to file: err=<nil>, n=0 
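
When the backup covers many shards, it can help to pull the failing shard IDs out of the console output instead of reading the retry lines by eye. The following is a minimal sketch that assumes the log format shown above; `failed_shards` is a hypothetical helper name, not part of InfluxDB.

```shell
#!/bin/sh
# Hypothetical helper: read `influxd backup` console output on stdin and
# print the ID of every shard the backup gave up on, one per line.
# It matches the final "error (...) when backing up db: ..., shard N." line,
# so shards that eventually succeeded after retries are not listed.
failed_shards() {
    sed -n 's/.*when backing up db: .*, shard \([0-9][0-9]*\)\. continuing.*/\1/p' |
        sort -un
}
```

For example, `influxd backup -database mydb -portable <backup_dir> 2>&1 | failed_shards` should print `122` for the run above; the `2>&1` is there in case the messages are written to stderr.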

I have gotten a successful backup only once, when the database was newly created.

  • The directory that contains the shard noted in the output above held hundreds of empty temp directories. Deleting them did not fix the backup failure.
  • Of the 132 shards, one has 8 tsm files; every other shard, including the problem shard, has one tsm file.
  • Running the influx_inspect commands verify and inspect shows no issues.
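
The per-shard checks in the list above can be repeated with a short script. This is a sketch under assumptions: it expects a retention-policy directory laid out as `<rp_dir>/<shard_id>/*.tsm`, and it treats the temp directories as empty subdirectories of each shard directory, which the report does not state exactly. `shard_summary` is a hypothetical name, and `find -maxdepth`/`-empty` need GNU or BSD find.

```shell
#!/bin/sh
# Hypothetical helper: for each shard directory under the given
# retention-policy directory, print
#   <shard_id> <number of .tsm files> <number of empty subdirectories>
shard_summary() {
    rp_dir=$1
    for shard in "$rp_dir"/*/; do
        # Skip the unexpanded glob when the directory has no subdirectories.
        [ -d "$shard" ] || continue
        id=$(basename "$shard")
        # Count TSM files directly inside the shard directory.
        tsm=$(find "$shard" -maxdepth 1 -type f -name '*.tsm' | wc -l | tr -d ' ')
        # Count empty directories directly inside the shard directory.
        empty=$(find "$shard" -mindepth 1 -maxdepth 1 -type d -empty | wc -l | tr -d ' ')
        echo "$id $tsm $empty"
    done
}
```

With the default data path this might be called as `shard_summary /var/lib/influxdb/data/mydb/my_rp`; the actual path depends on the `dir` setting in the `[data]` section of the config.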

Environment info:

  • System info: Linux 4.15.0-1066-azure x86_64
  • InfluxDB version: InfluxDB v1.7.0 (git: 1.7 dac4c6f571662c63dc0d73346787b8c7f113222a)
  • Other relevant environment details: The InfluxDB instance is running on a pod in an AKS cluster. Database files are on a mounted Azure File share.

Config: Non-default settings are:

  • reporting-disabled = true
  • bind-address = "0.0.0.0:8088"
  • auth-enabled = true

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 28 (7 by maintainers)

Most upvoted comments

It seems to be related to #9923: if the device runs out of free space, InfluxDB stops saving data to disk for some metrics and does not recover even after space is freed. In that state, backups do not work until the service is restarted. After the restart, some data is lost, but backups work again. At least that was so in my case.
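
If the disk-full explanation applies, a free-space check before each backup might at least flag the condition early. The sketch below is only a preventive check and cannot detect an instance that is already stuck after a past disk-full event; `has_free_space` and the threshold are hypothetical, not something InfluxDB provides.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the filesystem holding
# the given path has at least the given number of KiB available.
has_free_space() {
    path=$1
    min_kb=$2
    # df -P gives one POSIX-format line per filesystem; -k reports KiB.
    # Column 4 of the data line is the available space.
    avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge "$min_kb" ]
}

# Example use (path and threshold are assumptions):
#   has_free_space /var/lib/influxdb 1048576 ||
#       echo "low disk space; backup may fail until influxd is restarted" >&2
```

On a network share such as Azure Files, the value reported by `df` reflects the share quota rather than a local disk, so the threshold would need to be chosen accordingly.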