addons: MariaDB does not recover from HASS.io backup

Home Assistant release with the issue:

Home Assistant 0.110.2

Last working Home Assistant release (if known): Unknown (but it definitely worked earlier).

Operating environment (Hass.io/Docker/Windows/etc.):

Hass.io/Docker/Supervised

Component/platform:

MariaDB add-on used as record/history component.

Description of problem: After recovery from Hass.io snapshot MariaDB doesn’t work any more. The only way to force it to work is to uninstall the add-on and install it again with the same settings. Of course, in this case you lose your DB and history.

Problem-relevant configuration.yaml entries and (fill out even if it seems unimportant):

recorder: 
  db_url: mysql://hass:password@192.168.xxx.xxx/homeassistant?charset=utf8
  purge_interval: 7
  purge_keep_days: 7

Traceback (if applicable):

Additional information: It worked fine on one of earlier version of hass.io. There are other issues containing relevant information that were closed without a solution: https://github.com/home-assistant/core/issues/26152 https://github.com/home-assistant/hassio-addons/issues/667

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 7
  • Comments: 48 (9 by maintainers)

Commits related to this issue

Most upvoted comments

FWIW, on my test instance a recovery of a snapshot taken while MariaDB running seems to have worked. I did make sure that data got written during snapshot, so it should represent a real world situation.

The latest MariaDB Add-on 2.4.0 does essentially what MariaDB suggest when taking a file system level snapshot (and I outlined in https://github.com/home-assistant/addons/issues/1353#issuecomment-660351657). So snapshots taken with this version of the add-on should not lead to unrecoverable database snapshots anymore.

You should see something like this in the add-on log during snapshot:

[11:40:59] INFO: Lock tables using mariadb client...
[11:40:59] INFO: MariaDB tables locked.
[11:48:20] INFO: MariaDB tables unlocked.

What I currently use is the table lock approach suggested by the MariaDB documentation. Basically, SSH into the underlaying OS, and do the following:

$ docker exec addon_core_mariadb
bash-5.0# mysql
...
MariaDB [(none)]> FLUSH TABLES WITH READ LOCK;
Query OK, 0 rows affected (0.024 sec)

MariaDB [(none)]> 

Leave the shell/mysql client open, and do the backup, then, after the backup finishes:

MariaDB [(none)]> UNLOCK TABLES;

It seems that with this method, I did not even lose an event! The database inserts got queued, and executed after I ran UNLOCK TABLES;. There is quite a big difference in creation date, as the later event got stored only after the backup.

This snapshot of the events table is reverse sorted, so the first row is the newer row… One can see that the backup took 27 minutes (actually, it was faster, I just unlocked the table only after 27 minutes). At 2020-07-17 21:38:19 loooots of events got stored at once:

event_id | event_type    | event_data                                            | origin | time_fired          | created             | context_id      | context_user_id
32856177 | state_changed | {"entity_id": "sensor.openwrt_ethernet_out", "old_... | LOCAL  | 2020-07-17 21:11:11 | 2020-07-17 21:38:19 | be274bba62de484192cc0dc94deae128 | NULL
32856176 | state_changed | {"entity_id": "sensor.openwrt_ethernet_in", "old_s... | LOCAL  | 2020-07-17 21:11:11 | 2020-07-17 21:11:11 | 4bb84f20864649f1a04e5af947ed9a9e | NULL

I think having the Supervisor executing this commands before/after taking the snapshot of the database would be the ideal solution.

At a high level, a potential fix could look like:

  1. Snapshot is initiated
  2. Snapshot detects mariadb add-on
  3. Snapshot executes commands to make a recoverable mariadb backup
mariabackup --backup \
   --target-dir=/var/mariadb/backup/ \
   --user=homeassistant --password=mypassword

mariabackup --prepare \
   --target-dir=/var/mariadb/backup/
  1. /var/mariadb/backup/ contents is included in the snapshot

Questions:

  1. How do the mariabackup commands get executed mariadb container?
  2. How does the contents of /var/mariadb/backup/ get into the snapshot?

Additional consideration would also be needed during the restore process to restore the files to the mariadb data directory. Thoughts on if this implementation is feasible?

Alternatives:

As mentioned earlier in the thread FLUSH TABLES WITH READ LOCK will make the data files consistent. This lock would need to be maintained until the backup completed, and then unlocked. Writes would be suspended during the backup operation, but if the lock is able to be acquired quickly, and the files written to the snapshot quickly, then the impacts here might be acceptable. I’m not sure that we can assume that these operations will complete quickly in all scenarios though.

Overall, option 1 would probably be more robust, but would add complexity since it doesn’t fit as neatly into the existing snapshot framework.

This is still an issue for me. Would be great if we could get some kind of fix for this…

@basnijholt For me it is working if the add-on is stopped before backup and restarted after that.

I am still affected by this.

I share my implementation of the work around. I manage my snapshots with a Shell script in Bash using the addon “SSH & Web terminal”. With this addon we can call a service from an automation to launch a script which can access all the environment of the addon (namely ha and hass-cli commands):

Thanks @agners for this idea. I think this should work fine.

Supervisor should care about that. So if a snapshot gets triggered and MaraDB is configured then it should execute this commands before/after snapshot.

Any ideas how to achieve this?

@frenck I guess it needs an additional hook or something from the add on system would be needed. E.g. a command which would be executed in the container context before/(and after) doing the backup. Then the FLUSH TABLES WITH READ LOCK command could be specified for the MariaDB add-on.