mariadb-operator: [Bug] Galera complete cluster failure recovery
Yesterday evening I deployed mariadb-operator for a development Galera cluster. I then started importing all table structures, which went fine. After that, I imported some big SQL data files in parallel over the course of the night, the biggest being a 13G file. All files were originally created using mysqldump from a production database. Unfortunately, when I woke up, all 3 Galera instances had crashed and the pods were in a CrashLoop.
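(For context, the imports were plain mysqldump files streamed back in with the client. A minimal sketch of that kind of parallel import; the pod name, container name, file names, and the MARIADB_ROOT_PASSWORD environment variable are assumptions, not confirmed from this cluster:)
# Sketch: stream each dump into one node in parallel. All names are illustrative.
for f in big-dump-1.sql big-dump-2.sql; do
  kubectl -n mariadb-operator exec -i mariadb-galera-dev-0 -c mariadb -- \
    sh -c 'mariadb -uroot -p"$MARIADB_ROOT_PASSWORD" mariadb' < "$f" &
done
wait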
Describe the bug
Unfortunately, this cluster does not have additional logging yet, so figuring out why the first crash happened is no longer possible. What is possible is to bootstrap the cluster and resume operations from the last known good state. Unfortunately, mariadb-operator does not seem to provide a way to do this. All attempts to start the servers end with the following logs:
2023-08-02 07:09:00+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:11.0.2+maria~ubu2204 started.
2023-08-02 07:09:00+00:00 [Note] [Entrypoint]: Switching to dedicated user 'mysql'
2023-08-02 07:09:00+00:00 [Note] [Entrypoint]: Entrypoint script for MariaDB Server 1:11.0.2+maria~ubu2204 started.
2023-08-02 07:09:01+00:00 [Note] [Entrypoint]: MariaDB upgrade not required
2023-08-02 7:09:01 0 [Note] Starting MariaDB 11.0.2-MariaDB-1:11.0.2+maria~ubu2204-log source revision 0005f2f06c8e1aea4915887decad67885108a929 as process 1
2023-08-02 7:09:01 0 [Note] WSREP: Loading provider /usr/lib/galera/libgalera_smm.so initial position: 00000000-0000-0000-0000-000000000000:-1
2023-08-02 7:09:01 0 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/galera/libgalera_smm.so'
2023-08-02 7:09:01 0 [Note] WSREP: wsrep_load(): Galera 26.4.14(r06a0c285) by Codership Oy <info@codership.com> loaded successfully.
2023-08-02 7:09:01 0 [Note] WSREP: Initializing allowlist service v1
2023-08-02 7:09:01 0 [Note] WSREP: Initializing event service v1
2023-08-02 7:09:01 0 [Note] WSREP: CRC-32C: using 64-bit x86 acceleration.
2023-08-02 7:09:01 0 [Note] WSREP: Found saved state: 00000000-0000-0000-0000-000000000000:-1, safe_to_bootstrap: 0
2023-08-02 7:09:01 0 [Note] WSREP: GCache DEBUG: opened preamble:
Version: 2
UUID: 1fae98e4-30a7-11ee-b48b-1a696f898dba
Seqno: 4930 - 4963
Offset: 1280
Synced: 1
2023-08-02 7:09:01 0 [Note] WSREP: Recovering GCache ring buffer: version: 2, UUID: 1fae98e4-30a7-11ee-b48b-1a696f898dba, offset: 1280
2023-08-02 7:09:01 0 [Note] WSREP: GCache::RingBuffer initial scan... 0.0% ( 0/134217752 bytes) complete.
2023-08-02 7:09:01 0 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (134217752/134217752 bytes) complete.
2023-08-02 7:09:01 0 [Note] WSREP: Recovering GCache ring buffer: found gapless sequence 4930-4963
2023-08-02 7:09:01 0 [Note] WSREP: GCache::RingBuffer unused buffers scan... 0.0% ( 0/56867952 bytes) complete.
2023-08-02 7:09:01 0 [Note] WSREP: Recovering GCache ring buffer: found 6/40 locked buffers
2023-08-02 7:09:01 0 [Note] WSREP: Recovering GCache ring buffer: free space: 79235968/134217728
2023-08-02 7:09:01 0 [Note] WSREP: GCache::RingBuffer unused buffers scan...100.0% (56867952/56867952 bytes) complete.
2023-08-02 7:09:01 0 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = mariadb-galera-dev-0.mariadb-galera-dev-internal.mariadb-operator.svc.cluster.local; base_port = 4567; cert.log_conflicts = no; cert.optimistic_pa = yes; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT15S; evs.join_retrans_period = PT1S; evs.max_install_timeouts = 3; evs.send_window = 4; evs.stats_report_period = PT1M; evs.suspect_timeout = PT5S; evs.user_send_window = 2; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.keep_plaintext_size = 128M; gcache.mem_size = 0; gcache.name = galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 128M; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.fc_single_primary = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807;
2023-08-02 7:09:01 0 [Note] WSREP: Start replication
2023-08-02 7:09:01 0 [Note] WSREP: Connecting with bootstrap option: 0
2023-08-02 7:09:01 0 [Note] WSREP: Setting GCS initial position to 00000000-0000-0000-0000-000000000000:-1
2023-08-02 7:09:01 0 [Note] WSREP: Using CRC-32C for message checksums.
2023-08-02 7:09:01 0 [Note] WSREP: backend: asio
2023-08-02 7:09:01 0 [Note] WSREP: gcomm thread scheduling priority set to other:0
2023-08-02 7:09:01 0 [Note] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
2023-08-02 7:09:01 0 [Note] WSREP: restore pc from disk failed
2023-08-02 7:09:01 0 [Note] WSREP: GMCast version 0
2023-08-02 7:09:01 0 [Note] WSREP: (750c83dc-803e, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
2023-08-02 7:09:01 0 [Note] WSREP: (750c83dc-803e, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
2023-08-02 7:09:01 0 [Note] WSREP: EVS version 1
2023-08-02 7:09:01 0 [Note] WSREP: gcomm: connecting to group 'mariadb-operator', peer 'mariadb-galera-dev-0.mariadb-galera-dev-internal.mariadb-operator.svc.cluster.local:,mariadb-galera-dev-1.mariadb-galera-dev-internal.mariadb-operator.svc.cluster.local:,mariadb-galera-dev-2.mariadb-galera-dev-internal.mariadb-operator.svc.cluster.local:'
2023-08-02 7:09:01 0 [Note] WSREP: (750c83dc-803e, 'tcp://0.0.0.0:4567') Found matching local endpoint for a connection, blacklisting address tcp://10.244.4.10:4567
2023-08-02 7:09:04 0 [Note] WSREP: EVS version upgrade 0 -> 1
2023-08-02 7:09:04 0 [Note] WSREP: PC protocol upgrade 0 -> 1
2023-08-02 7:09:04 0 [Warning] WSREP: no nodes coming from prim view, prim not possible
2023-08-02 7:09:04 0 [Note] WSREP: view(view_id(NON_PRIM,750c83dc-803e,1) memb {
750c83dc-803e,0
} joined {
} left {
} partitioned {
})
2023-08-02 7:09:04 0 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.50275S), skipping check
2023-08-02 7:09:34 0 [Note] WSREP: PC protocol downgrade 1 -> 0
2023-08-02 7:09:34 0 [Note] WSREP: view((empty))
2023-08-02 7:09:34 0 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
at ./gcomm/src/pc.cpp:connect():160
2023-08-02 7:09:34 0 [ERROR] WSREP: ./gcs/src/gcs_core.cpp:gcs_core_open():221: Failed to open backend connection: -110 (Connection timed out)
2023-08-02 7:09:35 0 [ERROR] WSREP: ./gcs/src/gcs.cpp:gcs_open():1669: Failed to open channel 'mariadb-operator' at 'gcomm://mariadb-galera-dev-0.mariadb-galera-dev-internal.mariadb-operator.svc.cluster.local,mariadb-galera-dev-1.mariadb-galera-dev-internal.mariadb-operator.svc.cluster.local,mariadb-galera-dev-2.mariadb-galera-dev-internal.mariadb-operator.svc.cluster.local': -110 (Connection timed out)
2023-08-02 7:09:35 0 [ERROR] WSREP: gcs connect failed: Connection timed out
2023-08-02 7:09:35 0 [ERROR] WSREP: wsrep::connect(gcomm://mariadb-galera-dev-0.mariadb-galera-dev-internal.mariadb-operator.svc.cluster.local,mariadb-galera-dev-1.mariadb-galera-dev-internal.mariadb-operator.svc.cluster.local,mariadb-galera-dev-2.mariadb-galera-dev-internal.mariadb-operator.svc.cluster.local) failed: 7
2023-08-02 7:09:35 0 [ERROR] Aborting
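For context on the state above (saved state 00000000-…:-1, safe_to_bootstrap: 0 on every node): the usual manual Galera recovery starts by finding the most advanced node. Since the pods are crashlooping, one way is to mount each node's data PVC in a throwaway pod and inspect it there. A sketch for node 0; the PVC name, namespace, and the need to first stop the crashlooping pod (the volume is ReadWriteOnce) are assumptions to verify:
# Utility pod mounting node 0's data volume. The claim name is an
# assumption; check `kubectl -n mariadb-operator get pvc`.
kubectl -n mariadb-operator apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: galera-inspect-0
spec:
  restartPolicy: Never
  containers:
    - name: inspect
      image: mariadb:11.0.2
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: storage-mariadb-galera-dev-0
EOF

# grastate.dat holds the last known UUID/seqno and the safe_to_bootstrap flag.
kubectl -n mariadb-operator exec galera-inspect-0 -- cat /var/lib/mysql/grastate.dat

# If seqno is -1 (unclean shutdown, as in the logs above), recover the real
# position from InnoDB; it is logged as "WSREP: Recovered position: <uuid>:<seqno>".
kubectl -n mariadb-operator exec galera-inspect-0 -- mariadbd --user=mysql --wsrep-recover
Repeating this for nodes 1 and 2, the node with the highest recovered seqno is the bootstrap candidate.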
Expected behaviour
mariadb-operator could either:
- provide an automated recovery, or
- provide a manual recovery procedure
Steps to reproduce the bug
Freshly deploy mariadb-operator (v0.19.0) using the following yaml:
apiVersion: mariadb.mmontes.io/v1alpha1
kind: MariaDB
metadata:
name: mariadb-galera-dev
namespace: mariadb-operator
spec:
rootPasswordSecretKeyRef:
name: mariadb
key: root-password
database: mariadb
username: mariadb
passwordSecretKeyRef:
name: mariadb
key: password
connection:
secretName: mariadb-conn
secretTemplate:
key: dsn
image:
repository: mariadb
tag: "11.0.2"
pullPolicy: IfNotPresent
port: 3306
replicas: 3
podSecurityContext:
runAsUser: 0
galera:
enabled: true
sst: mariabackup
replicaThreads: 1
agent:
image:
repository: ghcr.io/mariadb-operator/agent
tag: "v0.0.2"
pullPolicy: IfNotPresent
port: 5555
kubernetesAuth:
enabled: true
authDelegatorRoleName: mariadb-galera-auth
gracefulShutdownTimeout: 5s
recovery:
enabled: true
clusterHealthyTimeout: 1m
clusterBootstrapTimeout: 5m
podRecoveryTimeout: 3m
podSyncTimeout: 3m
initContainer:
image:
repository: ghcr.io/mariadb-operator/init
tag: "v0.0.5"
pullPolicy: IfNotPresent
volumeClaimTemplate:
resources:
requests:
storage: 50Mi
accessModes:
- ReadWriteOnce
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- topologyKey: "kubernetes.io/hostname"
tolerations:
- key: "mariadb.mmontes.io/ha"
operator: "Exists"
effect: "NoSchedule"
podDisruptionBudget:
maxUnavailable: 66%
updateStrategy:
type: RollingUpdate
myCnf: |
[mysqld]
bind-address=0.0.0.0
default_storage_engine=InnoDB
binlog_format=row
innodb_autoinc_lock_mode=2
max_allowed_packet=512M
relay-log = relay-bin
log-slave-updates = 1
expire_logs_days = 7
max_binlog_size = 1000M
## Safety
##
skip_host_cache
skip_name_resolve
max_allowed_packet=16M
max_connect_errors=1000000
sql_mode=STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_AUTO_VALUE_ON_ZERO,NO_ENGINE_SUBSTITUTION,NO_ZERO_DATE,NO_ZERO_IN_DATE,ONLY_FULL_GROUP_BY
sysdate_is_now=1
log_bin=mysql-bin
expire_logs_days=14
# Disabling for performance per http://severalnines.com/blog/9-tips-going-production-galera-cluster-mysql
sync_binlog=0
## Caches and Limits
tmp_table_size=32M
max_heap_table_size=32M
# Re-enabling as now works with Maria 10.1.2
query_cache_type=1
query_cache_limit=4M
query_cache_size=256M
max_connections=3000
thread_handling = pool-of-threads
thread_stack = 192K
thread_cache_size = 4
thread_pool_size = 8
thread_cache_size=50
thread_pool_oversubscribe = 3
open_files_limit=65535
table_definition_cache=4096
table_open_cache=4096
## InnoDB
#innodb=FORCE
innodb_strict_mode=1
innodb_autoinc_lock_mode=2
innodb_doublewrite=1
innodb_flush_method=O_DIRECT_NO_FSYNC
innodb_log_files_in_group=2
innodb_log_file_size=1G
innodb_flush_log_at_trx_commit=2
innodb_file_per_table=1
# 80% Memory is default reco.
# Need to re-evaluate when DB size grows
innodb_buffer_pool_size=12G
innodb_buffer_pool_instances=8
innodb_file_format=Barracuda
innodb_read_io_threads=8
innodb_write_io_threads=4
innodb_log_buffer_size = 512M
resources:
# requests:
# cpu: 4000m
# memory: 8Gi
limits:
memory: 32Gi
volumeClaimTemplate:
resources:
requests:
storage: 400Gi
storageClassName: rook-ceph-block
accessModes:
- ReadWriteOnce
service:
type: ClusterIP
Additional context
We're using Rook for storage provisioning. All disks report as healthy, and no crashes have been reported by Ceph (or Kubernetes) over the last 24 hours.
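(The health checks behind that statement can be run from the Rook toolbox; the rook-ceph namespace and rook-ceph-tools deployment name are the Rook defaults and assumed here:)
# Ceph health via the Rook toolbox (default names assumed).
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph status
kubectl -n rook-ceph exec deploy/rook-ceph-tools -- ceph osd status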
Environment details:
- Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.2", GitCommit:"7f6f68fdabc4df88cfea2dcf9a19b2b830f1e647", GitTreeState:"clean", BuildDate:"2023-05-17T14:20:07Z", GoVersion:"go1.20.4", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.2", GitCommit:"7f6f68fdabc4df88cfea2dcf9a19b2b830f1e647", GitTreeState:"clean", BuildDate:"2023-05-17T14:13:28Z", GoVersion:"go1.20.4", Compiler:"gc", Platform:"linux/amd64"}
- mariadb-operator version: 0.19.0
- Install method: helm
- Install flavour: recommended
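(For completeness, the helm install was roughly the following; the chart repo URL reflects the mariadb-operator docs and should be treated as an assumption:)
# Sketch of the operator install via helm.
helm repo add mariadb-operator https://mariadb-operator.github.io/mariadb-operator
helm repo update
helm install mariadb-operator mariadb-operator/mariadb-operator \
  -n mariadb-operator --create-namespace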
About this issue
- State: closed
- Created a year ago
- Comments: 18 (9 by maintainers)
Hey @tflabs-nl! Thanks a lot for your effort diagnosing this. This is very valuable, not only for your use case but also for the troubleshooting section I'm currently writing.
Right, but then this explains why the agent fails trying to recover them, returning the error:
These log lines might be pointing at the root cause:
After quickly googling them, I've seen other people reporting similar problems:
Will dig deeper and get back to you.
cc @grooverdan
Thanks a lot! 🙏🏻
Hi @mmontes11, thank you for checking this out!
I’ve set the operator log level to debug in the deployment and verified that it got properly applied. Also, I changed the recovery times to the following:
{"clusterBootstrapTimeout":"50m0s","clusterHealthyTimeout":"15m0s","enabled":true,"podRecoveryTimeout":"30m0s","podSyncTimeout":"30m0s"}Meanwhile I noticed this line in the logs:
Probably because of the unclean shutdown on (eventually) all nodes, no last commit/record UUID is stored. What is the best way to bootstrap a cluster in which all nodes failed using the operator?
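For reference, outside of the operator the classic procedure for a fully-crashed cluster is: pick the node with the highest recovered seqno, mark it safe to bootstrap, start it alone as a new cluster, then let the others rejoin via SST. A sketch continuing from the galera-inspect-0 pod above (all names are assumptions, and how to pass --wsrep-new-cluster through the operator is exactly the open question here):
# 1. On the most advanced node only: mark it safe to bootstrap.
kubectl -n mariadb-operator exec galera-inspect-0 -- \
  sed -i 's/^safe_to_bootstrap: 0/safe_to_bootstrap: 1/' /var/lib/mysql/grastate.dat

# 2. Start that node as a new cluster. Outside Kubernetes this is
#    `galera_new_cluster`, i.e. mariadbd --wsrep-new-cluster.
# 3. Once it is Primary, delete the remaining pods so they rejoin
#    and SST from the bootstrapped node:
kubectl -n mariadb-operator delete pod mariadb-galera-dev-1 mariadb-galera-dev-2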
Meanwhile, this piece of debugging info is recorded by the operator: