deepflow: pod deepflow-server-0 CrashLoopBackOff

Expected Behavior

Pod deepflow-server-0 should be in Running status.

Actual Behavior

Pod deepflow-server-0 is stuck in CrashLoopBackOff:

root@node01:~# kubectl -n deepflow get pods
NAME                                READY   STATUS             RESTARTS      AGE
deepflow-agent-f69k8                1/1     Running            0             14m
deepflow-agent-m4n8g                1/1     Running            0             14m
deepflow-app-84648d9786-6z4nb       1/1     Running            0             14m
deepflow-clickhouse-0               1/1     Running            0             14m
deepflow-grafana-5c67685dc4-5hkjt   1/1     Running            4 (12m ago)   14m
deepflow-mysql-6967788cdb-nhjf8     1/1     Running            0             14m
deepflow-server-0                   1/2     CrashLoopBackOff   7 (84s ago)   14m

Steps to Reproduce the Problem

helm upgrade --install deepflow charts/deepflow --namespace=deepflow --create-namespace --set global.storageClass=null

Additional Info

  • deepflow version: v6.1.1

    Output of kubectl exec -it -n deepflow sts/deepflow-server -c deepflow-server -- deepflow-server -v:

    Output of kubectl exec -it -n deepflow ds/deepflow-agent -- deepflow-agent -v:

root@node01:~#  kubectl exec -it -n deepflow sts/deepflow-server -c deepflow-server -- deepflow-server -v
6204 36a4673baa5beab793089fae08f5c91ea46d5593 2022-08-05
deepflow-server community edition
go version go1.18.3 linux/amd64
root@node01:~# kubectl exec -it -n deepflow ds/deepflow-agent -- deepflow-agent -v
6201-2c7b4c721cf419963f6233420180b7470f9e4e2f 2022-08-05
deepflow-agent community edition
rustc 1.62.1 (e092d0b6b 2022-07-16)
  • deepflow agent list:

    Output of deepflow-ctl agent list:

root@node01:~# deepflow-ctl agent list
NAME                                            CTRL_IP                         CTRL_MAC                STATE           EXCEPTIONS
node02-P3                                       192.168.72.51                   00:50:56:ad:5a:ec       RUNNING         
node03-P1                                       192.168.72.52                   00:50:56:ad:cb:66       RUNNING 
  • Kubernetes CNI:
calico
  • operation-system/kernel version:

    Output of awk -F '=' '/PRETTY_NAME/ { print $2 }' /etc/os-release:

    Output of uname -r:

root@node01:~# awk -F '=' '/PRETTY_NAME/ { print $2 }' /etc/os-release
"Ubuntu 22.04 LTS"

root@node01:~# uname -r
5.15.0-43-generic

Pod error logs

root@node01:~# kubectl -n deepflow logs -f deepflow-server-0  
INDEX eth_type_idx (eth_type) TYPE set(300) GRANULARITY 3,
`vlan` UInt16  ,
INDEX vlan_ ...
2022-08-05 22:43:12.763 [INFO] [ckwriter] ckwriter.go:76 Exec SQL:  CREATE TABLE IF NOT EXISTS flow_log.`l4_flow_log` AS flow_log.`l4_flow_log_local` ENGINE=Distributed('df_cluster', 'flow_log', 'l4_flow_log_local', rand())
2022-08-05 22:43:12.767 [INFO] [ckwriter] ckwriter.go:120 New CK writer: primaryAddr=deepflow-clickhouse-headless-0:9000, secondaryAddr=, user=, database=flow_log, table=l7_flow_log_local, replica=false, queueCount=1, queueSize=1000000, batchSize=512000, flushTimeout=10s
2022-08-05 22:43:12.768 [INFO] [ckwriter] ckwriter.go:76 Exec SQL:  CREATE DATABASE IF NOT EXISTS flow_log
2022-08-05 22:43:12.771 [INFO] [ckwriter] ckwriter.go:74 Exec SQL: 
CREATE TABLE IF NOT EXISTS flow_log.`l7_flow_log_local`
(`_id` UInt64  CODEC(DoubleDelta),
`region_id_0` UInt16  ,
INDEX region_id_0_idx (region_id_0) TYPE minmax GRANULARITY 3,
`region_id_1` UInt16  ,
INDEX region_id_1_idx (region_id_1) TYPE minmax GRANU ...
2022-08-05 22:43:12.781 [INFO] [ckwriter] ckwriter.go:76 Exec SQL:  CREATE TABLE IF NOT EXISTS flow_log.`l7_flow_log` AS flow_log.`l7_flow_log_local` ENGINE=Distributed('df_cluster', 'flow_log', 'l7_flow_log_local', rand())
2022-08-05 22:43:12.788 [INFO] [ckwriter] ckwriter.go:120 New CK writer: primaryAddr=deepflow-clickhouse-headless-0:9000, secondaryAddr=, user=, database=flow_tag, table=flow_log_custom_field_local, replica=false, queueCount=1, queueSize=1000000, batchSize=512000, flushTimeout=10s
2022-08-05 22:43:12.788 [INFO] [ckwriter] ckwriter.go:76 Exec SQL:  CREATE DATABASE IF NOT EXISTS flow_tag
2022-08-05 22:43:12.790 [INFO] [ckwriter] ckwriter.go:74 Exec SQL: 
CREATE TABLE IF NOT EXISTS flow_tag.`flow_log_custom_field_local`
(`time` DateTime('Asia/Shanghai')  CODEC(DoubleDelta),
INDEX time_idx (time) TYPE minmax GRANULARITY 3,
`table` LowCardinality(String)  ,
`vpc_id` Int32  ,
INDEX vpc_id_idx (vpc_id) TYPE mi ...
2022-08-05 22:43:12.792 [INFO] [ckwriter] ckwriter.go:76 Exec SQL:  CREATE TABLE IF NOT EXISTS flow_tag.`flow_log_custom_field` AS flow_tag.`flow_log_custom_field_local` ENGINE=Distributed('df_cluster', 'flow_tag', 'flow_log_custom_field_local', rand())
2022-08-05 22:43:12.795 [INFO] [ckwriter] ckwriter.go:120 New CK writer: primaryAddr=deepflow-clickhouse-headless-0:9000, secondaryAddr=, user=, database=flow_tag, table=flow_log_custom_field_value_local, replica=false, queueCount=1, queueSize=1000000, batchSize=512000, flushTimeout=10s
2022-08-05 22:43:12.795 [INFO] [ckwriter] ckwriter.go:76 Exec SQL:  CREATE DATABASE IF NOT EXISTS flow_tag
2022-08-05 22:43:12.813 [INFO] [ckwriter] ckwriter.go:74 Exec SQL: 
CREATE TABLE IF NOT EXISTS flow_tag.`flow_log_custom_field_value_local`
(`time` DateTime('Asia/Shanghai')  CODEC(DoubleDelta),
INDEX time_idx (time) TYPE minmax GRANULARITY 3,
`table` LowCardinality(String)  ,
`vpc_id` Int32  ,
INDEX vpc_id_idx (vpc_id) T ...
2022-08-05 22:43:12.821 [INFO] [ckwriter] ckwriter.go:76 Exec SQL:  CREATE TABLE IF NOT EXISTS flow_tag.`flow_log_custom_field_value` AS flow_tag.`flow_log_custom_field_value_local` ENGINE=Distributed('df_cluster', 'flow_tag', 'flow_log_custom_field_value_local', rand())
2022-08-05 22:43:12.837 [INFO] [ckwriter] ckwriter.go:120 New CK writer: primaryAddr=deepflow-clickhouse-headless-0:9000, secondaryAddr=, user=, database=flow_tag, table=ext_metrics_custom_field_local, replica=false, queueCount=1, queueSize=100000, batchSize=51200, flushTimeout=10s
2022-08-05 22:43:12.837 [INFO] [ckwriter] ckwriter.go:76 Exec SQL:  CREATE DATABASE IF NOT EXISTS flow_tag
2022-08-05 22:43:12.845 [WARN] [grpc] grpc_session.go:96 Sync from server 127.0.0.1 failed, reason: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:20035: connect: connection refused"
2022-08-05 22:43:12.845 [WARN] [grpc] grpc_platformdata.go:285 No reachable server
2022-08-05 22:43:12.845 [WARN] [grpc] grpc_session.go:96 Sync from server 127.0.0.1 failed, reason: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:20035: connect: connection refused"
2022-08-05 22:43:12.846 [WARN] [grpc] grpc_platformdata.go:285 No reachable server
2022-08-05 22:43:12.846 [WARN] [grpc] grpc_session.go:96 Sync from server 127.0.0.1 failed, reason: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:20035: connect: connection refused"
2022-08-05 22:43:12.846 [WARN] [grpc] grpc_platformdata.go:285 No reachable server
2022-08-05 22:43:12.846 [WARN] [grpc] grpc_session.go:96 Sync from server 127.0.0.1 failed, reason: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:20035: connect: connection refused"
2022-08-05 22:43:12.847 [WARN] [grpc] grpc_platformdata.go:285 No reachable server
2022-08-05 22:43:12.847 [WARN] [grpc] grpc_session.go:96 Sync from server 127.0.0.1 failed, reason: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:20035: connect: connection refused"
2022-08-05 22:43:12.847 [WARN] [grpc] grpc_platformdata.go:285 No reachable server
2022-08-05 22:43:12.847 [WARN] [grpc] grpc_session.go:96 Sync from server 127.0.0.1 failed, reason: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:20035: connect: connection refused"
2022-08-05 22:43:12.848 [WARN] [grpc] grpc_platformdata.go:285 No reachable server
2022-08-05 22:43:12.858 [INFO] [ckwriter] ckwriter.go:74 Exec SQL: 
CREATE TABLE IF NOT EXISTS flow_tag.`ext_metrics_custom_field_local`
(`time` DateTime('Asia/Shanghai')  CODEC(DoubleDelta),
INDEX time_idx (time) TYPE minmax GRANULARITY 3,
`table` LowCardinality(String)  ,
`vpc_id` Int32  ,
INDEX vpc_id_idx (vpc_id) TYPE ...
2022-08-05 22:43:12.863 [INFO] [ckwriter] ckwriter.go:76 Exec SQL:  CREATE TABLE IF NOT EXISTS flow_tag.`ext_metrics_custom_field` AS flow_tag.`ext_metrics_custom_field_local` ENGINE=Distributed('df_cluster', 'flow_tag', 'ext_metrics_custom_field_local', rand())
2022-08-05 22:43:12.865 [INFO] [ckwriter] ckwriter.go:120 New CK writer: primaryAddr=deepflow-clickhouse-headless-0:9000, secondaryAddr=, user=, database=flow_tag, table=ext_metrics_custom_field_value_local, replica=false, queueCount=1, queueSize=100000, batchSize=51200, flushTimeout=10s
2022-08-05 22:43:12.865 [INFO] [ckwriter] ckwriter.go:76 Exec SQL:  CREATE DATABASE IF NOT EXISTS flow_tag
2022-08-05 22:43:12.868 [INFO] [ckwriter] ckwriter.go:74 Exec SQL: 
CREATE TABLE IF NOT EXISTS flow_tag.`ext_metrics_custom_field_value_local`
(`time` DateTime('Asia/Shanghai')  CODEC(DoubleDelta),
INDEX time_idx (time) TYPE minmax GRANULARITY 3,
`table` LowCardinality(String)  ,
`vpc_id` Int32  ,
INDEX vpc_id_idx (vpc_id ...
2022-08-05 22:43:12.869 [INFO] [ckwriter] ckwriter.go:76 Exec SQL:  CREATE TABLE IF NOT EXISTS flow_tag.`ext_metrics_custom_field_value` AS flow_tag.`ext_metrics_custom_field_value_local` ENGINE=Distributed('df_cluster', 'flow_tag', 'ext_metrics_custom_field_value_local', rand())
2022-08-05 22:43:12.879 [INFO] [ckwriter] ckwriter.go:120 New CK writer: primaryAddr=deepflow-clickhouse-headless-0:9000, secondaryAddr=, user=, database=flow_tag, table=ext_metrics_custom_field_local, replica=false, queueCount=1, queueSize=100000, batchSize=51200, flushTimeout=10s
2022-08-05 22:43:12.879 [INFO] [ckwriter] ckwriter.go:76 Exec SQL:  CREATE DATABASE IF NOT EXISTS flow_tag
2022-08-05 22:43:12.885 [INFO] [ckwriter] ckwriter.go:74 Exec SQL: 
CREATE TABLE IF NOT EXISTS flow_tag.`ext_metrics_custom_field_local`
(`time` DateTime('Asia/Shanghai')  CODEC(DoubleDelta),
INDEX time_idx (time) TYPE minmax GRANULARITY 3,
`table` LowCardinality(String)  ,
`vpc_id` Int32  ,
INDEX vpc_id_idx (vpc_id) TYPE ...
2022-08-05 22:43:12.888 [INFO] [ckwriter] ckwriter.go:76 Exec SQL:  CREATE TABLE IF NOT EXISTS flow_tag.`ext_metrics_custom_field` AS flow_tag.`ext_metrics_custom_field_local` ENGINE=Distributed('df_cluster', 'flow_tag', 'ext_metrics_custom_field_local', rand())
2022-08-05 22:43:12.895 [WARN] [stats] stats.go:105 Possible memory leak! countable queue-map[host:deepflow-server-0 index:0 module:flow_tag-ext_metrics_custom_field_local] is not correctly closed.
2022-08-05 22:43:12.895 [INFO] [ckwriter] ckwriter.go:120 New CK writer: primaryAddr=deepflow-clickhouse-headless-0:9000, secondaryAddr=, user=, database=flow_tag, table=ext_metrics_custom_field_value_local, replica=false, queueCount=1, queueSize=100000, batchSize=51200, flushTimeout=10s
2022-08-05 22:43:12.895 [INFO] [ckwriter] ckwriter.go:76 Exec SQL:  CREATE DATABASE IF NOT EXISTS flow_tag
2022-08-05 22:43:12.902 [INFO] [ckwriter] ckwriter.go:74 Exec SQL: 
CREATE TABLE IF NOT EXISTS flow_tag.`ext_metrics_custom_field_value_local`
(`time` DateTime('Asia/Shanghai')  CODEC(DoubleDelta),
INDEX time_idx (time) TYPE minmax GRANULARITY 3,
`table` LowCardinality(String)  ,
`vpc_id` Int32  ,
INDEX vpc_id_idx (vpc_id ...
2022-08-05 22:43:12.903 [INFO] [ckwriter] ckwriter.go:76 Exec SQL:  CREATE TABLE IF NOT EXISTS flow_tag.`ext_metrics_custom_field_value` AS flow_tag.`ext_metrics_custom_field_value_local` ENGINE=Distributed('df_cluster', 'flow_tag', 'ext_metrics_custom_field_value_local', rand())
2022-08-05 22:43:12.908 [WARN] [stats] stats.go:105 Possible memory leak! countable queue-map[host:deepflow-server-0 index:0 module:flow_tag-ext_metrics_custom_field_value_local] is not correctly closed.
2022-08-05 22:43:12.913 [INFO] [ckwriter] ckwriter.go:120 New CK writer: primaryAddr=deepflow-clickhouse-headless-0:9000, secondaryAddr=, user=, database=flow_tag, table=deepflow_system_custom_field_local, replica=false, queueCount=1, queueSize=100000, batchSize=51200, flushTimeout=10s
2022-08-05 22:43:12.913 [INFO] [ckwriter] ckwriter.go:76 Exec SQL:  CREATE DATABASE IF NOT EXISTS flow_tag
2022-08-05 22:43:12.915 [INFO] [ckwriter] ckwriter.go:74 Exec SQL: 
CREATE TABLE IF NOT EXISTS flow_tag.`deepflow_system_custom_field_local`
(`time` DateTime('Asia/Shanghai')  CODEC(DoubleDelta),
INDEX time_idx (time) TYPE minmax GRANULARITY 3,
`table` LowCardinality(String)  ,
`vpc_id` Int32  ,
INDEX vpc_id_idx (vpc_id)  ...
2022-08-05 22:43:12.916 [INFO] [ckwriter] ckwriter.go:76 Exec SQL:  CREATE TABLE IF NOT EXISTS flow_tag.`deepflow_system_custom_field` AS flow_tag.`deepflow_system_custom_field_local` ENGINE=Distributed('df_cluster', 'flow_tag', 'deepflow_system_custom_field_local', rand())
2022-08-05 22:43:12.918 [INFO] [ckwriter] ckwriter.go:120 New CK writer: primaryAddr=deepflow-clickhouse-headless-0:9000, secondaryAddr=, user=, database=flow_tag, table=deepflow_system_custom_field_value_local, replica=false, queueCount=1, queueSize=100000, batchSize=51200, flushTimeout=10s
2022-08-05 22:43:12.918 [INFO] [ckwriter] ckwriter.go:76 Exec SQL:  CREATE DATABASE IF NOT EXISTS flow_tag
2022-08-05 22:43:12.919 [INFO] [ckwriter] ckwriter.go:74 Exec SQL: 
CREATE TABLE IF NOT EXISTS flow_tag.`deepflow_system_custom_field_value_local`
(`time` DateTime('Asia/Shanghai')  CODEC(DoubleDelta),
INDEX time_idx (time) TYPE minmax GRANULARITY 3,
`table` LowCardinality(String)  ,
`vpc_id` Int32  ,
INDEX vpc_id_idx (vp ...
2022-08-05 22:43:12.921 [INFO] [ckwriter] ckwriter.go:76 Exec SQL:  CREATE TABLE IF NOT EXISTS flow_tag.`deepflow_system_custom_field_value` AS flow_tag.`deepflow_system_custom_field_value_local` ENGINE=Distributed('df_cluster', 'flow_tag', 'deepflow_system_custom_field_value_local', rand())
2022-08-05 22:43:12.924 [INFO] [datasource] datasource.go:195 datasource manager started
2022-08-05 22:43:12.925 [WARN] [grpc] grpc_session.go:96 Sync from server 127.0.0.1 failed, reason: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:20035: connect: connection refused"
2022-08-05 22:43:12.925 [WARN] [grpc] grpc_session.go:96 Sync from server 127.0.0.1 failed, reason: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:20035: connect: connection refused"
2022-08-05 22:43:12.925 [WARN] [grpc] grpc_session.go:96 Sync from server 127.0.0.1 failed, reason: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:20035: connect: connection refused"
2022-08-05 22:43:12.925 [WARN] [grpc] grpc_platformdata.go:285 No reachable server
2022-08-05 22:43:12.925 [WARN] [grpc] grpc_platformdata.go:285 No reachable server
2022-08-05 22:43:12.926 [WARN] [grpc] grpc_platformdata.go:285 No reachable server
2022-08-05 22:43:12.926 [WARN] [grpc] grpc_session.go:96 Sync from server 127.0.0.1 failed, reason: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:20035: connect: connection refused"
2022-08-05 22:43:12.927 [WARN] [grpc] grpc_platformdata.go:285 No reachable server
root@node01:~# 

I had to delete the pod manually. After that it came up in Running status and never went into CrashLoopBackOff again:

root@node01:~# kubectl -n deepflow delete pods deepflow-server-0 
pod "deepflow-server-0" deleted
root@node01:~# kubectl -n deepflow get pods
NAME                                READY   STATUS    RESTARTS      AGE
deepflow-agent-f69k8                1/1     Running   0             16m
deepflow-agent-m4n8g                1/1     Running   0             16m
deepflow-app-84648d9786-6z4nb       1/1     Running   0             16m
deepflow-clickhouse-0               1/1     Running   0             16m
deepflow-grafana-5c67685dc4-5hkjt   1/1     Running   4 (15m ago)   16m
deepflow-mysql-6967788cdb-nhjf8     1/1     Running   0             16m
deepflow-server-0                   2/2     Running   0             30s
root@node01:~# 

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 15

Most upvoted comments

@willzhang

Our assessment is that MySQL performs poorly in this environment. Before the DB had finished initializing, the server was killed because its readinessProbe check failed, so the DB initialization failed partway through and was not rolled back. After the server restarted, the leftover dirty data made it impossible to determine how the DB should be upgraded, which caused the exception. @ZhengYa-0110 is still considering how to improve this at the code level. In the meantime, the helm chart's readinessProbe has been adjusted to the settings below so that the SQL initialization can run to completion; 135 seconds is enough to initialize the SQL. You can reinstall with the latest helm chart to resolve this problem.

          readinessProbe:
            tcpSocket:
              port: server
            failureThreshold: 12
            initialDelaySeconds: 15
            periodSeconds: 10
            successThreshold: 1
          livenessProbe:
            failureThreshold: 7
            initialDelaySeconds: 15
            periodSeconds: 20
            successThreshold: 1
            tcpSocket:
              port: server
            timeoutSeconds: 1
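
The 135-second figure quoted above appears to follow from the readinessProbe values in this snippet. A minimal sketch of that arithmetic, assuming the window is approximated as initialDelaySeconds + failureThreshold * periodSeconds (the kubelet's exact timing may differ slightly); the `Probe` class and its method name are hypothetical helpers for illustration, not part of Kubernetes or DeepFlow:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """Subset of Kubernetes probe settings relevant to timing (illustrative only)."""
    initial_delay_seconds: int
    period_seconds: int
    failure_threshold: int

    def failure_window_seconds(self) -> int:
        # Assumption: rough time from container start until the probe has
        # failed failure_threshold consecutive times.
        return self.initial_delay_seconds + self.failure_threshold * self.period_seconds


# Values copied from the helm chart snippet above.
readiness = Probe(initial_delay_seconds=15, period_seconds=10, failure_threshold=12)
liveness = Probe(initial_delay_seconds=15, period_seconds=20, failure_threshold=7)

print(readiness.failure_window_seconds())  # 135
print(liveness.failure_window_seconds())   # 155
```

Under this approximation, the server gets roughly 135 seconds before it is considered not ready and roughly 155 seconds before the liveness probe gives up, which matches the initialization budget mentioned in the comment.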

Thanks, it's resolved by using the latest helm chart, 0.1.006:

$ helm repo update

$ helm search repo deepflow
NAME                    CHART VERSION   APP VERSION     DESCRIPTION                                       
deepflow/deepflow       0.1.006         6.1.1           An automated observability platform for cloud-n...
deepflow/deepflow-agent 0.1.006         6.1.1           An automated observability platform for cloud-n...
root@node01:~# kubectl -n deepflow get pods 
NAME                                READY   STATUS    RESTARTS        AGE
deepflow-agent-r2tn2                1/1     Running   1 (95s ago)     5m56s
deepflow-agent-vf66g                1/1     Running   0               5m56s
deepflow-agent-wrsnc                1/1     Running   0               5m56s
deepflow-app-84648d9786-8qjpq       1/1     Running   0               5m56s
deepflow-clickhouse-0               1/1     Running   0               5m56s
deepflow-grafana-55c9957fc8-z92sr   1/1     Running   4 (4m40s ago)   5m56s
deepflow-mysql-6967788cdb-zlt7n     1/1     Running   0               5m56s
deepflow-server-0                   2/2     Running   3 (4m52s ago)   5m56s