tidb-operator: TiCDC missing metrics in prometheus
Missing Prometheus metrics
We’re currently trying to find the best way to monitor TiCDC replication, and we noticed that the generated Prometheus config is probably missing something.
What version of Kubernetes are you using? v1.15.9
What version of TiDB Operator are you using? v1.1.6 (tidb version v4.0.8)
What storage classes exist in the Kubernetes cluster and what are used for PD/TiKV pods? StorageClasses provided by Portworx
What’s the status of the TiDB cluster pods?
kubectl get po -l app.kubernetes.io/instance=internaltools-tidb -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
internaltools-tidb-discovery-7f4789cf45-9txwc 1/1 Running 0 3d11h 10.168.35.173 orscale-01-01.adm.dc3 <none> <none>
internaltools-tidb-monitor-779d5569f6-mk667 3/3 Running 0 18h 10.168.234.5 orscale-01-07.adm.dc3 <none> <none>
internaltools-tidb-pd-0 1/1 Running 0 18h 10.168.231.128 orscale-01-31.adm.dc3 <none> <none>
internaltools-tidb-pd-1 1/1 Running 0 18h 10.168.61.231 orscale-01-30.adm.dc3 <none> <none>
internaltools-tidb-pd-2 1/1 Running 0 18h 10.168.59.96 orscale-01-29.adm.dc3 <none> <none>
internaltools-tidb-ticdc-0 1/1 Running 1 18h 10.168.59.93 orscale-01-29.adm.dc3 <none> <none>
internaltools-tidb-ticdc-1 1/1 Running 1 18h 10.168.129.98 orscale-01-33.adm.dc3 <none> <none>
internaltools-tidb-ticdc-2 1/1 Running 1 18h 10.168.3.247 orscale-01-32.adm.dc3 <none> <none>
internaltools-tidb-tidb-0 2/2 Running 0 18h 10.168.61.233 orscale-01-30.adm.dc3 <none> <none>
internaltools-tidb-tidb-1 2/2 Running 0 18h 10.168.3.245 orscale-01-32.adm.dc3 <none> <none>
internaltools-tidb-tikv-0 1/1 Running 0 18h 10.168.59.89 orscale-01-29.adm.dc3 <none> <none>
internaltools-tidb-tikv-1 1/1 Running 0 18h 10.168.231.129 orscale-01-31.adm.dc3 <none> <none>
internaltools-tidb-tikv-2 1/1 Running 0 18h 10.168.3.244 orscale-01-32.adm.dc3 <none> <none>
internaltools-tidb-tikv-3 1/1 Running 0 18h 10.168.61.232 orscale-01-30.adm.dc3 <none> <none>
internaltools-tidb-tikv-4 1/1 Running 0 18h 10.168.129.94 orscale-01-33.adm.dc3 <none> <none>
What did you do?
Check that the metric is available through the pod exporter
kubectl port-forward internaltools-tidb-ticdc-0 8301 &
curl -sk https://localhost:8301/metrics | grep -i lag
Handling connection for 8301
# HELP ticdc_processor_checkpoint_ts_lag global checkpoint ts lag of processor
# TYPE ticdc_processor_checkpoint_ts_lag gauge
ticdc_processor_checkpoint_ts_lag{capture="internaltools-tidb-ticdc-0.internaltools-tidb-ticdc-peer.internaltools.svc:8301",changefeed="350ef73f-a472-419c-a46d-b89c1043d71b"} 1.224
# HELP ticdc_processor_resolved_ts_lag local resolved ts lag of processor
# TYPE ticdc_processor_resolved_ts_lag gauge
ticdc_processor_resolved_ts_lag{capture="internaltools-tidb-ticdc-0.internaltools-tidb-ticdc-peer.internaltools.svc:8301",changefeed="350ef73f-a472-419c-a46d-b89c1043d71b"} 0.879
Check for the same metrics in Prometheus
kubectl port-forward svc/internaltools-tidb-prometheus 9090 &
and search for the metric ticdc_processor_checkpoint_ts_lag in the Prometheus UI.
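As a convenience, the same check can be done from the command line through that port-forward using the standard Prometheus HTTP query API (a minimal sketch; use https instead of http if the Prometheus endpoint itself is served over TLS, and an empty "result" array in the JSON response means the series is not being scraped):
curl -s 'http://localhost:9090/api/v1/query?query=ticdc_processor_checkpoint_ts_lag'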
What did you expect to see?
We were expecting to find the metric ticdc_processor_checkpoint_ts_lag in Prometheus.
About this issue
- State: closed
- Created 4 years ago
- Comments: 15 (15 by maintainers)
Thanks for the update, we’ll test v1.1.8 as soon as it is available.
It seems the TLS config of TiCDC, importer, and lightning is not honored. (The following configs were provided by the OP in a Slack discussion.)
The OP’s cluster has TLS enabled, and for TiDB, TiKV, and PD, the Prometheus config is like:
But for ticdc, importer, and lightning it is:
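The actual snippets are not reproduced above, but as an illustrative sketch of the difference being described (job names and certificate paths below are hypothetical, not the operator's exact output): a TLS-honoring scrape job carries scheme: https plus a tls_config block, while the ticdc/importer/lightning jobs are generated without them, so scraping their TLS-only /metrics endpoints fails.
# Illustrative sketch only, not the actual generated config.
# TLS-honoring job, as generated for pd/tikv/tidb:
- job_name: internaltools-tidb-pd          # hypothetical job name
  scheme: https
  tls_config:
    ca_file: /var/lib/cluster-client-tls/ca.crt    # assumed certificate mount path
    cert_file: /var/lib/cluster-client-tls/tls.crt
    key_file: /var/lib/cluster-client-tls/tls.key
# Job missing the TLS settings, as generated for ticdc/importer/lightning:
- job_name: internaltools-tidb-ticdc       # hypothetical job name
  scheme: http                             # plain-HTTP scrape against a TLS-only
                                           # endpoint, so the target stays down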