dapr: Dapr placement - Raft and Health are not started.
In what area(s)?
/area placement
Ask your question here
Hi,
After an upgrade of our Kubernetes version, the placement servers are in CrashLoopBackOff.

In the logs we only have the following:
time="2022-07-11T13:04:15.082415966Z" level=info msg="starting Dapr Placement Service -- version 1.8.0 -- commit dc7f86840c85a1eff2e1223456994f554ea31d11" instance=dapr-placement-server-0 scope=dapr.placement type=log ver=1.8.0
time="2022-07-11T13:04:15.082798161Z" level=info msg="log level set to: debug" instance=dapr-placement-server-0 scope=dapr.placement type=log ver=1.8.0
time="2022-07-11T13:04:15.082878312Z" level=info msg="metrics server started on :9090/" instance=dapr-placement-server-0 scope=dapr.metrics type=log ver=1.8.0
On my minikube instance, the same configuration works with the same versions of Dapr and Kubernetes.
The logs there are the following:
time="2022-07-11T13:11:30.2769734Z" level=info msg="starting Dapr Placement Service -- version 1.8.0 -- commit dc7f86840c85a1eff2e1223456994f554ea31d11" instance=dapr-placement-server-0 scope=dapr.placement type=log ver=1.8.0
time="2022-07-11T13:11:30.2771322Z" level=info msg="log level set to: debug" instance=dapr-placement-server-0 scope=dapr.placement type=log ver=1.8.0
time="2022-07-11T13:11:30.2772933Z" level=info msg="metrics server started on :9090/" instance=dapr-placement-server-0 scope=dapr.metrics type=log ver=1.8.0
time="2022-07-11T13:11:30.2789367Z" level=debug msg="initial configuration%!(EXTRA []interface {}=[index 1 servers [%+v [{Voter dapr-placement-server-0 dapr-placement-server-0.dapr-placement-server.dapr-system.svc.cluster.local:8201} {Voter dapr-placement-server-1 dapr-placement-server-1.dapr-placement-server.dapr-system.svc.cluster.local:8201} {Voter dapr-placement-server-2 dapr-placement-server-2.dapr-placement-server.dapr-system.svc.cluster.local:8201}]]])" instance=dapr-placement-server-0 scope=dapr.placement.raft type=log ver=1.8.0
time="2022-07-11T13:11:30.2790101Z" level=info msg="Raft server is starting on dapr-placement-server-0.dapr-placement-server.dapr-system.svc.cluster.local:8201..." instance=dapr-placement-server-0 scope=dapr.placement.raft type=log ver=1.8.0
time="2022-07-11T13:11:30.2790369Z" level=info msg="mTLS enabled, getting tls certificates" instance=dapr-placement-server-0 scope=dapr.placement type=log ver=1.8.0
time="2022-07-11T13:11:30.2790981Z" level=info msg="starting watch for certs on filesystem: /var/run/dapr/credentials" instance=dapr-placement-server-0 scope=dapr.placement type=log ver=1.8.0
time="2022-07-11T13:11:30.2793584Z" level=debug msg="entering follower state%!(EXTRA []interface {}=[follower Node at 172.17.0.12:8201 [Follower] leader ])" instance=dapr-placement-server-0 scope=dapr.placement.raft type=log ver=1.8.0
time="2022-07-11T13:11:30.2793662Z" level=info msg="tls certificates loaded successfully" instance=dapr-placement-server-0 scope=dapr.placement type=log ver=1.8.0
time="2022-07-11T13:11:30.280095Z" level=info msg="placement service started on port 50005" instance=dapr-placement-server-0 scope=dapr.placement type=log ver=1.8.0
time="2022-07-11T13:11:30.2802004Z" level=info msg="Healthz server is listening on :8080" instance=dapr-placement-server-0 scope=dapr.placement type=log ver=1.8.0
time="2022-07-11T13:11:30.7936267Z" level=debug msg="accepted connection%!(EXTRA []interface {}=[local-address 172.17.0.12:8201 remote-address 172.17.0.29:43182])" instance=dapr-placement-server-0 scope=dapr.placement.raft type=log ver=1.8.0
time="2022-07-11T13:11:30.7938483Z" level=debug msg="failed to get previous log%!(EXTRA []interface {}=[previous-index 10 last-index 1 error log not found])" instance=dapr-placement-server-0 scope=dapr.placement.raft type=log ver=1.8.0
time="2022-07-11T13:11:30.8737392Z" level=debug msg="accepted connection%!(EXTRA []interface {}=[local-address 172.17.0.12:8201 remote-address 172.17.0.29:43184])" instance=dapr-placement-server-0 scope=dapr.placement.raft type=log ver=1.8.0
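To see where startup stops, the two logs can be compared by their `msg` field. The following is a minimal sketch, not part of Dapr: `parse_line` and `missing_messages` are hypothetical helper names, and `FAILING` and `WORKING` are short excerpts of the log lines quoted above. It lists the messages that appear in the working log but never in the failing one.

```python
import re

# Matches key=value pairs where the value is a double-quoted string or a bare token.
PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_line(line):
    """Parse one logfmt-style line into a dict of fields (hypothetical helper)."""
    fields = {}
    for key, value in PAIR.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]  # strip the surrounding quotes
        fields[key] = value
    return fields


def missing_messages(failing_log, working_log):
    """Return (scope, msg) pairs present in working_log but absent from failing_log."""
    seen = {
        parse_line(line).get("msg")
        for line in failing_log.splitlines()
        if line.strip()
    }
    missing = []
    for line in working_log.splitlines():
        if not line.strip():
            continue
        fields = parse_line(line)
        if fields.get("msg") not in seen:
            missing.append((fields.get("scope"), fields.get("msg")))
    return missing


# Excerpts copied from the logs in this issue (last line of the failing log,
# three lines of the working log).
FAILING = '''time="2022-07-11T13:04:15.082878312Z" level=info msg="metrics server started on :9090/" instance=dapr-placement-server-0 scope=dapr.metrics type=log ver=1.8.0'''

WORKING = '''time="2022-07-11T13:11:30.2772933Z" level=info msg="metrics server started on :9090/" instance=dapr-placement-server-0 scope=dapr.metrics type=log ver=1.8.0
time="2022-07-11T13:11:30.2790101Z" level=info msg="Raft server is starting on dapr-placement-server-0.dapr-placement-server.dapr-system.svc.cluster.local:8201..." instance=dapr-placement-server-0 scope=dapr.placement.raft type=log ver=1.8.0
time="2022-07-11T13:11:30.2802004Z" level=info msg="Healthz server is listening on :8080" instance=dapr-placement-server-0 scope=dapr.placement type=log ver=1.8.0'''

if __name__ == "__main__":
    for scope, msg in missing_messages(FAILING, WORKING):
        print(f"{scope}: {msg}")
```

Matching on `msg` rather than on whole lines ignores the differing timestamps, so the first entry reported is the first startup step the failing pod never logged.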
Do you know why this can happen? The Raft server and the health server of the placement service are not started. What could be the cause?
Thanks
Manu Di Nicola
About this issue
- State: closed
- Created 2 years ago
- Comments: 24 (12 by maintainers)
This problem only happens for us (@ManuDinicola and me) on a Kubernetes cluster upgrade performed according to this procedure: https://kubernetes.io/docs/tasks/administer-cluster/cluster-upgrade/
The problem does not occur when performing a rolling reboot of all kubernetes cluster nodes (with drain/uncordon and 2 min. wait between node reboots), even if two out of three replicas of dapr-placement-server are temporarily unavailable.
Could it be that a Kubernetes cluster upgrade causes a network split brain between nodes on the old Kubernetes version and nodes on the new one, such that the state of the existing dapr-placement-server Raft cluster is lost? If so, can we reset the dapr-placement-server Raft cluster state as if it were a new Raft cluster?
P.S. Redeploying Dapr from scratch (i.e. removing the dapr-system namespace) does not fix the issue, so it seems we have no way of resetting Dapr's state…
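For context on the split-brain question: Raft needs a majority of voters to elect a leader, and the "initial configuration" log line above lists three voters. A minimal sketch of that arithmetic follows. The function names are illustrative only and are not Dapr APIs.

```python
def quorum(voters: int) -> int:
    """Smallest number of voters that forms a Raft majority."""
    return voters // 2 + 1


def can_elect_leader(voters: int, reachable: int) -> bool:
    """True if the reachable voters (including the node itself) form a majority."""
    return reachable >= quorum(voters)


if __name__ == "__main__":
    # Three voters, as in the placement server's initial configuration.
    for reachable in (1, 2, 3):
        print(reachable, can_elect_leader(3, reachable))
```

With three voters the quorum is two, so a replica that is cut off on its own cannot elect a leader until it can reach at least one peer again. This describes Raft in general and does not by itself explain the crash seen here.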
@shubham1172 please can you investigate?
Does it happen on upgrades only?