finit: Service start / stop not working correctly
When attempting to stop or restart a service they can get into wonky states where finit doesn’t seem to correctly stop or start them again.
Some side information:
-
All of our daemons run in the foreground
-
Our daemons create their own PID file as they start. The documentation seems to indicate finit should create/destroy PID files for the services automatically but if I make the daemon write to a different PID file finit seems to automatically detect this as displayed by the pid path in
initctl status myservice. I attempted to use the forking mode and manually selecting the PID file but it does not seem to fix the issue below. -
Most of the services that seem to have issues are using wrapper scripts:
#!/bin/bash
some-operation
another-operation
exec myservice
Sometimes certain services seem to have trouble starting and stopping.
... stopping the service ...
initctl stop myservice
finit: api_cb():svc query: myservice
finit: svc_parse_jobstr():Got str:'myservice'
finit: svc_parse_jobstr():Got token:'myservice'
finit: api_cb():stop myservice
finit: svc_parse_jobstr():Got str:'myservice'
finit: svc_parse_jobstr():Got token:'myservice'
finit: service_step(): myservice(2416): running disabled/clean cond:on
finit: service_stop():Sending SIGTERM to pid:2416 name:jimsh
finit: Stopping myservice[2416], sending SIGTERM ...
finit: service_stop():kill(-2416, 15) => rc 0, errno 2
finit: svc_set_state():myservice is stopping, wait 3 sec before sending SIGKILL ...
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step(): myservice(2416): -> stopping
finit: service_step(): myservice(2416): stopping disabled/clean cond:on
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: pidfile_update_conds():path: /var/run/myservice.pid, mask: 00000008
finit: pidfile_update_conds():Found svc myservice for /var/run/myservice.pid with pid 2416
finit: pidfile_update_conds():Setting myservice PID file to /var/run/myservice.pid
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: cond_set():pid/myservice
finit: cond_set_noupdate():pid/myservice
finit: cond_set_path():/var/run/finit/cond/pid/myservice <= 2
finit: cond_set():service/myservice/ready
finit: cond_set_noupdate():service/myservice/ready
finit: cond_set_path():/var/run/finit/cond/service/myservice/ready <= 2
finit: cond_update():service/myservice/ready
finit: sigchld_cb():Collected child 2416, status: 0
finit: service_monitor():collected myservice(2416), normal exit: 1, signaled: 0, exit code: 0
finit: Stopped myservice[2416]
finit: service_step(): myservice( 0): running enabled/clean cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/restart
finit: cond_set_oneshot_noupdate():service/myservice/restart => /var/run/finit/cond/service/myservice/restart
finit: cond_update():service/myservice/restart
finit: service_step():delayed restart of myservice
finit: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: pidfile_update_conds():path: /var/run/myservice.pid, mask: 00000200
finit: pidfile_update_conds():No matching svc for /var/run/myservice.pid
finit: service_retry():myservice crashed, trying to start it again, attempt 5
finit: Service myservice[2416] died, restarting in 2000 msec (5/10)
finit: service_step(): myservice( 0): halted enabled/clean cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step(): myservice( 0): -> waiting
finit: service_step(): myservice( 0): waiting enabled/clean cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step(): myservice( 0): -> UNKNOWN
finit: service_step(): myservice( 0): UNKNOWN enabled/clean cond:on
finit: Starting myservice[2635]
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: service_step(): myservice(2635): -> running
finit: service_step(): myservice(2635): running enabled/clean cond:on
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: pidfile_update_conds():path: /var/run/myservice.pid, mask: 00000002
finit: pidfile_update_conds():Found svc myservice for /var/run/myservice.pid with pid 2635
finit: pidfile_update_conds():Setting myservice PID file to /var/run/myservice.pid
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: cond_set():pid/myservice
finit: cond_set_noupdate():pid/myservice
finit: cond_set_path():/var/run/finit/cond/pid/myservice <= 2
finit: cond_set():service/myservice/ready
finit: cond_set_noupdate():service/myservice/ready
finit: cond_set_path():/var/run/finit/cond/service/myservice/ready <= 2
finit: cond_update():service/myservice/ready
... service is started again and is not up and running ...
The alternative is that after an application has crashed, or I have changed runlevels (which has the same not-stopping/starting bug), it seems like I can’t get certain services to start again:
initctl restart myservice
init: api_cb():svc query: myservice
finit: svc_parse_jobstr():Got str:'myservice'
finit: svc_parse_jobstr():Got token:'myservice'
finit: api_cb():start myservice
finit: svc_parse_jobstr():Got str:'myservice'
finit: svc_parse_jobstr():Got token:'myservice'
finit: service_step(): myservice( 0): halted enabled/clean cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step(): myservice( 0): -> waiting
finit: service_step(): myservice( 0): waiting enabled/clean cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step(): myservice( 0): -> UNKNOWN
finit: service_step(): myservice( 0): UNKNOWN enabled/clean cond:on
finit: Starting myservice[4727]
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: service_step(): myservice(4727): -> running
finit: service_step(): myservice(4727): running enabled/clean cond:on
finit: sigchld_cb():Collected child 4727, status: 256
finit: service_monitor():collected myservice(4727), normal exit: 1, signaled: 0, exit code: 1
finit: service_step(): myservice( 0): running enabled/clean cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/restart
finit: cond_set_oneshot_noupdate():service/myservice/restart => /var/run/finit/cond/service/myservice/restart
finit: cond_update():service/myservice/restart
finit: service_step():delayed restart of myservice
finit: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
finit: service_retry():myservice crashed, trying to start it again, attempt 1
finit: Service myservice[1106] died, restarting in 2000 msec (1/10)
finit: service_step(): myservice( 0): halted enabled/clean cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step(): myservice( 0): -> waiting
finit: service_step(): myservice( 0): waiting enabled/clean cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step(): myservice( 0): -> UNKNOWN
finit: service_step(): myservice( 0): UNKNOWN enabled/clean cond:on
finit: Starting myservice[4830]
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: service_step(): myservice(4830): -> running
finit: service_step(): myservice(4830): running enabled/clean cond:on
finit: sigchld_cb():Collected child 4830, status: 256
finit: service_monitor():collected myservice(4830), normal exit: 1, signaled: 0, exit code: 1
finit: service_step(): myservice( 0): running enabled/clean cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/restart
finit: cond_set_oneshot_noupdate():service/myservice/restart => /var/run/finit/cond/service/myservice/restart
finit: cond_update():service/myservice/restart
finit: service_step():delayed restart of myservice
finit: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
finit: service_retry():myservice crashed, trying to start it again, attempt 2
finit: Service myservice[1106] died, restarting in 2000 msec (2/10)
finit: service_step(): myservice( 0): halted enabled/clean cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step(): myservice( 0): -> waiting
finit: service_step(): myservice( 0): waiting enabled/clean cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step(): myservice( 0): -> UNKNOWN
finit: service_step(): myservice( 0): UNKNOWN enabled/clean cond:on
finit: Starting myservice[4835]
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: service_step(): myservice(4835): -> running
finit: service_step(): myservice(4835): running enabled/clean cond:on
finit: sigchld_cb():Collected child 4835, status: 256
finit: service_monitor():collected myservice(4835), normal exit: 1, signaled: 0, exit code: 1
finit: service_step(): myservice( 0): running enabled/clean cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/restart
finit: cond_set_oneshot_noupdate():service/myservice/restart => /var/run/finit/cond/service/myservice/restart
finit: cond_update():service/myservice/restart
finit: service_step():delayed restart of myservice
finit: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
finit: service_retry():myservice crashed, trying to start it again, attempt 3
finit: Service myservice[1106] died, restarting in 2000 msec (3/10)
finit: service_step(): myservice( 0): halted enabled/clean cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step(): myservice( 0): -> waiting
finit: service_step(): myservice( 0): waiting enabled/clean cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step(): myservice( 0): -> UNKNOWN
finit: service_step(): myservice( 0): UNKNOWN enabled/clean cond:on
... finit will give up after 10 restarts ...
Running the application manually works perfectly. Watching ps while finit attempts to restart shows that nothing ever gets started.
This happens on both v4.3 with the restart fix patch and master. I can’t remember if I saw this on the pure v4.3 branch
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 15 (15 by maintainers)
Commits related to this issue
- initctl: dump svc type and forking in JSON output First set of svc_t properties not possible to debug otherwise. See issue #313 for a background. Signed-off-by: Joachim Wiberg <troglobit@gmail.com> — committed to troglobit/finit by troglobit 2 years ago
- test: new regression test for bug #313 Signed-off-by: Joachim Wiberg <troglobit@gmail.com> — committed to troglobit/finit by troglobit 2 years ago
- Cancel pending restart timer on initctl stop/start/restart/reload When a service crashes Finit launches a restart timer that periodically will try restart the service. If a user calls `initctl stop ... — committed to troglobit/finit by troglobit 2 years ago
- Fix issue with services suddenly converting to 'forking' type When a service starts up the pidfile.so plugin automatically detects any new pidfiles that show up in the system. If it finds a matching... — committed to troglobit/finit by troglobit 2 years ago
Perfect! Looks like this issue is solved. Manually starting/stopping/crashing is all good and I can’t fault it. My handful of daemons that got the forking status are no longer getting that flag and also seem to be behaving.
🎉🎉 Thank you
Not sure if I should make a new ticket, but I found the same race condition with crashing and conditions.
parentservicemyservicecrashes due toparentdisappearing (I guess before finit has had a chance to stop it? TBD I think finit should probably kill “dependent” services before stopping the user requested one.)myserviceretry timer startsinitctl condshows<service/parent/running>as offmyservicetimer gets hit and the service becomesrunningwith pid of0again.I can’t seem to get this same race condition using the
/usrfeature. I guess the SIGHUP must cover it up.If I do a
initctl stop myservice; initctl start myservicethen I correctly get thewaitingstatus onmyserviceThe log: