finit: Service start / stop not working correctly

When attempting to stop or restart a service they can get into wonky states where finit doesn’t seem to correctly stop or start them again.

Some side information:

  • All of our daemons run in the foreground

  • Our daemons create their own PID file as they start. The documentation seems to indicate finit should create/destroy PID files for the services automatically but if I make the daemon write to a different PID file finit seems to automatically detect this as displayed by the pid path in initctl status myservice. I attempted to use the forking mode and manually selecting the PID file but it does not seem to fix the issue below.

  • Most of the services that seem to have issues are using wrapper scripts:

#!/bin/bash

some-operation
another-operation
exec myservice

Sometimes certain services seem to have trouble starting and stopping.

... stopping the service ...
initctl stop myservice

finit: api_cb():svc query: myservice
finit: svc_parse_jobstr():Got str:'myservice'
finit: svc_parse_jobstr():Got token:'myservice'
finit: api_cb():stop myservice
finit: svc_parse_jobstr():Got str:'myservice'
finit: svc_parse_jobstr():Got token:'myservice'
finit: service_step():             myservice(2416):  running disabled/clean   cond:on
finit: service_stop():Sending SIGTERM to pid:2416 name:jimsh
finit: Stopping myservice[2416], sending SIGTERM ...
finit: service_stop():kill(-2416, 15) => rc 0, errno 2
finit: svc_set_state():myservice is stopping, wait 3 sec before sending SIGKILL ...
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():             myservice(2416): -> stopping
finit: service_step():             myservice(2416): stopping disabled/clean   cond:on
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: pidfile_update_conds():path: /var/run/myservice.pid, mask: 00000008
finit: pidfile_update_conds():Found svc myservice for /var/run/myservice.pid with pid 2416
finit: pidfile_update_conds():Setting myservice PID file to /var/run/myservice.pid
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: cond_set():pid/myservice
finit: cond_set_noupdate():pid/myservice
finit: cond_set_path():/var/run/finit/cond/pid/myservice <= 2
finit: cond_set():service/myservice/ready
finit: cond_set_noupdate():service/myservice/ready
finit: cond_set_path():/var/run/finit/cond/service/myservice/ready <= 2
finit: cond_update():service/myservice/ready
finit: sigchld_cb():Collected child 2416, status: 0
finit: service_monitor():collected myservice(2416), normal exit: 1, signaled: 0, exit code: 0
finit: Stopped myservice[2416]
finit: service_step():             myservice(   0):  running  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/restart
finit: cond_set_oneshot_noupdate():service/myservice/restart => /var/run/finit/cond/service/myservice/restart
finit: cond_update():service/myservice/restart
finit: service_step():delayed restart of myservice
finit: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: pidfile_update_conds():path: /var/run/myservice.pid, mask: 00000200
finit: pidfile_update_conds():No matching svc for /var/run/myservice.pid
finit: service_retry():myservice crashed, trying to start it again, attempt 5
finit: Service myservice[2416] died, restarting in 2000 msec (5/10)
finit: service_step():             myservice(   0):   halted  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():             myservice(   0): ->  waiting
finit: service_step():             myservice(   0):  waiting  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():             myservice(   0): ->  UNKNOWN
finit: service_step():             myservice(   0):  UNKNOWN  enabled/clean   cond:on
finit: Starting myservice[2635]
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: service_step():             myservice(2635): ->  running
finit: service_step():             myservice(2635):  running  enabled/clean   cond:on
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: pidfile_update_conds():path: /var/run/myservice.pid, mask: 00000002
finit: pidfile_update_conds():Found svc myservice for /var/run/myservice.pid with pid 2635
finit: pidfile_update_conds():Setting myservice PID file to /var/run/myservice.pid
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: cond_set():pid/myservice
finit: cond_set_noupdate():pid/myservice
finit: cond_set_path():/var/run/finit/cond/pid/myservice <= 2
finit: cond_set():service/myservice/ready
finit: cond_set_noupdate():service/myservice/ready
finit: cond_set_path():/var/run/finit/cond/service/myservice/ready <= 2
finit: cond_update():service/myservice/ready

... service is started again and is not up and running ...

The alternative is that after an application has crashed, or I have changed runlevels (which has the same not-stopping/starting bug), it seems like I can’t get certain services to start again:

initctl restart myservice

init: api_cb():svc query: myservice
finit: svc_parse_jobstr():Got str:'myservice'
finit: svc_parse_jobstr():Got token:'myservice'
finit: api_cb():start myservice
finit: svc_parse_jobstr():Got str:'myservice'
finit: svc_parse_jobstr():Got token:'myservice'
finit: service_step():             myservice(   0):   halted  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():             myservice(   0): ->  waiting
finit: service_step():             myservice(   0):  waiting  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():             myservice(   0): ->  UNKNOWN
finit: service_step():             myservice(   0):  UNKNOWN  enabled/clean   cond:on
finit: Starting myservice[4727]
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: service_step():             myservice(4727): ->  running
finit: service_step():             myservice(4727):  running  enabled/clean   cond:on
finit: sigchld_cb():Collected child 4727, status: 256
finit: service_monitor():collected myservice(4727), normal exit: 1, signaled: 0, exit code: 1
finit: service_step():             myservice(   0):  running  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/restart
finit: cond_set_oneshot_noupdate():service/myservice/restart => /var/run/finit/cond/service/myservice/restart
finit: cond_update():service/myservice/restart
finit: service_step():delayed restart of myservice
finit: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
finit: service_retry():myservice crashed, trying to start it again, attempt 1
finit: Service myservice[1106] died, restarting in 2000 msec (1/10)
finit: service_step():             myservice(   0):   halted  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():             myservice(   0): ->  waiting
finit: service_step():             myservice(   0):  waiting  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():             myservice(   0): ->  UNKNOWN
finit: service_step():             myservice(   0):  UNKNOWN  enabled/clean   cond:on
finit: Starting myservice[4830]
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: service_step():             myservice(4830): ->  running
finit: service_step():             myservice(4830):  running  enabled/clean   cond:on
finit: sigchld_cb():Collected child 4830, status: 256
finit: service_monitor():collected myservice(4830), normal exit: 1, signaled: 0, exit code: 1
finit: service_step():             myservice(   0):  running  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/restart
finit: cond_set_oneshot_noupdate():service/myservice/restart => /var/run/finit/cond/service/myservice/restart
finit: cond_update():service/myservice/restart
finit: service_step():delayed restart of myservice
finit: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
finit: service_retry():myservice crashed, trying to start it again, attempt 2
finit: Service myservice[1106] died, restarting in 2000 msec (2/10)
finit: service_step():             myservice(   0):   halted  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():             myservice(   0): ->  waiting
finit: service_step():             myservice(   0):  waiting  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():             myservice(   0): ->  UNKNOWN
finit: service_step():             myservice(   0):  UNKNOWN  enabled/clean   cond:on
finit: Starting myservice[4835]
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: service_step():             myservice(4835): ->  running
finit: service_step():             myservice(4835):  running  enabled/clean   cond:on
finit: sigchld_cb():Collected child 4835, status: 256
finit: service_monitor():collected myservice(4835), normal exit: 1, signaled: 0, exit code: 1
finit: service_step():             myservice(   0):  running  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/restart
finit: cond_set_oneshot_noupdate():service/myservice/restart => /var/run/finit/cond/service/myservice/restart
finit: cond_update():service/myservice/restart
finit: service_step():delayed restart of myservice
finit: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
finit: service_retry():myservice crashed, trying to start it again, attempt 3
finit: Service myservice[1106] died, restarting in 2000 msec (3/10)
finit: service_step():             myservice(   0):   halted  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():             myservice(   0): ->  waiting
finit: service_step():             myservice(   0):  waiting  enabled/clean   cond:on
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():             myservice(   0): ->  UNKNOWN
finit: service_step():             myservice(   0):  UNKNOWN  enabled/clean   cond:on

... finit will give up after 10 restarts ...

Running the application manually works perfectly. Watching ps while finit attempts to restart shows that nothing ever gets started.

This happens on both v4.3 with the restart fix patch and master. I can’t remember if I saw this on the pure v4.3 branch

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 15 (15 by maintainers)

Commits related to this issue

Most upvoted comments

Perfect! Looks like this issue is solved. Manually starting/stopping/crashing is all good and I can’t fault it. My handful of daemons that got the forking status are no longer getting that flag and also seem to be behaving.

🎉🎉 Thank you


Not sure if I should make a new ticket, but I found the same race condition with crashing and conditions.

service <service/parent/running> /bin/myservice
## or
service <pid/parent> /bin/myservice
  1. Stop parent service
  2. myservice crashes due to parent disappearing (I guess before finit has had a chance to stop it? TBD I think finit should probably kill “dependent” services before stopping the user requested one.)
  3. myservice retry timer starts
  4. initctl cond shows <service/parent/running> as off
  5. myservice timer gets hit and the service becomes running with pid of 0 again.

I can’t seem to get this same race condition using the /usr feature. I guess the SIGHUP must cover it up.

If I do a initctl stop myservice; initctl start myservice then I correctly get the waiting status on myservice

The log:

finit: api_cb():svc query: parent
finit: svc_parse_jobstr():Got str:'parent'
finit: svc_parse_jobstr():Got token:'parent'
finit: api_cb():stop parent
finit: svc_parse_jobstr():Got str:'parent'
finit: svc_parse_jobstr():Got token:'parent'
finit: service_step():                 parent(7633):  running disabled/clean   cond:on
finit: service_stop():Sending SIGTERM to pid:7633 name:parent
finit: Stopping parent[7633], sending SIGTERM ...
finit: service_stop():kill(-7633, 15) => rc 0, errno 2
finit: svc_set_state():parent is stopping, wait 3 sec before sending SIGKILL ...
finit: cond_clear():service/parent/
finit: cond_clear_noupdate():service/parent/
finit: cond_set_path():/var/run/finit/cond/service/parent/ <= 0
finit: service_step():                 parent(7633): -> stopping
finit: service_step():                 parent(7633): stopping disabled/clean   cond:on
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: pidfile_update_conds():path: /var/run/parent.pid, mask: 00000008
finit: pidfile_update_conds():Found svc parent for /var/run/parent.pid with pid 7633
finit: pidfile_update_conds():Setting parent PID file to /var/run/parent.pid
finit: cond_set():pid/parent
finit: cond_set_noupdate():pid/parent
finit: cond_set_path():/var/run/finit/cond/pid/parent <= 2
finit: cond_set():service/parent/ready
finit: cond_set_noupdate():service/parent/ready
finit: cond_set_path():/var/run/finit/cond/service/parent/ready <= 2
finit: cond_update():service/parent/ready
finit: pidfile_update_conds():path: /var/run/myservice.pid, mask: 00000008
finit: pidfile_update_conds():Found svc myservice for /var/run/myservice.pid with pid 7634
finit: pidfile_update_conds():Setting myservice PID file to /var/run/myservice.pid
finit: cond_set():pid/myservice
finit: cond_set_noupdate():pid/myservice
finit: cond_set_path():/var/run/finit/cond/pid/myservice <= 2
finit: cond_set():service/myservice/ready
finit: cond_set_noupdate():service/myservice/ready
finit: cond_set_path():/var/run/finit/cond/service/myservice/ready <= 2
finit: cond_update():service/myservice/ready
finit: sigchld_cb():Collected child 7633, status: 256
finit: service_monitor():collected parent(7633), normal exit: 1, signaled: 0, exit code: 1
finit: service_step():                 parent(   0): stopping disabled/clean   cond:on
finit: service_step():parent: stopped, cleaning up timers and conditions ...
finit: cond_clear():pid/parent
finit: cond_clear_noupdate():pid/parent
finit: cond_set_path():/var/run/finit/cond/pid/parent <= 0
finit: cond_update():pid/parent
finit: cond_clear():service/parent/
finit: cond_clear_noupdate():service/parent/
finit: cond_set_path():/var/run/finit/cond/service/parent/ <= 0
finit: cond_set_oneshot():service/parent/stopped
finit: cond_set_oneshot_noupdate():service/parent/stopped => /var/run/finit/cond/service/parent/stopped
finit: cond_update():service/parent/stopped
finit: service_step():                 parent(   0): ->  stopped
finit: service_step():                 parent(   0):  stopped disabled/clean   cond:on
finit: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
finit: sigchld_cb():Collected child 7634, status: 0
finit: service_monitor():collected myservice(7634), normal exit: 1, signaled: 0, exit code: 0
finit: service_step():               myservice(   0):  running  enabled/clean   cond:off
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/restart
finit: cond_set_oneshot_noupdate():service/myservice/restart => /var/run/finit/cond/service/myservice/restart
finit: cond_update():service/myservice/restart
finit: service_step():delayed restart of myservice
finit: sm_step():state: running, runlevel: 2, newlevel: -1, teardown: 0, reload: 0
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: pidfile_update_conds():path: /var/run/parent.pid, mask: 00000200
finit: pidfile_update_conds():No matching svc for /var/run/parent.pid
finit: pidfile_update_conds():path: /var/run/myservice.pid, mask: 00000200
finit: pidfile_update_conds():No matching svc for /var/run/myservice.pid
finit: service_retry():myservice crashed, trying to start it again, attempt 5
finit: Service myservice[7634] died, restarting in 2000 msec (5/10)
finit: service_step():               myservice(   0):   halted  enabled/clean   cond:off
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: service_step():               myservice(   0): ->  waiting
finit: service_step():               myservice(   0):  waiting  enabled/clean   cond:off
finit: generic_io_cb():Calling I/O pidfile from runloop...
finit: cond_clear():service/myservice/
finit: cond_clear_noupdate():service/myservice/
finit: cond_set_path():/var/run/finit/cond/service/myservice/ <= 0
finit: cond_set_oneshot():service/myservice/running
finit: cond_set_oneshot_noupdate():service/myservice/running => /var/run/finit/cond/service/myservice/running
finit: cond_update():service/myservice/running
finit: Successfully restarted crashing service myservice.