pot: [BUG] Pot nomad logic leaves dying jails behind [suggested fix]
Describe the bug
When running pots under nomad, dying jails are left behind forever.
Example output from my test jailhost:
# jls -d
JID IP Address Hostname Path
1 testwww1_accb2b46-b3ff-b1b4-2 /opt/pot/jails/testwww1_accb2b46-b3ff-b1b4-2de2-aa78efd8864a/m
2 testwww1_81126bd6-eea4-f624-f /opt/pot/jails/testwww1_81126bd6-eea4-f624-fc5b-dc1400624245/m
3 testwww1_1d6bc5f5-99bb-aab3-e /opt/pot/jails/testwww1_1d6bc5f5-99bb-aab3-e860-a6ab2a985017/m
4 testwww1_1d6bc5f5-99bb-aab3-e /opt/pot/jails/testwww1_1d6bc5f5-99bb-aab3-e860-a6ab2a985017/m
5 testwww1_95735a23-a906-b82d-4 /opt/pot/jails/testwww1_95735a23-a906-b82d-4255-4ab3b10858b1/m
6 testwww1_b2d5f3ec-e576-15e2-9 /opt/pot/jails/testwww1_b2d5f3ec-e576-15e2-965b-12183644e79a/m
7 testwww1_54719fef-9e0f-9c45-4 /opt/pot/jails/testwww1_54719fef-9e0f-9c45-4deb-2ec525ce6bf3/m
8 testwww1_4d7344e5-df72-c3d4-8 /opt/pot/jails/testwww1_4d7344e5-df72-c3d4-8988-44092068b6b3/m
9 testwww1_1fab71b1-d296-505c-a /opt/pot/jails/testwww1_1fab71b1-d296-505c-a08e-8060da916135/m
10 testwww1_102b9ef8-50f9-b896-f /opt/pot/jails/testwww1_102b9ef8-50f9-b896-f89d-fc4715def9e7/m
11 testwww1_108bc5ab-0ff4-9bd8-9 /opt/pot/jails/testwww1_108bc5ab-0ff4-9bd8-96b4-0400639c6de0/m
12 testwww1_e691ef0c-b490-8792-5 /opt/pot/jails/testwww1_e691ef0c-b490-8792-50d1-cf734f3c15cf/m
13 testwww1_d5ebb5cf-3f13-826c-c /opt/pot/jails/testwww1_d5ebb5cf-3f13-826c-c140-2242a03fe4d8/m
14 testwww1_754910e9-7b3b-359d-7 /opt/pot/jails/testwww1_754910e9-7b3b-359d-7d1c-e333c00b881a/m
15 testwww1_a4a40168-a950-91ba-2 /opt/pot/jails/testwww1_a4a40168-a950-91ba-2f44-cdb4995fc145/m
16 testwww1_3af58a84-79ce-fd90-8 /opt/pot/jails/testwww1_3af58a84-79ce-fd90-829b-a8b53b1d09cc/m
17 testwww1_3af58a84-79ce-fd90-8 /opt/pot/jails/testwww1_3af58a84-79ce-fd90-829b-a8b53b1d09cc/m
18 testwww1_882467c3-80c7-9d75-3 /opt/pot/jails/testwww1_882467c3-80c7-9d75-39c4-644c0f939cfb/m
19 testwww1_2ba86885-ce95-9347-3 /opt/pot/jails/testwww1_2ba86885-ce95-9347-35c5-c22c347714e3/m
20 testwww1_b6d1b9fe-57dc-3a30-6 /opt/pot/jails/testwww1_b6d1b9fe-57dc-3a30-649d-dfa21879f015/m
21 testwww1_8376faa2-6851-07b0-5 /opt/pot/jails/testwww1_8376faa2-6851-07b0-5307-9b32d3a4266a/m
To Reproduce
Steps to reproduce the behavior:
- Create a pot that runs a couple of services (e.g., postgresql, nginx, and something else)
- Start the pot using nomad
- Stop the pot using nomad
- Repeat the last two steps a couple of times
- Observe dying jails using jls -d. Usually these would stick around for a couple of minutes, but they stay around forever (a scripted version of these steps is sketched below).
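A minimal scripted version of these steps could look like the sketch below; the job file name, job name, and sleep durations are illustrative assumptions, not taken from the issue:

#!/bin/sh
# Illustrative reproduction loop; job and file names are assumptions.
for i in 1 2 3; do
    nomad job run testwww1.nomad     # nomad-pot-driver runs "pot start"
    sleep 30                         # give the allocation time to come up
    nomad job stop -purge testwww1   # nomad-pot-driver runs "pot stop"
done
jls -d                               # dying jails pile up instead of expiring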
Expected behavior
Dying jails disappear after a while.
Additional context
I suspect the problem stems from the logic of how the nomad-pot-driver and pot interact.
The logic seems to be:
- nomad-pot-driver calls pot start
- pot start runs jail, which uses exec.start=/tmp/tinirc, which runs some program in the foreground (e.g. nginx, or in a more complex setup simply tail -f /dev/null, as services run in the background)
- nomad stop service makes the nomad-pot-driver call pot stop
- pot stop removes the jail (which is still in the process of starting(!)), then it sleeps one second, then it removes the epair interfaces.
- Meanwhile, the still-running pot start process finishes starting the jail (which was stopped while it was still starting), sleeps for one second, then runs pot stop again and destroys the epair interface (see the timeline sketched after this list).
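Laid out side by side, the suspected race from the list above looks roughly like this; the exact internal commands are assumptions, only the ordering is taken from the steps above:

# Illustrative timeline of the suspected race; internals are assumptions.
#
#   pot start (still running)       pot stop (triggered by nomad)
#   -------------------------       -----------------------------
#   jail -c ... exec.start=...
#   (tinirc still starting)         removes the jail mid-start
#                                   sleep 1
#                                   destroys the epair interfaces
#   finishes starting the jail
#   sleep 1
#   runs pot stop again
#   destroys the epair again        -> network resource leaked,
#                                      jail stuck in "dying"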
My suspicion is that this overlapping of start and stop causes some resource leakage (of some network resource), which causes the jails to stay in “dying” forever.
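To narrow down which resource is leaked, a few stock FreeBSD tools can be pointed at a stuck jail (these are standard utilities, not pot commands; the JID is taken from the jls -d output above):

jls -d jid name dying    # list all jails, including dying ones
sockstat -j 21           # lingering sockets held by jail with JID 21
posixshmcontrol ls       # POSIX shared memory segments; posixshmcontrol is
                         # not jail-aware, so jailed segments can only be
                         # recognized by their path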
This is the code in question:
jail -c -J "/tmp/${_pname}.jail.conf" $_param exec.start="$_cmd"
sleep 1
if ! _is_pot_running "$_pname" ; then
    start-cleanup "$_pname" "${_iface}"
    if [ "$_persist" = "NO" ]; then
        return 0
    else
        return 1
    fi
fi
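For context, _is_pot_running presumably tests whether the jail is listed as alive; a minimal sketch assuming a jls-based check (pot's actual implementation may differ):

# Sketch only, assuming a jls-based check; not pot's verbatim code.
_is_pot_running()
{
    # jls -j matches only alive jails (dying ones need -d), so this
    # returns non-zero once the jail is removed or dying
    jls -j "$1" >/dev/null 2>&1
}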
If I change
- tinirc to be non-blocking (just starts some daemons) and
- the code to start a jail like below:
jail -c -J "/tmp/${_pname}.jail.conf" $_param exec.start="$_cmd"
sleep 1
if ! _is_pot_running "$_pname" ; then
    start-cleanup "$_pname" "${_iface}"
    if [ "$_persist" = "NO" ]; then
        return 0
    else
        return 1
    fi
fi
jexec "$_pname" tail -f /dev/null
the jails left behind dying will actually disappear after a while, my theory being that stopping a fully started jail prevents the resource leak (or something in the code destroying interfaces does).
For my images this works just fine (as my /tmp/tinirc file would end in "tail -f /dev/null" anyway, and I'm starting services in it that keep the jail up and running).
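For illustration, such a non-blocking tinirc could look like the sketch below (service names are placeholders, and the network setup lines pot generates are omitted); the jail is then kept alive by the jexec "$_pname" tail -f /dev/null from the code above:

#!/bin/sh
# Hypothetical non-blocking tinirc; service names are placeholders.
# Network setup lines generated by pot omitted for brevity.
service postgresql onestart
service nginx onestart
exit 0    # return immediately; "jexec ... tail -f /dev/null" keeps the jail up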
In a more generalized setup, where users might just want to start one little thing, a different approach could be used, e.g.:
- Run “sleep 10&” in /tmp/tinirc
- Run the actual command instead of “tail -f /dev/null” using jexec
Running the actual command could also be done using tinirc, by making it accept a parameter; this would probably also keep it backwards compatible with old tinirc scripts that block (as pot would never reach the jexec call when running those).
Example jail start code (untested):
jail -c -J "/tmp/${_pname}.jail.conf" $_param exec.start="$_cmd init"
sleep 1
if ! _is_pot_running "$_pname" ; then
    start-cleanup "$_pname" "${_iface}"
    if [ "$_persist" = "NO" ]; then
        return 0
    else
        return 1
    fi
fi
jexec "$_pname" $_cmd
Example tinirc script (untested):
export "PATH=/usr/local/bin:/usr/local/sbin:/usr/bin:/usr/sbin:/sbin:/bin"
export "HOME=/"
export "LANG=C.UTF-8"
export "MM_CHARSET=UTF-8"
export "PWD=/"
export "RC_PID=24"
export "NOMAD_GROUP_NAME=group1"
export "NOMAD_MEMORY_LIMIT=64"
export "NOMAD_CPU_LIMIT=200"
export "BLOCKSIZE=K"
export "NOMAD_TASK_NAME=www1"
export _POT_NAME=testwww1_1191b963-a72a-b6be-a3eb-0a6201b10ac2
export _POT_IP=10.192.0.16
case $1 in
    init)
        ifconfig epair0b inet 10.192.0.16 netmask 255.192.0.0
        route add default 10.192.0.1
        ifconfig lo0 inet 127.0.0.1 alias
        sleep 10&
        exit 0
        ;;
esac
# could also be given an explicit "run" option
exec /usr/local/bin/somecommand
The code isn't complete this way, of course (as written it would break "normal" pots), but I hope the concept makes sense.
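For completeness, this sketch of a legacy-style blocking tinirc (contents illustrative) shows why such scripts would remain compatible: they ignore the init argument and block in the foreground, so jail -c never returns while they run and the trailing jexec line is never reached:

#!/bin/sh
# Legacy-style blocking tinirc (illustrative). It ignores $1, so being
# called with "init" changes nothing: it blocks in the foreground below,
# jail -c waits on it, and pot never reaches the jexec call.
ifconfig epair0b inet 10.192.0.16 netmask 255.192.0.0
route add default 10.192.0.1
ifconfig lo0 inet 127.0.0.1 alias
exec tail -f /dev/null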
About this issue
- State: closed
- Created 3 years ago
- Comments: 19 (1 by maintainers)
Commits related to this issue
- Garbage collect POSIX shared memory on stop Addresses parts of #150. As `posixshmcontrol` doesn't support jails, we simply use the path of the shared memory segment to determine if it should be coll... — committed to grembo/pot by grembo 3 years ago
- Simplify pot start procedure This aims to address parts of #150. This changes the jail start procedure from executing the start command directly to starting a background sleep process and then using... — committed to grembo/pot by grembo 3 years ago
- Garbage collect POSIX shared memory on stop (#153) Addresses parts of #150. As `posixshmcontrol` doesn't support jails, we simply use the path of the shared memory segment to determine if it sho... — committed to bsdpot/pot by grembo 3 years ago
- Simplify pot start procedure (#154) * Simplify pot start procedure This aims to address parts of #150. This changes the jail start procedure from executing the start command directly to starti... — committed to bsdpot/pot by grembo 3 years ago
Here's a proposal on how we can implement your fix without breaking anything: when I want to have pots treated differently, I use attributes. You can define an attribute and, if it is true, run the jexec "$_pname" tail -f /dev/null that you need.
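A minimal sketch of that idea (the keep-alive attribute name and the _get_pot_attr helper are hypothetical placeholders, not existing pot API):

# Sketch only; "keep-alive" attribute and _get_pot_attr are hypothetical.
#
# Operator marks the pot, e.g. (hypothetical invocation):
#   pot set-attribute -p "$_pname" -A keep-alive -V YES
#
# Inside pot's start code, after the jail is confirmed running:
if [ "$(_get_pot_attr "$_pname" keep-alive)" = "YES" ]; then
    # keep the jail alive from the outside, as proposed above
    jexec "$_pname" tail -f /dev/null
fi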