tests: clh: docker kill test hangs randomly

Description of problem

The integration docker test fail randomly

Expected result

All the test passes

Actual result

After a kill signal the container process seems that got it but the agnet request seems to fail after that.

Logs found in a local test.

kill

Dec 04 19:06:34 jcvenega kata-runtime[69931]: time="2019-12-04T19:06:34.790995976Z" level=info msg="creating Sandbox" arch=amd64 command=kill container=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb
9cb721bda213ef9409 function=createSandbox name=kata-runtime pid=69931 source=virtcontainers subsystem=cloudHypervisor
Dec 04 19:06:34 jcvenega kata-runtime[69931]: time="2019-12-04T19:06:34.791152376Z" level=debug msg="converting /run/containerd/io.containerd.runtime.v1.linux/moby/9c6f2b7693df66fb8cba43e909208ebf12
c9ce424f40bb9cb721bda213ef9409/config.json" source=virtcontainers subsystem=compatoci
Dec 04 19:06:34 jcvenega kata-runtime[69931]: time="2019-12-04T19:06:34.792353079Z" level=debug msg="New filesystem store backend for /var/lib/vc/sbs/9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9c
b721bda213ef9409/9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409" arch=amd64 backend=filesystem command=kill container=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef94
09 name=kata-runtime path=/var/lib/vc/sbs/9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409/9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409 pid=69931 source=virtconta
iners/store subsystem=store
Dec 04 19:06:34 jcvenega kata-runtime[69931]: time="2019-12-04T19:06:34.792465779Z" level=debug msg="New filesystem store backend for /run/vc/sbs/9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721
bda213ef9409/9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409" arch=amd64 backend=filesystem command=kill container=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409 n
ame=kata-runtime path=/run/vc/sbs/9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409/9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409 pid=69931 source=virtcontainers/st
ore subsystem=store
Dec 04 19:06:34 jcvenega kata-runtime[69931]: time="2019-12-04T19:06:34.79278088Z" level=info msg="release sandbox" arch=amd64 command=kill container=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9c
b721bda213ef9409 name=kata-runtime pid=69931 sandbox=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409 source=virtcontainers subsystem=sandbox
Dec 04 19:06:34 jcvenega kata-runtime[69931]: time="2019-12-04T19:06:34.79285058Z" level=info msg="Disconnecting Sandbox Console" arch=amd64 command=kill container=9c6f2b7693df66fb8cba43e909208ebf12
c9ce424f40bb9cb721bda213ef9409 function=disconnect name=kata-runtime pid=69931 source=virtcontainers subsystem=cloudHypervisor
Dec 04 19:06:34 jcvenega kata-runtime[69931]: time="2019-12-04T19:06:34.79293918Z" level=info msg=kill arch=amd64 command=kill container=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9
409 container state=running name=kata-runtime pid=69931 sandbox=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409 signal=21 source=runtime
Dec 04 19:06:34 jcvenega kata-runtime[69931]: time="2019-12-04T19:06:34.79300928Z" level=info msg="fetch sandbox" arch=amd64 command=kill container=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb7
21bda213ef9409 name=kata-runtime pid=69931 sandbox=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409 source=virtcontainers
Dec 04 19:06:34 jcvenega kata-runtime[69931]: time="2019-12-04T19:06:34.793513881Z" level=info msg="endpoint unmarshalled" arch=amd64 command=kill container=9c6f2b7693df66fb8cba43e909208ebf12c9ce424
f40bb9cb721bda213ef9409 endpoint="&{{{83ae6427-ae34-4bce-8e42-7923588b0c6a br0_kata {tap0_kata 02:42:ac:11:00:02 []} [] []} {eth0 0a:65:fd:ee:50:05 []} 2} {{{584 1500 0 eth0 02:42:ac:11:00:02 up|bro
adcast|multicast 69699 585 0 <nil>  0xc0003e0780 0 0xc000382940 ether <nil> up 0 0 0 65536 65535 []} veth} [172.17.0.2/16 eth0] [{Ifindex: 584 Dst: <nil> Src: <nil> Gw: 172.17.0.1 Flags: [] Table: 2
54} {Ifindex: 584 Dst: 172.17.0.0/16 Src: 172.17.0.2 Gw: <nil> Flags: [] Table: 254}] {[]  [] []}} virtual }" endpoint-type=virtual name=kata-runtime pid=69931 sandbox=9c6f2b7693df66fb8cba43e909208e
bf12c9ce424f40bb9cb721bda213ef9409 source=virtcontainers subsystem=network
Dec 04 19:06:34 jcvenega kata-runtime[69931]: time="2019-12-04T19:06:34.793701482Z" level=info msg="creating Sandbox" arch=amd64 command=kill container=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb
9cb721bda213ef9409 function=createSandbox name=kata-runtime pid=69931 sandbox=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409 source=virtcontainers subsystem=cloudHypervisor
Dec 04 19:06:34 jcvenega kata-runtime[69931]: time="2019-12-04T19:06:34.793830982Z" level=debug msg="converting /run/containerd/io.containerd.runtime.v1.linux/moby/9c6f2b7693df66fb8cba43e909208ebf12
c9ce424f40bb9cb721bda213ef9409/config.json" source=virtcontainers subsystem=compatoci
Dec 04 19:06:34 jcvenega kata-runtime[69931]: time="2019-12-04T19:06:34.794805784Z" level=info msg="New client" arch=amd64 command=kill container=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721
bda213ef9409 name=kata-runtime pid=69931 proxy=69678 sandbox=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409 source=virtcontainers subsystem=kata_agent url="hvsock:///run/vc/vm/9c6f
2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409/clh.sock:1024"
Dec 04 19:06:34 jcvenega kata-runtime[69931]: time="2019-12-04T19:06:34.860275229Z" level=debug msg="sending request" arch=amd64 command=kill container=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb
9cb721bda213ef9409 name=grpc.SignalProcessRequest pid=69931 req="container_id:\"9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409\" exec_id:\"9c6f2b7693df66fb8cba43e909208ebf12c9ce424
f40bb9cb721bda213ef9409\" signal:21 " sandbox=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409 source=virtcontainers subsystem=kata_agent
Dec 04 19:06:34 jcvenega kata-runtime[69931]: time="2019-12-04T19:06:34.87891257Z" level=info msg="release sandbox" arch=amd64 command=kill container=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9c
b721bda213ef9409 name=kata-runtime pid=69931 sandbox=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409 source=virtcontainers subsystem=sandbox
Dec 04 19:06:34 jcvenega kata-runtime[69931]: time="2019-12-04T19:06:34.879034471Z" level=info msg="Disconnecting Sandbox Console" arch=amd64 command=kill container=9c6f2b7693df66fb8cba43e909208ebf1
2c9ce424f40bb9cb721bda213ef9409 function=disconnect name=kata-runtime pid=69931 sandbox=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409 source=virtcontainers subsystem=cloudHypervis
or

agent logs

time="2019-12-04T19:06:34.283629714Z" level=debug msg="request end" debug_console=false duration=242.353644ms name=kata-agent pid=39 request=/grpc.AgentService/ReadStdout resp="data:\"TRAP_R[0/7872]
r\\n\" " sandbox=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409 source=agent
time="2019-12-04T19:06:34.292036265Z" level=debug msg="new request" debug_console=false name=kata-agent pid=39 req="container_id:\"9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409\"
exec_id:\"9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409\" len:32768 " request=/grpc.AgentService/ReadStdout sandbox=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef940
9 source=agent
time="2019-12-04T19:06:34.622367268Z" level=debug msg="new request" debug_console=false name=kata-agent pid=39 req="container_id:\"9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409\"
exec_id:\"9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409\" signal:21 " request=/grpc.AgentService/SignalProcess sandbox=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef
9409 source=agent
time="2019-12-04T19:06:34.631770981Z" level=debug msg="request end" debug_console=false duration="526.236µs" name=kata-agent pid=39 request=/grpc.AgentService/SignalProcess resp="&Empty{XXX_unrecogn
ized:[],}" sandbox=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409 source=agent
time="2019-12-04T19:06:35.304953045Z" level=debug msg="request end" debug_console=false duration=1.000723988s name=kata-agent pid=39 request=/grpc.AgentService/ReadStdout resp= sandbox=9c6f2b7693df6
6fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409 source=agent
time="2019-12-04T19:06:35.310675839Z" level=debug msg="process exited" debug_console=false name=kata-agent pid=54 sandbox=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409 source=agen
t status=21
time="2019-12-04T19:06:35.315042997Z" level=info msg="ignoring unexpected signal" debug_console=false name=kata-agent pid=39 sandbox=9c6f2b7693df66fb8cba43e909208ebf12c9ce424f40bb9cb721bda213ef9409
signal="child exited" source=agent
systemd-tmpfiles-clean.timer: Timer elapsed.
systemd-tmpfiles-clean.service: Trying to enqueue job systemd-tmpfiles-clean.service/start/replace
Added job systemd-tmpfiles-clean.service/start to transaction.
Pulling in system.slice/start from systemd-tmpfiles-clean.service/start
Added job system.slice/start to transaction.
Pulling in -.slice/start from system.slice/start
Added job -.slice/start to transaction.
Pulling in shutdown.target/stop from systemd-tmpfiles-clean.service/start
Added job shutdown.target/stop to transaction.
Found redundant job shutdown.target/stop, dropping from transaction.
Found redundant job -.slice/start, dropping from transaction.
Found redundant job system.slice/start, dropping from transaction.
systemd-tmpfiles-clean.service: Installed new job systemd-tmpfiles-clean.service/start as 38
systemd-tmpfiles-clean.service: Enqueued job systemd-tmpfiles-clean.service/start as 38
systemd-tmpfiles-clean.timer: Changed waiting -> running
systemd-tmpfiles-clean.service: Passing 0 fds to service
systemd-tmpfiles-clean.service: About to execute: /usr/bin/systemd-tmpfiles --clean
systemd-tmpfiles-clean.service: Forked /usr/bin/systemd-tmpfiles as 60
systemd-tmpfiles-clean.service: Changed dead -> start
systemd-tmpfiles-clean.service: Failed to connect stdout to the journal socket, ignoring: No such file or directory
systemd-tmpfiles-clean.service: Kernel keyring not supported, ignoring.
systemd-tmpfiles-clean.service: Executing: /usr/bin/systemd-tmpfiles --clean
Received SIGCHLD from PID 60 (systemd-tmpfile).
Child 60 (systemd-tmpfile) died (code=exited, status=0/SUCCESS)
systemd-tmpfiles-clean.service: Failed to read oom_kill field of memory.events cgroup attribute: No such file or directory
systemd-tmpfiles-clean.service: Child 60 belongs to systemd-tmpfiles-clean.service.
systemd-tmpfiles-clean.service: Main process exited, code=exited, status=0/SUCCESS
systemd-tmpfiles-clean.service: Succeeded.
systemd-tmpfiles-clean.service: Changed start -> dead
systemd-tmpfiles-clean.service: Job 38 systemd-tmpfiles-clean.service/start finished, result=done
systemd-tmpfiles-clean.timer: Got notified about unit deactivation.
systemd-tmpfiles-clean.timer: Monotonic timer elapses in 23h 59min 59.954725s.
systemd-tmpfiles-clean.timer: Changed running -> waiting
systemd-tmpfiles-clean.timer: Monotonic timer elapses in 23h 59min 59.950491s.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 18 (14 by maintainers)

Most upvoted comments

Thanks for summarizing the status @jcvenegas I would add that it’s mandatory for us to find a way to increase the reproducibility as this will greatly facilitate further debugging.

@jcvenegas

To add on reproducing the ‘docker kill’ test failure, we can have all the debug options enabled in kata’s configuration.toml file while getting the test failure. The only problem was I can’t enable the serial console log from CLH, as enabling it would mask the test failure.

One more interesting observation to mention: reproducing the test failure is much faster on a QEMU-based VM hosted on my own dev machine, comparing to a Azure VM (where the CI is running on). The QEMU VM takes ~5 minutes to reproduce the test failure, while the Azure VM takes at least 2 hours.