node-problem-detector: status doesn't change when injecting log messages

I cannot get the node problem detector (NPD) to change a node's status by injecting log messages.

I am using kubernetes 1.5.2, Ubuntu 16.04, kernel 4.4.0-51-generic.

I run NPD as a DaemonSet, and I have tried this with both NPD 0.3.0 and 0.4.0. I start NPD with the default command, using /config/kernel-monitor.json because my nodes use journald.

I have /dev/kmsg mounted into the pod, and on the node I echo messages that match the regexes in kernel-monitor.json to /dev/kmsg. From inside the pod I can see the fake log lines I’ve echoed to /dev/kmsg.

Steps to reproduce:

# as root on the node where your pod is running
echo "task umount.aufs:123 blocked for more than 120 seconds." >> /dev/kmsg
# I have verified that these logs show up in journalctl -k

# this should match the following permanent condition in /config/kernel-monitor.json
#	{
#		"type": "permanent",
#		"condition": "KernelDeadlock",
#		"reason": "AUFSUmountHung",
#		"pattern": "task umount\\.aufs:\\w+ blocked for more than \\w+ seconds\\."
#	},

# check the status of the node where you ran this
kubectl get node <node>
# status will still be Ready

# for further detail examine the json
kubectl get node <node> -o json | jq .status.conditions
# you will see that the KernelDeadlock condition is still "False"

# I would expect the general status to change to "KernelDeadlock"
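
As an offline sanity check of the steps above, the rule's pattern can be tested against the echoed message, and the condition can be read out of the node JSON. This is a minimal Python sketch, not part of NPD. NPD is written in Go and uses Go's regexp package, so Python's re module is only an approximation here (the two should agree for a pattern this simple). The helper names rule_matches and condition_status and the sample conditions list are made up for illustration.

```python
import json
import re

# Rule copied from the kernel-monitor.json excerpt above (still JSON-escaped).
RULE_JSON = r'''
{
    "type": "permanent",
    "condition": "KernelDeadlock",
    "reason": "AUFSUmountHung",
    "pattern": "task umount\\.aufs:\\w+ blocked for more than \\w+ seconds\\."
}
'''

# Message echoed to /dev/kmsg in the steps above.
MESSAGE = "task umount.aufs:123 blocked for more than 120 seconds."


def rule_matches(rule_json, message):
    """Decode the rule (undoing the JSON escaping) and test its pattern."""
    rule = json.loads(rule_json)
    return re.search(rule["pattern"], message) is not None


def condition_status(conditions, cond_type):
    """Return the status string of the named condition, or None if absent.

    `conditions` is the list found at .status.conditions in the output of
    `kubectl get node <node> -o json`.
    """
    for cond in conditions:
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


if __name__ == "__main__":
    print("pattern matches message:", rule_matches(RULE_JSON, MESSAGE))

    # Hypothetical sample of .status.conditions, for illustration only.
    sample_conditions = [
        {"type": "KernelDeadlock", "status": "False"},
        {"type": "Ready", "status": "True"},
    ]
    print("KernelDeadlock:", condition_status(sample_conditions, "KernelDeadlock"))
```

If the pattern matches here but the condition stays "False" on the node, the regex itself is probably not the problem, and the next things to look at are the NPD log and how NPD is reading the log source.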

If I am not testing this properly, could you please give a detailed breakdown of how to test that the node problem detector is working for kernel logs AND docker logs?

I have also reproduced this behavior using a custom docker_monitor.json and having the systemd docker service write to the journald docker logs. I have still been unsuccessful in getting the node status to change.

About this issue

State: closed
Created 7 years ago
Comments: 15 (7 by maintainers)

Most upvoted comments

I’ve just tried this with the following config:

{
	"plugin": "journald",
	"pluginConfig": {
		"source": "docker"
	},
	"logPath": "/var/log/journal",
	"lookback": "5m",
	"bufferSize": 10,
	"source": "docker-monitor",
	"conditions": [
		{
			"type": "TestCondition",
			"reason": "TestReason",
			"message": "test reason"
		}
	],
	"rules": [
		{
			"type": "permanent",
			"condition": "TestCondition",
			"reason": "TestDockerIssue",
			"pattern": "time=.*level=debug msg=\"Calling GET /v1.23/images/json\""
		}
	]
}

This generates a TestCondition for a log line such as: time="2017-06-14T21:43:46.395377174Z" level=debug msg="Calling GET /v1.23/images/json"

NPD log:

I0614 21:51:59.378357       1 log_monitor.go:123] New status generated: &{Source:docker-monitor Events:[] Conditions:[{Type:KernelDeadlock Status:true Transition:2017-06-14 21:46:54.146649 +0000 UTC Reason:RandomDockerIssue Message:time="2017-06-14T21:46:54.145774420Z" level=debug msg="Calling GET /v1.23/images/json"}]}

I started NPD inside a container:

docker run -v /var/log:/var/log -v ~/docker-monitor.json:/config/docker-monitor.json -v /etc/machine-id:/etc/machine-id -it --entrypoint=/node-problem-detector gcr.io/google_containers/node-problem-detector:v0.4.0 --alsologtostderr --apiserver-override=https://10.0.0.1?inClusterConfig=false --system-log-monitors=/config/docker-monitor.json

Note that the pattern you provide must match all the way to the end of the last log line (multi-line patterns are supported).
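
The note above about matching to the end of the last line can be illustrated with a small Python sketch. It rests on assumptions: that the buffered log lines are effectively joined with newlines and that the pattern is anchored at the end of that text. NPD itself uses Go's regexp package, so this shows the idea and is not NPD's code. The function name matches_to_end is hypothetical.

```python
import re

# Pattern and log line taken from the docker-monitor config and example above
# (the pattern is shown after undoing the JSON escaping).
PATTERN = r'time=.*level=debug msg="Calling GET /v1.23/images/json"'
LOG_LINE = ('time="2017-06-14T21:43:46.395377174Z" level=debug '
            'msg="Calling GET /v1.23/images/json"')


def matches_to_end(pattern, buffered_lines):
    """Return True if `pattern` matches a span ending at the end of the buffer.

    ASSUMPTION: lines are joined with "\n" and the pattern is anchored at the
    very end of the joined text; this mirrors the note above, not NPD's source.
    """
    text = "\n".join(buffered_lines)
    return re.search("(?:" + pattern + r")\Z", text) is not None


if __name__ == "__main__":
    # The matching line is the newest line in the buffer.
    print(matches_to_end(PATTERN, ["some earlier line", LOG_LINE]))
    # A newer line follows, so the pattern no longer reaches the end.
    print(matches_to_end(PATTERN, [LOG_LINE, "a newer, unrelated line"]))
    # A multi-line pattern can span the last two lines.
    print(matches_to_end(r"first line\nsecond line", ["first line", "second line"]))
```

Under these assumptions, a pattern that stops short of the end of the line (for example, one that omits trailing text the real log line contains) would not match, which is worth checking when a hand-written rule never fires.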

Random-Liu on Jun 14, 2017