rancher: Job cattle-node-cleanup fails due to restrictive standard PSP
Rancher Server Setup
- Rancher version: 2.6.3
Information about the Cluster
- Kubernetes version: v1.21.7
- Cluster Type: Downstream
- Custom cluster / Provider: RKE1
Describe the bug
The following event appears periodically in the cluster event log:
FailedCreate Job cattle-node-cleanup-xxx Error creating: pods "cattle-node-cleanup-xxx-" is forbidden: PodSecurityPolicy: unable to admit pod: [spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used]
To Reproduce
- Remove a node from a downstream cluster
- Watch the job cattle-node-cleanup-xxx -> Recent events: Error creating: pods "cattle-node-cleanup-xxx" is forbidden: PodSecurityPolicy: unable to admit pod: [spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used]
Result
Error creating: pods "cattle-node-cleanup-xxx" is forbidden: PodSecurityPolicy: unable to admit pod: [spec.volumes[0]: Invalid value: "hostPath": hostPath volumes are not allowed to be used]
Expected Result
- Job cattle-node-cleanup-xxx should run without errors
- The job should not run in the default namespace, because it violates the CIS policy
Additional context
I use a more restrictive standard PSP in the cluster, which prohibits the use of hostPath volumes. The YAML of the job "cattle-node-cleanup-8nx54" shows that no specific service account is set, so the default service account is used, and that account is bound to my default PSP for the cluster. The job should use a dedicated service account to which an unrestricted PSP is bound.
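To make the reasoning above concrete, here is a minimal sketch of how PSP admission plays out for the two service accounts. It is a simplified model, not the real Kubernetes admission code: the `Policy`, `Pod`, `Admit`, and `exampleBindings` names, the policy names, and the `node-cleanup` service account are all hypothetical, and only the allowed volume types are modeled. The assumption is that a pod is admitted when at least one policy bound to its service account permits every volume it requests.

```go
package main

import "fmt"

// Policy is a hypothetical, simplified stand-in for a PodSecurityPolicy.
// Only the list of allowed volume types is modeled; "*" allows all types.
type Policy struct {
	Name           string
	AllowedVolumes []string
}

// Pod is a simplified stand-in for a pod spec.
type Pod struct {
	ServiceAccount string
	VolumeTypes    []string
}

// allows reports whether the policy permits a single volume type.
func (p Policy) allows(volumeType string) bool {
	for _, v := range p.AllowedVolumes {
		if v == "*" || v == volumeType {
			return true
		}
	}
	return false
}

// permits reports whether the policy permits every volume the pod requests.
func (p Policy) permits(pod Pod) bool {
	for _, vt := range pod.VolumeTypes {
		if !p.allows(vt) {
			return false
		}
	}
	return true
}

// Admit returns nil if any policy bound to the pod's service account
// permits the pod, and an error otherwise.
func Admit(bindings map[string][]Policy, pod Pod) error {
	for _, policy := range bindings[pod.ServiceAccount] {
		if policy.permits(pod) {
			return nil
		}
	}
	return fmt.Errorf("unable to admit pod: no policy bound to service account %q allows volumes %v",
		pod.ServiceAccount, pod.VolumeTypes)
}

// exampleBindings models the situation in this issue: the default service
// account only has a restrictive policy, while a dedicated (hypothetical)
// service account is bound to an unrestricted one.
func exampleBindings() map[string][]Policy {
	restricted := Policy{Name: "restricted", AllowedVolumes: []string{"configMap", "secret", "emptyDir"}}
	unrestricted := Policy{Name: "unrestricted", AllowedVolumes: []string{"*"}}
	return map[string][]Policy{
		"default":      {restricted},
		"node-cleanup": {unrestricted},
	}
}

func main() {
	bindings := exampleBindings()
	for _, sa := range []string{"default", "node-cleanup"} {
		pod := Pod{ServiceAccount: sa, VolumeTypes: []string{"hostPath"}}
		if err := Admit(bindings, pod); err != nil {
			fmt.Printf("%s: rejected: %v\n", sa, err)
			continue
		}
		fmt.Printf("%s: admitted\n", sa)
	}
}
```

In this model the cleanup pod is rejected under the default service account and admitted under the dedicated one, which is the behavior the issue asks for.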
SURE-3708
About this issue
- State: closed
- Created 2 years ago
- Reactions: 3
- Comments: 30 (9 by maintainers)
This doesn't appear to involve the Rancher object/chart itself, or its PSPs. This is the cattle-node-cleanup pod, which needs a service account and PSP attached to it in order to function in a restricted environment: https://github.com/rancher/rancher/blob/release/v2.7/pkg/controllers/management/node/cleanup.go#L279-L280
Or, as mentioned above, it could run in the system namespace, which is probably more appropriate. I don't understand why the cattle system namespaces wouldn't exist, even in RKE1 standalone?
@annablender You are right. The issue must have been solved by one of the latest updates. In 2.6.10 (and maybe earlier versions) I am no longer facing this issue. Apart from that, could you check which namespace the cleanup pod runs in now? My main concern was that it runs in the default namespace, which also violates the CIS rules.
@annablender Here it is:
/backport 2023-Q2-v2.6x
Can we keep this targeted for 2.6? We have a lot of clients hitting this, and it will be a while before they can update to 2.7.x.
Thanks!
@rosskirkpat has requested some additional information from the customer via the internal support issue.
@luthermonson states
All our cleanup jobs across all of rancher (cluster/node) are in the default namespace as it's the only thing guaranteed to be there. This was a design decision made quite some time ago.
Linking internal support issue SURE-4308
@samjustus has pointed to https://github.com/rancher/rancher/commit/52da4437d2c6f7d299c17f0f7f5b241160011cb3#diff-ebfdd8ff867d7b543f1af5420f0db2d246ee3df99638fdd794f5259f7fdf36c2R4
Will investigate.