ovn-kubernetes: pod creation blocks network policy creation
With Gatekeeper, it is possible to have admission rules that slow down pod creation. These webhooks are invoked by the API server when OVNK patches pods with annotations for their IP and MAC addresses. While such a patch is in flight, no network policy can be created. This is a problem when restarting ovnkube-master on a system with a lot of network policies, as each pod creation further delays the completion of the watchers. After a restart it can take hours for all the network policies to be installed, leaving newly started pods without network connectivity for hours. The contention appears to come from a mutex on WatchFactory.informers.
This can be reproduced on the ovn-kubernetes main branch:
1- installing the gatekeeper policies
kubectl apply -f https://raw.githubusercontent.com/open-policy-agent/gatekeeper/master/deploy/gatekeeper.yaml
Here the slow admission policy is simulated using a fake external-data service running on 10.224.123.1:8443 (nc -l -p 8443 -k on that server), as I did not find a "sleep" function in the OPA doc…
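A small HTTPS server that simply hangs past the Provider's 10-second timeout would work just as well as nc; here is a minimal Go sketch (an illustration, not part of the original reproduction: server.crt and server.key are placeholder paths, and their certificate must match the caBundle configured below):

package main

import (
	"log"
	"net/http"
	"time"
)

func main() {
	// Hang longer than the Provider's 10s timeout, so Gatekeeper's
	// external_data call times out just like with the nc listener.
	http.HandleFunc("/validate", func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(15 * time.Second)
	})
	log.Fatal(http.ListenAndServeTLS(":8443", "server.crt", "server.key", nil))
}

The gatekeeper manifests used for the reproduction: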
apiVersion: externaldata.gatekeeper.sh/v1alpha1
kind: Provider
metadata:
  name: dummy-provider
spec:
  url: https://10.224.123.1:8443/validate
  timeout: 10
  caBundle: "LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUV2akNDQTZhZ0F3SUJBZ0lRQnRqWkJOVllRMGIyaWkrblZDSit4REFOQmdrcWhraUc5dzBCQVFzRkFEQmgKTVFzd0NRWURWUVFHRXdKVlV6RVZNQk1HQTFVRUNoTU1SR2xuYVVObGNuUWdTVzVqTVJrd0Z3WURWUVFMRXhCMwpkM2N1WkdsbmFXTmxjblF1WTI5dE1TQXdIZ1lEVlFRREV4ZEVhV2RwUTJWeWRDQkhiRzlpWVd3Z1VtOXZkQ0JEClFUQWVGdzB5TVRBME1UUXdNREF3TURCYUZ3MHpNVEEwTVRNeU16VTVOVGxhTUU4eEN6QUpCZ05WQkFZVEFsVlQKTVJVd0V3WURWUVFLRXd4RWFXZHBRMlZ5ZENCSmJtTXhLVEFuQmdOVkJBTVRJRVJwWjJsRFpYSjBJRlJNVXlCUwpVMEVnVTBoQk1qVTJJREl3TWpBZ1EwRXhNSUlCSWpBTkJna3Foa2lHOXcwQkFRRUZBQU9DQVE4QU1JSUJDZ0tDCkFRRUF3VXV6WlVkd3ZOMVBXTnZzbk8zRFp1VWZNUk5VclVwbVJoOHNDdXhrQitVdTNOeTVDaUR0MytQRTBKNmEKcVhvZGdvamxFVmJiSHA5WXdsSG5MRFFOTHRLUzRWYkw4WGxmczd1SHlpVURlNXBTUVdZUVlFOVhFMG53NkRkbgpnOS9uMDB0blRDSlJwdDhPbVJEdFYxRjBKdUo5eDhwaUxoTWJmeU9JSlZOdndUUllBSXVFLy9pK3AxaEpJbnVXCnJhS0lteFc4b0h6ZjZWR28xYkR0TitJMnRJSkxZclZKbXV6SFo5YmpQdlhqMWhKZVJQRy9jVUo5V0lRRGdMR0IKQWZyNXlqSzd0STRuaHlmRkszVFVxTmFYM3NOaytjck9VNkpXdkhnWGpra0RLYTc3U1Ura0Zibk84bHdaVjIxcgplYWNyb2ljZ0U3WFFQVURUSVRBSGsrcVo5UUlEQVFBQm80SUJnakNDQVg0d0VnWURWUjBUQVFIL0JBZ3dCZ0VCCi93SUJBREFkQmdOVkhRNEVGZ1FVdDJ1aTZxaXFoSXg1NnJUYUQ1aXl4WlYydWZRd0h3WURWUjBqQkJnd0ZvQVUKQTk1UU5WYlJUTHRtOEtQaUd4dkRsN0k5MFZVd0RnWURWUjBQQVFIL0JBUURBZ0dHTUIwR0ExVWRKUVFXTUJRRwpDQ3NHQVFVRkJ3TUJCZ2dyQmdFRkJRY0RBakIyQmdnckJnRUZCUWNCQVFScU1HZ3dKQVlJS3dZQkJRVUhNQUdHCkdHaDBkSEE2THk5dlkzTndMbVJwWjJsalpYSjBMbU52YlRCQUJnZ3JCZ0VGQlFjd0FvWTBhSFIwY0RvdkwyTmgKWTJWeWRITXVaR2xuYVdObGNuUXVZMjl0TDBScFoybERaWEowUjJ4dlltRnNVbTl2ZEVOQkxtTnlkREJDQmdOVgpIUjhFT3pBNU1EZWdOYUF6aGpGb2RIUndPaTh2WTNKc015NWthV2RwWTJWeWRDNWpiMjB2UkdsbmFVTmxjblJICmJHOWlZV3hTYjI5MFEwRXVZM0pzTUQwR0ExVWRJQVEyTURRd0N3WUpZSVpJQVliOWJBSUJNQWNHQldlQkRBRUIKTUFnR0JtZUJEQUVDQVRBSUJnWm5nUXdCQWdJd0NBWUdaNEVNQVFJRE1BMEdDU3FHU0liM0RRRUJDd1VBQTRJQgpBUUNBTXM1ZUM5MXVXZzBLcitIV2hNdkFqdnFGY08zYVhiTU05eXQxUVA2RkN2cnpNWGkzY0VzYWlWaTZnTDN6CmF4M3BmczhMdWxpY1dkU1EwLzFzL2RDWWJiZHhnbHZQYlF0YUNkQjczc1JEMkNxazNwNUJKbCs3ajVuTDNhN2gKcUcrZmgvNTB0eDhiSUt1eFQ4YjFaMTFkbXp6cC8ybjNZV3pXMmZQOU5zYXJBNGgyMGtzdWRZYmovTmhWZlNiQwpFWGZmUGdLMmZQT3JlM3FHTm0rNDk5aVRjYytHMzNNdytudXI3U3BaeUVLRU94RVhHbEx6eVE0VWZhSmJjbWU2CmNlMVhSMmJGdUFKS1pUUmVpOUFxUENDY1VabE01MUtlOTJzUkt3MlNmaDNvaXVzMkZrT0g2aXBqdjNVLzY5N0UKQTdzS1BQY3c3K3V2VFB5TE5oQnpQdk9rCi0tLS0tRU5EIENFUlRJRklDQVRFLS0tLS0K"
---
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: k8srequiredlabels
spec:
  crd:
    spec:
      names:
        kind: K8sRequiredLabels
      validation:
        # Schema for the `parameters` field
        openAPIV3Schema:
          type: object
          properties:
            labels:
              type: array
              items:
                type: string
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package k8srequiredlabels

        violation[{"msg": msg, "details": {"missing_labels": missing}}] {
          provided := {label | input.review.object.metadata.labels[label]}
          required := {label | label := input.parameters.labels[_]}
          missing := required - provided
          count(missing) > 0
          response := external_data({"provider": "dummy-provider"})
          msg := sprintf("you must provide labels: %v %v", [missing, response])
        }
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredLabels
metadata:
  name: pod-must-have-label
spec:
  enforcementAction: warn
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    labels: ["owner"]
2- preparing 2 definitions of a network policy (in a pre-existing toto namespace) and applying them in a loop every second
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: toto
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - ipBlock:
            cidr: 172.17.0.0/16
            except:
              - 172.17.1.0/24
        - namespaceSelector: {}
        - podSelector: {}
      ports:
        - protocol: TCP
          port: 6379
  egress:
    - to:
        - ipBlock:
            cidr: 10.0.0.0/24
      ports:
        - protocol: TCP
          port: 5978
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: toto
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - ipBlock:
            cidr: 172.18.0.0/16
            except:
              - 172.18.1.0/24
        - namespaceSelector: {}
        - podSelector: {}
      ports:
        - protocol: TCP
          port: 6378
  egress:
    - to:
        - ipBlock:
            cidr: 10.2.0.0/24
      ports:
        - protocol: TCP
          port: 5979
while sleep 1 ; do kubectl apply -f anetpol.yaml && sleep 1 && kubectl apply -f anetpol.2.yaml ; done
3- we start a new pod
kubectl run --rm --restart=Never -ti --image alpine sh
and observe the ovnkube-master logs:
I0218 10:33:06.233598 46 default_network_controller.go:585] Recording success event on network policy toto/test-network-policy
I0218 10:33:07.377462 46 default_network_controller.go:585] Recording success event on network policy toto/test-network-policy
I0218 10:33:08.509393 46 default_network_controller.go:585] Recording success event on network policy toto/test-network-policy
I0218 10:33:09.661520 46 default_network_controller.go:585] Recording success event on network policy toto/test-network-policy
I0218 10:33:10.803232 46 default_network_controller.go:585] Recording success event on network policy toto/test-network-policy
I0218 10:33:11.930228 46 default_network_controller.go:585] Recording success event on network policy toto/test-network-policy
I0218 10:33:13.060809 46 default_network_controller.go:585] Recording success event on network policy toto/test-network-policy
I0218 10:33:14.192461 46 default_network_controller.go:585] Recording success event on network policy toto/test-network-policy
I0218 10:33:14.382875 46 default_network_controller.go:581] Recording success event on pod default/sh
I0218 10:33:20.453991 46 default_network_controller.go:581] Recording success event on pod default/sh
I0218 10:33:20.454777 46 default_network_controller.go:581] Recording success event on pod default/sh
I0218 10:33:20.456370 46 default_network_controller.go:581] Recording success event on pod default/sh
I0218 10:33:20.456615 46 default_network_controller.go:585] Recording success event on network policy toto/test-network-policy
I0218 10:33:20.479506 46 default_network_controller.go:585] Recording success event on network policy toto/test-network-policy
During the six seconds it took to create our pod (10:33:14 to 10:33:20 in the logs above), no network policy was created.
Looking at a goroutine dump taken during that window, we see the goroutine handling the pod creation, blocked in the API call that patches the pod annotations (and thus waiting on the slow admission webhook):
goroutine 49 [select]:
golang.org/x/net/http2.(*ClientConn).RoundTrip(0xc000d34000, 0xc00144db00)
/home/vagrant/ovn-kubernetes/go-controller/vendor/golang.org/x/net/http2/transport.go:1200 +0x491
golang.org/x/net/http2.(*Transport).RoundTripOpt(0xc000322280, 0xc00144db00, {0x80?})
/home/vagrant/ovn-kubernetes/go-controller/vendor/golang.org/x/net/http2/transport.go:519 +0x1be
golang.org/x/net/http2.(*Transport).RoundTrip(...)
/home/vagrant/ovn-kubernetes/go-controller/vendor/golang.org/x/net/http2/transport.go:480
golang.org/x/net/http2.noDialH2RoundTripper.RoundTrip({0xc0000d2640?}, 0xc00144db00?)
/home/vagrant/ovn-kubernetes/go-controller/vendor/golang.org/x/net/http2/transport.go:3020 +0x1b
net/http.(*Transport).roundTrip(0xc0000d2640, 0xc00144d900)
/usr/lib/golang/src/net/http/transport.go:548 +0x3ca
net/http.(*Transport).RoundTrip(0x1d3baa0?, 0xc0014ade30?)
/usr/lib/golang/src/net/http/roundtrip.go:17 +0x19
k8s.io/client-go/transport.(*bearerAuthRoundTripper).RoundTrip(0xc000234030, 0xc00144d800)
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/transport/round_trippers.go:317 +0x3e5
k8s.io/client-go/transport.(*userAgentRoundTripper).RoundTrip(0xc0001dc4a0, 0xc00144d500)
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/transport/round_trippers.go:168 +0x350
net/http.send(0xc00144d500, {0x219cc20, 0xc0001dc4a0}, {0x8?, 0x1e45660?, 0x0?})
/usr/lib/golang/src/net/http/client.go:252 +0x5f7
net/http.(*Client).send(0xc000234060, 0xc00144d500, {0x30?, 0x1c84500?, 0x0?})
/usr/lib/golang/src/net/http/client.go:176 +0x9b
net/http.(*Client).do(0xc000234060, 0xc00144d500)
/usr/lib/golang/src/net/http/client.go:716 +0x8fb
net/http.(*Client).Do(...)
/usr/lib/golang/src/net/http/client.go:582
k8s.io/client-go/rest.(*Request).request(0xc00144cf00, {0x21b5850, 0xc000058050}, 0x1?)
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/rest/request.go:883 +0x39b
k8s.io/client-go/rest.(*Request).Do(0xc00144cf00, {0x21b5850, 0xc000058050})
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/rest/request.go:924 +0xc9
k8s.io/client-go/kubernetes/typed/core/v1.(*pods).Update(0xc0015d04e0, {0x21b5850, 0xc000058050}, 0xc000074c00, {{{0x0, 0x0}, {0x0, 0x0}}, {0x0, 0x0, ...}, ...})
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/kubernetes/typed/core/v1/pod.go:141 +0x174
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/kube.(*Kube).UpdatePod(0xc000da77c0, 0xc000074c00)
/home/vagrant/ovn-kubernetes/go-controller/pkg/kube/kube.go:290 +0x1a7
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*BaseNetworkController).updatePodAnnotationWithRetry.func1()
/home/vagrant/ovn-kubernetes/go-controller/pkg/ovn/base_network_controller_pods.go:694 +0x118
k8s.io/client-go/util/retry.OnError.func1()
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/util/retry/util.go:51 +0x33
k8s.io/apimachinery/pkg/util/wait.ConditionFunc.WithContext.func1({0x1c82f80, 0x43ba01})
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:222 +0x1b
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtectionWithContext({0x21b5850?, 0xc000058050?}, 0xc00160d630?)
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:235 +0x57
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection(0x9bcf31?)
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:228 +0x39
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff({0x2faf080, 0x4014000000000000, 0x3fb999999999999a, 0x1, 0x0}, 0x40e0e7?)
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:423 +0x5f
k8s.io/client-go/util/retry.OnError({0x989680, 0x4014000000000000, 0x3fb999999999999a, 0x2, 0x0}, 0x1fdc2c0, 0xc000fdef90)
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/util/retry/util.go:50 +0xf1
k8s.io/client-go/util/retry.RetryOnConflict(...)
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/util/retry/util.go:104
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*BaseNetworkController).updatePodAnnotationWithRetry(0xc00038a400?, 0xc000f77400?, 0xc0015dd3e0?, {0x1eaa64e?, 0x7?})
/home/vagrant/ovn-kubernetes/go-controller/pkg/ovn/base_network_controller_pods.go:682 +0x14b
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*BaseNetworkController).addLogicalPortToNetwork(0xc00038a400, 0xc000f77400, {0x1eaa64e, 0x7}, 0x0)
/home/vagrant/ovn-kubernetes/go-controller/pkg/ovn/base_network_controller_pods.go:643 +0x1706
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*DefaultNetworkController).addLogicalPort(0xc00038a400, 0xc000f77400)
/home/vagrant/ovn-kubernetes/go-controller/pkg/ovn/pods.go:172 +0x1e5
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*DefaultNetworkController).ensurePod(0x495067?, 0xc000f76c00, 0xc000f77400, 0x1)
/home/vagrant/ovn-kubernetes/go-controller/pkg/ovn/ovn.go:142 +0x645
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*defaultNetworkControllerEventHandler).UpdateResource(0xc0004fb040, {0x1e7d940?, 0xc000f76c00?}, {0x1e7d940?, 0xc000f77400?}, 0x0)
/home/vagrant/ovn-kubernetes/go-controller/pkg/ovn/default_network_controller.go:763 +0x15a8
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/retry.(*RetryFramework).WatchResourceFiltered.func2.2({0xc00099ef80, 0xa})
/home/vagrant/ovn-kubernetes/go-controller/pkg/retry/obj_retry.go:622 +0x678
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/retry.(*RetryFramework).DoWithLock(0xc00047bf90, {0xc00099ef80, 0xa}, 0xc000b41be8)
/home/vagrant/ovn-kubernetes/go-controller/pkg/retry/obj_retry.go:110 +0xc5
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/retry.(*RetryFramework).WatchResourceFiltered.func2({0x1e7d940, 0xc000f76c00}, {0x1e7d940, 0xc000f77400})
/home/vagrant/ovn-kubernetes/go-controller/pkg/retry/obj_retry.go:562 +0xa96
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:239
k8s.io/client-go/tools/cache.FilteringResourceEventHandler.OnUpdate({0xc0010f2040?, {0x21b5500?, 0xc00089a3a8?}}, {0x1e7d940, 0xc000f76c00}, {0x1e7d940, 0xc000f77400})
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:274 +0xe2
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*Handler).OnUpdate(...)
/home/vagrant/ovn-kubernetes/go-controller/pkg/factory/handler.go:55
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*informer).newFederatedQueuedHandler.func2.1.1(0xc000722b70)
/home/vagrant/ovn-kubernetes/go-controller/pkg/factory/handler.go:350 +0x119
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*informer).forEachQueuedHandler(0xc00026c5a0, 0xc001821ee8)
/home/vagrant/ovn-kubernetes/go-controller/pkg/factory/handler.go:110 +0x144
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*informer).newFederatedQueuedHandler.func2.1(0xc000fded50)
/home/vagrant/ovn-kubernetes/go-controller/pkg/factory/handler.go:341 +0x1cf
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*informer).processEvents(0x0?, 0xc0003851a0, 0xc00009e180)
/home/vagrant/ovn-kubernetes/go-controller/pkg/factory/handler.go:222 +0x7e
created by github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.newQueuedInformer
/home/vagrant/ovn-kubernetes/go-controller/pkg/factory/handler.go:507 +0xcb
In parallel, the goroutine creating the network policy is blocked waiting for the informer's write lock while trying to register a pod handler:
goroutine 159 [sync.Mutex.Lock]:
sync.runtime_SemacquireMutex(0x38?, 0x20?, 0x4046f4?)
/usr/lib/golang/src/runtime/sema.go:77 +0x26
sync.(*Mutex).lockSlow(0xc00026c5a0)
/usr/lib/golang/src/sync/mutex.go:171 +0x165
sync.(*Mutex).Lock(...)
/usr/lib/golang/src/sync/mutex.go:90
sync.(*RWMutex).Lock(0x1bd2740?)
/usr/lib/golang/src/sync/rwmutex.go:147 +0x36
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*WatchFactory).addHandler(0xc00018cee0, {0x21cf2d0, 0x1e7d940}, {0xc0014f3cf0, 0x4}, {0x21bcd50?, 0xc00045be90}, {0x21b5500, 0xc00045bed8}, 0xc00045bef0, ...)
/home/vagrant/ovn-kubernetes/go-controller/pkg/factory/factory.go:553 +0x21d
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*WatchFactory).AddFilteredPodHandler(...)
/home/vagrant/ovn-kubernetes/go-controller/pkg/factory/factory.go:595
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*WatchFactory).GetResourceHandlerFunc.func5({0xc0014f3cf0?, 0x21cf2d0?}, {0x21bcd50?, 0xc00045be90?}, {0x21b5500?, 0xc00045bed8?}, 0x193cefb?)
/home/vagrant/ovn-kubernetes/go-controller/pkg/factory/factory.go:491 +0x75
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/retry.(*RetryFramework).WatchResourceFiltered(0xc00121b900, {0xc0014f3cf0, 0x4}, {0x21bcd50, 0xc00045be90})
/home/vagrant/ovn-kubernetes/go-controller/pkg/retry/obj_retry.go:447 +0x26a
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*DefaultNetworkController).addPeerPodHandler(0xc00038a400, 0xc00017ad20?, 0xc001312900, 0xc0013d7b00, {0xc0014f3cf0, 0x4})
/home/vagrant/ovn-kubernetes/go-controller/pkg/ovn/policy.go:1465 +0x19e
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*DefaultNetworkController).createNetworkPolicy.func1({0xc000ab80a8, 0x18})
/home/vagrant/ovn-kubernetes/go-controller/pkg/ovn/policy.go:1171 +0x1851
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/syncmap.(*SyncMap[...]).DoWithLock(0x21c4580, {0xc000ab80a8, 0x18}, 0xc00161f3f0)
/home/vagrant/ovn-kubernetes/go-controller/pkg/syncmap/syncmap.go:168 +0xd4
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*DefaultNetworkController).createNetworkPolicy(0xc00038a400, 0xc001189520, 0xc00161f530)
/home/vagrant/ovn-kubernetes/go-controller/pkg/ovn/policy.go:997 +0x155
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*DefaultNetworkController).addNetworkPolicy(0xc00038a400, 0xc001189520)
/home/vagrant/ovn-kubernetes/go-controller/pkg/ovn/policy.go:1216 +0x2c8
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/ovn.(*defaultNetworkControllerEventHandler).AddResource(0xc0004fb080, {0x1e80860?, 0xc001189520?}, 0x0)
/home/vagrant/ovn-kubernetes/go-controller/pkg/ovn/default_network_controller.go:627 +0x1f0
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/retry.(*RetryFramework).WatchResourceFiltered.func2.2({0xc0016d0c78, 0x18})
/home/vagrant/ovn-kubernetes/go-controller/pkg/retry/obj_retry.go:636 +0x8fc
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/retry.(*RetryFramework).DoWithLock(0xc000d7e000, {0xc0016d0c78, 0x18}, 0xc00161fa58)
/home/vagrant/ovn-kubernetes/go-controller/pkg/retry/obj_retry.go:110 +0xc5
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/retry.(*RetryFramework).WatchResourceFiltered.func2({0x1e80860, 0xc001528d00}, {0x1e80860, 0xc001189520})
/home/vagrant/ovn-kubernetes/go-controller/pkg/retry/obj_retry.go:562 +0xa96
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:239
k8s.io/client-go/tools/cache.FilteringResourceEventHandler.OnUpdate({0xc000622300?, {0x21b5500?, 0xc000674438?}}, {0x1e80860, 0xc001528d00}, {0x1e80860, 0xc001189520})
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:274 +0xe2
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*Handler).OnUpdate(...)
/home/vagrant/ovn-kubernetes/go-controller/pkg/factory/handler.go:55
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*informer).newFederatedHandler.func2.1(0xc0006791a0)
/home/vagrant/ovn-kubernetes/go-controller/pkg/factory/handler.go:400 +0x125
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*informer).forEachHandler(0xc00026c6c0, {0x1e80860?, 0xc001189520?}, 0xc00161fd58)
/home/vagrant/ovn-kubernetes/go-controller/pkg/factory/handler.go:138 +0x2c3
github.com/ovn-org/ovn-kubernetes/go-controller/pkg/factory.(*informer).newFederatedHandler.func2({0x1e80860, 0xc001528d00}, {0x1e80860, 0xc001189520})
/home/vagrant/ovn-kubernetes/go-controller/pkg/factory/handler.go:391 +0x15b
k8s.io/client-go/tools/cache.ResourceEventHandlerFuncs.OnUpdate(...)
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/tools/cache/controller.go:239
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/tools/cache/shared_informer.go:816 +0xf7
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0x30?)
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:157 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc000f33738?, {0x219c560, 0xc000f41d70}, 0x1, 0xc000fca120)
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:158 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0x19cd8bb?, 0x3b9aca00, 0x0, 0x98?, 0xc000f33788?)
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:135 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(...)
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:92
k8s.io/client-go/tools/cache.(*processorListener).run(0xc000322700)
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/client-go/tools/cache/shared_informer.go:812 +0x6b
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:75 +0x5a
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
/home/vagrant/ovn-kubernetes/go-controller/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:73 +0x85
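Note that the mutex address in goroutine 159 (0xc00026c5a0) matches the informer that goroutine 49's forEachQueuedHandler is iterating: event delivery appears to hold the informer's handler lock for the entire synchronous handler call, including the pod Update that is gated by the slow webhook, while the network policy's addPeerPodHandler needs the write lock on that same informer to register its peer-pod handler. A minimal Go sketch of this pattern (a simplified illustration, not the actual ovn-kubernetes code):

package main

import (
	"fmt"
	"sync"
	"time"
)

// informer mimics the factory informer: an RWMutex guarding a handler list.
type informer struct {
	sync.RWMutex
	handlers []func(obj string)
}

// forEachHandler plays the role of forEachQueuedHandler: the read lock is
// held for the whole synchronous handler invocation.
func (i *informer) forEachHandler(obj string) {
	i.RLock()
	defer i.RUnlock()
	for _, h := range i.handlers {
		h(obj) // e.g. addLogicalPort -> UpdatePod, slowed by the webhook
	}
}

// addHandler plays the role of WatchFactory.addHandler: it needs the write
// lock, so it waits behind any in-flight handler invocation.
func (i *informer) addHandler(h func(obj string)) {
	i.Lock()
	defer i.Unlock()
	i.handlers = append(i.handlers, h)
}

func main() {
	inf := &informer{}
	inf.addHandler(func(obj string) {
		fmt.Println("patching annotations for", obj)
		time.Sleep(6 * time.Second) // stands in for the slow admission webhook
	})

	go inf.forEachHandler("pod default/sh") // a pod update event arrives

	time.Sleep(100 * time.Millisecond)
	start := time.Now()
	inf.addHandler(func(obj string) {}) // netpol registering its peer-pod handler
	fmt.Println("peer-pod handler registered after", time.Since(start))
}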
About this issue
- State: closed
- Created a year ago
- Comments: 16 (8 by maintainers)
now, so the improvements are great!
This issue as described (slow admission webhook slows down the ovnkube-master restart) is not fixed, but let’s close this ticket anyway as the netpol watcher creation is now much faster. Thanks a ton for the multiple fixes!
wow, that #3329 is quite amazing.
With OCP 4.10, I counted 700k addresses in 10k address_sets. With the main branch I had around 200k addresses in 10k address_sets (as IPs for services and nodes are not selected anymore, presumably due to https://github.com/openshift/ovn-kubernetes/commit/6cf5c612602429edf93a5dcb9d6818db795bd2f7?). Now, testing again, I count 4k addresses in 700 address_sets.
In terms of boot time, it is down from 10 minutes to 7 minutes 30 seconds, which is very nice. I think the improvements with the ACL indexes will divide this by 2. Can't wait to test that 😄
Thanks for the data! That makes sense; we are actually working on that part: https://github.com/ovn-org/ovn-kubernetes/pull/3334. We can profile the startup a bit more if you share the setup you use (maybe test yamls?) and the startup time you see, so that we can try to improve it. Also, we have just merged another perf improvement for netpol, https://github.com/ovn-org/ovn-kubernetes/pull/3329, which hopefully will make the performance somewhat better (just FYI, in case you see slightly different numbers with the latest version).
So I have a way to reproduce this particular ticket in kind (starting pods while ovnkube-master syncs, validating that the pod startup breaks the sync). Unfortunately, reproducing a super slow startup is not trivial: I can create all my netpols in kind and it takes 5 min (ovnkube_master_sync_duration_seconds{resource_name="network policy"} 327.935185747).
However, this does not account for the close to a million addresses in address_sets that are added when pods are running. I need to find a way to somehow simulate that many addresses (corresponding to pods selected through our netpols) on the 2-worker kind cluster.
This is not really what I opened this ticket for, so I will put more info into the case.