cilium: CI: Tests upgrade and downgrade from a Cilium stable image to master (Expect a 403 from app1-service)
k8s-1.8.K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master
Etcd appears unhealthy during the upgrade, which causes traffic to pass when it should be denied.
Stacktrace
/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-Validated/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:409
Expect a 403 from app1-service
Expected command: kubectl exec -n default app2-cd69fd9f6-nlgs2 -- curl -s -D /dev/stderr --fail --connect-timeout 5 --max-time 8 http://app1-service/private -w "time-> DNS: '%{time_namelookup}(%{remote_ip})', Connect: '%{time_connect}',Transfer '%{time_starttransfer}', total '%{time_total}'"
To have failed, but it was successful:
Exitcode: 0
Stdout:
{ 'val': 'this is private' }
time-> DNS: '0.004221(10.97.31.250)', Connect: '0.004305',Transfer '0.004970', total '0.004992'
Stderr:
HTTP/1.1 200 OK
Date: Fri, 08 Mar 2019 11:42:49 GMT
Server: Apache/2.4.25 (Unix)
Last-Modified: Mon, 27 Mar 2017 15:58:16 GMT
ETag: "1d-54bb86948d600"
Accept-Ranges: bytes
Content-Length: 29
/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-Validated/src/github.com/cilium/cilium/test/k8sT/Updates.go:272
Standard Error
STEP: Installing a cleaning state of Cilium
STEP: Installing kube-dns
STEP: Deploying etcd-operator
STEP: Cilium "v1.4" is installed and running
STEP: Performing Cilium preflight check
Cilium is not ready yet: status is unhealthy: cilium-agent 'cilium-vrg4b' is unhealthy: Exitcode: 1
Stdout:
KVStore: Failure Err: Not able to connect to any etcd endpoints
ContainerRuntime: Ok docker daemon: OK
Kubernetes: Ok 1.8 (v1.8.14) [linux/amd64]
Kubernetes APIs: ["CustomResourceDefinition", "cilium/v2::CiliumNetworkPolicy", "core/v1::Endpoint", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "networking.k8s.io/v1::NetworkPolicy"]
Cilium: Failure Kvstore service is not ready
NodeMonitor: Disabled
Cilium health daemon: Warning Get http:///var/run/cilium/health.sock/v1beta/hello: dial unix /var/run/cilium/health.sock: connect: no such file or directory
IPv4 address pool: 5/255 allocated from 10.10.1.0/24
IPv6 address pool: 4/65535 allocated from f00d::a0a:100:0:0/112
Controller Status: 25/25 healthy
Proxy Status: OK, ip 10.10.1.1, port-range 10000-20000
Stderr:
command terminated with exit code 1
843076e7_K8sUpdates_Tests_upgrade_and_downgrade_from_a_Cilium_stable_image_to_master.zip
About this issue
- State: closed
- Created 5 years ago
- Comments: 16 (16 by maintainers)
Commits related to this issue
- daemon: Wait for new identities before restoring Previously, we would deserialize endpoints from json, then add these partially unvalidated objects into the endpointmanager, exposing them to other su... — committed to joestringer/cilium by joestringer 5 years ago
- endpoint: Sanitize ep.SecurityIdentity on restore When deserializing an endpoint from json, we previously didn't ensure that the security identity is properly restored. Other users of the security id... — committed to joestringer/cilium by joestringer 5 years ago
- endpointmanager: Avoid regenerating restoring endpoints If an endpoint is currently being restored, ensure that its state reflects this for the duration of the restoration, and avoid triggering regen... — committed to joestringer/cilium by joestringer 5 years ago
- endpoint: Sanitize ep.SecurityIdentity on restore When deserializing an endpoint from json, we previously didn't ensure that the security identity is properly restored. Other users of the security id... — committed to cilium/cilium by joestringer 5 years ago
- endpointmanager: Avoid regenerating restoring endpoints If an endpoint is currently being restored, ensure that its state reflects this for the duration of the restoration, and avoid triggering regen... — committed to cilium/cilium by joestringer 5 years ago
- endpoint: Sanitize ep.SecurityIdentity on restore [ upstream commit 85e25bcc22bdcc0163a7b581de14cd7658caf69d ] When deserializing an endpoint from json, we previously didn't ensure that the security... — committed to cilium/cilium by joestringer 5 years ago
- endpointmanager: Avoid regenerating restoring endpoints [ upstream commit 69b90d33381db757e4d35a8e5ef37e39d8217e10 ] If an endpoint is currently being restored, ensure that its state reflects this f... — committed to cilium/cilium by joestringer 5 years ago
Pretty sure this is the bug: the LabelArray isn’t populated because we’re using the deserialized endpoint.SecurityIdentity constructed from json, not the newly reallocated one:
https://github.com/cilium/cilium/blob/0945caf800a578f15546dadd53d2aa2e1ee5f8c7/pkg/policy/repository.go#L648
Here’s where we fix it up, which happens after the first endpoint regeneration in the sysdump example:
https://github.com/cilium/cilium/blob/0945caf800a578f15546dadd53d2aa2e1ee5f8c7/daemon/state.go#L335
Unless something changed about the parallelization of identity cache updates during v1.5, this appears to go back to v1.4.
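To make the failure mode concrete, here is a minimal Go sketch. It is not Cilium's code: the Identity and Endpoint types, the ruleSelects and restore helpers, and the id=app1 label are simplified, hypothetical stand-ins. It assumes LabelArray is a derived field that is not serialized, so an endpoint restored from json carries an identity with an empty LabelArray until something rebuilds it. A policy lookup that runs before that fix-up matches no rules.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Identity is a hypothetical, simplified stand-in for a security identity.
type Identity struct {
	ID     uint32   `json:"id"`
	Labels []string `json:"labels"`
	// LabelArray is derived from Labels and (by assumption) never serialized,
	// so it is empty after a json round trip.
	LabelArray []string `json:"-"`
}

// Sanitize rebuilds the derived LabelArray from the serialized Labels.
func (i *Identity) Sanitize() {
	i.LabelArray = append([]string(nil), i.Labels...)
}

// Endpoint is a hypothetical, simplified stand-in for a restored endpoint.
type Endpoint struct {
	ID               uint16    `json:"id"`
	SecurityIdentity *Identity `json:"identity"`
}

// ruleSelects reports whether a rule selecting on label applies to the
// identity. It only consults LabelArray, as the policy lookup in this
// sketch is assumed to do.
func ruleSelects(id *Identity, label string) bool {
	for _, l := range id.LabelArray {
		if l == label {
			return true
		}
	}
	return false
}

// restore deserializes an endpoint and, if sanitize is set, fixes up the
// identity before anyone else can observe it.
func restore(data []byte, sanitize bool) (*Endpoint, error) {
	var ep Endpoint
	if err := json.Unmarshal(data, &ep); err != nil {
		return nil, err
	}
	if sanitize && ep.SecurityIdentity != nil {
		ep.SecurityIdentity.Sanitize()
	}
	return &ep, nil
}

func main() {
	data := []byte(`{"id":42,"identity":{"id":1001,"labels":["id=app1"]}}`)

	stale, err := restore(data, false)
	if err != nil {
		panic(err)
	}
	fixed, err := restore(data, true)
	if err != nil {
		panic(err)
	}

	// Without the fix-up, the rule that should apply to app1 is never selected.
	fmt.Println("stale identity selected:", ruleSelects(stale.SecurityIdentity, "id=app1"))
	fmt.Println("sanitized identity selected:", ruleSelects(fixed.SecurityIdentity, "id=app1"))
}
```

In this sketch the stale identity is not selected by the rule while the sanitized one is, which mirrors the ordering problem above: the regeneration ran before the fix-up in daemon/state.go.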
Shout out to @ianvernon for pointing me in this direction. 🎉