cilium: CI: Tests upgrade and downgrade from a Cilium stable image to master (Expect a 403 from app1-service)

k8s-1.8.K8sUpdates Tests upgrade and downgrade from a Cilium stable image to master

etcd appears unhealthy during the upgrade, which causes traffic to pass when it should be denied.

Stacktrace

/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-Validated/src/github.com/cilium/cilium/test/ginkgo-ext/scopes.go:409
Expect a 403 from app1-service
Expected command: kubectl exec -n default app2-cd69fd9f6-nlgs2 -- curl -s -D /dev/stderr --fail --connect-timeout 5 --max-time 8 http://app1-service/private -w "time-> DNS: '%{time_namelookup}(%{remote_ip})', Connect: '%{time_connect}',Transfer '%{time_starttransfer}', total '%{time_total}'"
To have failed, but it was successful:
Exitcode: 0 
Stdout:
 	 { 'val': 'this is private' }
	 time-> DNS: '0.004221(10.97.31.250)', Connect: '0.004305',Transfer '0.004970', total '0.004992'
Stderr:
 	 HTTP/1.1 200 OK
	 Date: Fri, 08 Mar 2019 11:42:49 GMT
	 Server: Apache/2.4.25 (Unix)
	 Last-Modified: Mon, 27 Mar 2017 15:58:16 GMT
	 ETag: "1d-54bb86948d600"
	 Accept-Ranges: bytes
	 Content-Length: 29


/home/jenkins/workspace/Cilium-PR-Ginkgo-Tests-Validated/src/github.com/cilium/cilium/test/k8sT/Updates.go:272
Standard Error
STEP: Installing a cleaning state of Cilium
STEP: Installing kube-dns
STEP: Deploying etcd-operator
STEP: Cilium "v1.4" is installed and running
STEP: Performing Cilium preflight check
Cilium is not ready yet: status is unhealthy: cilium-agent 'cilium-vrg4b' is unhealthy: Exitcode: 1 
Stdout:
 	 KVStore:                Failure   Err: Not able to connect to any etcd endpoints
	 ContainerRuntime:       Ok        docker daemon: OK
	 Kubernetes:             Ok        1.8 (v1.8.14) [linux/amd64]
	 Kubernetes APIs:        ["CustomResourceDefinition", "cilium/v2::CiliumNetworkPolicy", "core/v1::Endpoint", "core/v1::Namespace", "core/v1::Node", "core/v1::Pods", "core/v1::Service", "networking.k8s.io/v1::NetworkPolicy"]
	 Cilium:                 Failure   Kvstore service is not ready
	 NodeMonitor:            Disabled
	 Cilium health daemon:   Warning   Get http:///var/run/cilium/health.sock/v1beta/hello: dial unix /var/run/cilium/health.sock: connect: no such file or directory
	 IPv4 address pool:      5/255 allocated from 10.10.1.0/24
	 IPv6 address pool:      4/65535 allocated from f00d::a0a:100:0:0/112
	 Controller Status:      25/25 healthy
	 Proxy Status:           OK, ip 10.10.1.1, port-range 10000-20000
	 
Stderr:
 	 command terminated with exit code 1

https://jenkins.cilium.io/job/Cilium-PR-Ginkgo-Tests-Validated/10628/testReport/junit/k8s-1/8/K8sUpdates_Tests_upgrade_and_downgrade_from_a_Cilium_stable_image_to_master/

843076e7_K8sUpdates_Tests_upgrade_and_downgrade_from_a_Cilium_stable_image_to_master.zip

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 16 (16 by maintainers)

Most upvoted comments

Pretty sure this is the bug. The LabelArray isn’t populated because we’re using the deserialized endpoint.SecurityIdentity constructed from JSON, not the newly reallocated one:

https://github.com/cilium/cilium/blob/0945caf800a578f15546dadd53d2aa2e1ee5f8c7/pkg/policy/repository.go#L648

Here’s where we fix it up, which happens after the first endpoint regeneration in the sysdump example:

https://github.com/cilium/cilium/blob/0945caf800a578f15546dadd53d2aa2e1ee5f8c7/daemon/state.go#L335

Unless something changed about the parallelization of identity cache updates during v1.5, this appears to go back to v1.4.

Shout out to @ianvernon for pointing me in this direction. 🎉
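
For readers who want to see the failure mode in isolation, here is a minimal Go sketch of the pattern described above: a derived field that is skipped during JSON serialization comes back empty after a restore, so any lookup against it misses until something rebuilds it. The `Identity` type, `NewIdentity`, `rebuildLabelArray`, `Selects`, and `roundTrip` are hypothetical stand-ins written for illustration; they are not Cilium's actual types or functions, and the label format is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Identity is a hypothetical, simplified stand-in for a security identity.
// It is NOT Cilium's real type.
type Identity struct {
	ID     int               `json:"id"`
	Labels map[string]string `json:"labels"`
	// LabelArray is derived from Labels and is not serialized,
	// so it is lost whenever the identity goes through JSON.
	LabelArray []string `json:"-"`
}

// NewIdentity models a freshly allocated identity with LabelArray populated.
func NewIdentity(id int, labels map[string]string) *Identity {
	ident := &Identity{ID: id, Labels: labels}
	ident.rebuildLabelArray()
	return ident
}

// rebuildLabelArray recomputes the derived field from Labels.
// It models the later "fix up" step mentioned in the comment above.
func (i *Identity) rebuildLabelArray() {
	arr := make([]string, 0, len(i.Labels))
	for k, v := range i.Labels {
		arr = append(arr, k+"="+v)
	}
	sort.Strings(arr)
	i.LabelArray = arr
}

// Selects reports whether label is present in the identity's LabelArray.
// It models a policy lookup that only consults the derived field.
func Selects(i *Identity, label string) bool {
	for _, l := range i.LabelArray {
		if l == label {
			return true
		}
	}
	return false
}

// roundTrip serializes an identity to JSON and restores it,
// modelling state that is saved and reloaded across an agent restart.
func roundTrip(i *Identity) (*Identity, error) {
	data, err := json.Marshal(i)
	if err != nil {
		return nil, err
	}
	restored := &Identity{}
	if err := json.Unmarshal(data, restored); err != nil {
		return nil, err
	}
	return restored, nil
}

func main() {
	fresh := NewIdentity(1001, map[string]string{"id": "app1"})
	restored, err := roundTrip(fresh)
	if err != nil {
		panic(err)
	}

	fmt.Println(Selects(fresh, "id=app1"))    // true
	fmt.Println(Selects(restored, "id=app1")) // false: LabelArray is empty after restore

	restored.rebuildLabelArray()
	fmt.Println(Selects(restored, "id=app1")) // true once the derived field is rebuilt
}
```

In this sketch, anything that evaluates the restored identity before `rebuildLabelArray` runs sees no labels at all, which is the same ordering problem as a regeneration that happens before the fix-up.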