istio: istio-proxy v1.8.0-alpha2 crashing on aggregate cluster
Bug description
Following creation of an EnvoyFilter and ServiceEntry, istio-injected pods are unable to start up due to istio-proxy
crashing with a segfault. Also note that this error seems indeterministic, you may have to run through the reproduction procedure a few times to trigger it.
[ ] Docs [X] Installation [ ] Networking [ ] Performance and Scalability [ ] Extensions and Telemetry [ ] Security [ ] Test and Release [X] User Experience [ ] Developer Infrastructure
Expected behavior Istio-proxy starts without crashing.
Steps to reproduce the bug
- Install istio v1.8.0-alpha2
- Deploy the bookinfo example application
- Configure Istio with a custom root certificate as described in this guide
- Create the following ServiceEntry and corresponding EnvoyFilter:
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
name: reviews-test
namespace: service-mesh-hub
spec:
addresses:
- 244.57.37.240
hosts:
- reviews-test.bookinfo.global
location: MESH_INTERNAL
ports:
- name: http
number: 9080
protocol: http
resolution: DNS
---
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
name: reviews-envoy-filter
namespace: istio-system
spec:
configPatches:
- applyTo: CLUSTER
match:
cluster:
name: outbound|9080||reviews-test.bookinfo.global
patch:
operation: REMOVE
- applyTo: CLUSTER
match:
cluster:
name: outbound|9080||reviews-test.bookinfo.global
patch:
operation: ADD
value:
cluster_type:
name: envoy.clusters.aggregate
typed_config:
'@type': type.googleapis.com/udpa.type.v1.TypedStruct
type_url: type.googleapis.com/envoy.config.cluster.aggregate.v2alpha.ClusterConfig
value:
clusters:
- outbound|9080||reviews.bookinfo.svc.cluster.local
- outbound|9080||reviews.bookinfo.svc.remote-cluster.global
connect_timeout: 1s
lb_policy: CLUSTER_PROVIDED
name: outbound|9080||reviews-test.bookinfo.global
Version (include the output of istioctl version --remote
and kubectl version --short
and helm version
if you used Helm)
Istio: v1.8.0-alpha2
Istio-proxy image: docker.io/istio/proxyv2:1.8.0-alpha.2
How was Istio installed?
cat << EOF | istioctl manifest install -y --context "kind-${cluster}" -f -
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
name: example-istiooperator
namespace: istio-system
spec:
profile: minimal
addonComponents:
istiocoredns:
enabled: true
components:
# Istio Gateway feature
ingressGateways:
- name: istio-ingressgateway
enabled: true
k8s:
env:
- name: ISTIO_META_ROUTER_MODE
value: "sni-dnat"
service:
type: NodePort
ports:
- port: 80
targetPort: 8080
name: http2
- port: 443
targetPort: 8443
name: https
- port: 15443
targetPort: 15443
name: tls
nodePort: ${port}
meshConfig:
enableAutoMtls: true
values:
global:
pilotCertProvider: kubernetes
podDNSSearchNamespaces:
- global
EOF
Environment where bug was observed (cloud vendor, OS, etc)
Kubernetes ENV: KinD v0.8.1
Kubernetes Version: 1.17
OS: OSX Catalina
istio-proxy log output
2020-11-04T21:00:15.617742Z info FLAG: --concurrency="2"
2020-11-04T21:00:15.617784Z info FLAG: --domain="bookinfo.svc.cluster.local"
2020-11-04T21:00:15.617878Z info FLAG: --help="false"
2020-11-04T21:00:15.617886Z info FLAG: --log_as_json="false"
2020-11-04T21:00:15.617898Z info FLAG: --log_caller=""
2020-11-04T21:00:15.617910Z info FLAG: --log_output_level="default:info"
2020-11-04T21:00:15.617921Z info FLAG: --log_rotate=""
2020-11-04T21:00:15.617949Z info FLAG: --log_rotate_max_age="30"
2020-11-04T21:00:15.617958Z info FLAG: --log_rotate_max_backups="1000"
2020-11-04T21:00:15.617973Z info FLAG: --log_rotate_max_size="104857600"
2020-11-04T21:00:15.617984Z info FLAG: --log_stacktrace_level="default:none"
2020-11-04T21:00:15.617999Z info FLAG: --log_target="[stdout]"
2020-11-04T21:00:15.618011Z info FLAG: --meshConfig="./etc/istio/config/mesh"
2020-11-04T21:00:15.618020Z info FLAG: --outlierLogPath=""
2020-11-04T21:00:15.618050Z info FLAG: --proxyComponentLogLevel="misc:error"
2020-11-04T21:00:15.618061Z info FLAG: --proxyLogLevel="warning"
2020-11-04T21:00:15.618073Z info FLAG: --serviceCluster="ratings.bookinfo"
2020-11-04T21:00:15.618084Z info FLAG: --stsPort="0"
2020-11-04T21:00:15.618094Z info FLAG: --templateFile=""
2020-11-04T21:00:15.618106Z info FLAG: --tokenManagerPlugin="GoogleTokenExchange"
2020-11-04T21:00:15.618123Z info Version 1.8.0-alpha.2-ec349ef2a8a0bc846a87c8c10e02511e4865c9cc-Clean
2020-11-04T21:00:15.618264Z info Obtained private IP [10.244.0.30 fe80::481:beff:fe68:b1a5]
2020-11-04T21:00:15.618358Z info Apply proxy config from env {"proxyMetadata":{"DNS_AGENT":""}}
2020-11-04T21:00:15.619009Z info Effective config: binaryPath: /usr/local/bin/envoy
concurrency: 2
configPath: ./etc/istio/proxy
controlPlaneAuthPolicy: MUTUAL_TLS
discoveryAddress: istiod.istio-system.svc:15012
drainDuration: 45s
envoyAccessLogService: {}
envoyMetricsService: {}
parentShutdownDuration: 60s
proxyAdminPort: 15000
proxyMetadata:
DNS_AGENT: ""
serviceCluster: ratings.bookinfo
statNameLength: 189
statusPort: 15020
terminationDrainDuration: 5s
tracing:
zipkin:
address: zipkin.istio-system:9411
2020-11-04T21:00:15.619084Z info Proxy role: &model.Proxy{RWMutex:sync.RWMutex{w:sync.Mutex{state:0, sema:0x0}, writerSem:0x0, readerSem:0x0, readerCount:0, readerWait:0}, Type:"sidecar", IPAddresses:[]string{"10.244.0.30", "fe80::481:beff:fe68:b1a5"}, ID:"ratings-v1-7466f6f858-ngm9x.bookinfo", Locality:(*envoy_config_core_v3.Locality)(nil), DNSDomain:"bookinfo.svc.cluster.local", ConfigNamespace:"", Metadata:(*model.NodeMetadata)(nil), SidecarScope:(*model.SidecarScope)(nil), PrevSidecarScope:(*model.SidecarScope)(nil), MergedGateway:(*model.MergedGateway)(nil), ServiceInstances:[]*model.ServiceInstance(nil), IstioVersion:(*model.IstioVersion)(nil), VerifiedIdentity:(*spiffe.Identity)(nil), ipv6Support:false, ipv4Support:false, GlobalUnicastIP:"", XdsResourceGenerator:model.XdsResourceGenerator(nil), WatchedResources:map[string]*model.WatchedResource(nil)}
2020-11-04T21:00:15.619101Z info JWT policy is first-party-jwt
2020-11-04T21:00:15.619133Z info PilotSAN []string{"istiod.istio-system.svc"}
2020-11-04T21:00:15.619185Z info sa.serverOptions.CAEndpoint == istiod.istio-system.svc:15012 Citadel
2020-11-04T21:00:15.619223Z info Using CA istiod.istio-system.svc:15012 cert with certs: ./var/run/secrets/kubernetes.io/serviceaccount/ca.crt
2020-11-04T21:00:15.619294Z info citadelclient Citadel client using custom root: istiod.istio-system.svc:15012 -----BEGIN CERTIFICATE-----
MIICyDCCAbCgAwIBAgIBADANBgkqhkiG9w0BAQsFADAVMRMwEQYDVQQDEwprdWJl
cm5ldGVzMB4XDTIwMTEwNDE4MTg0MFoXDTMwMTEwMjE4MTg0MFowFTETMBEGA1UE
AxMKa3ViZXJuZXRlczCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEBANcN
+nDNxSKceeAl4Feghw7sTxc6tNg2HZRONkqXbfrQL/vEIwwtrbiA84MqoiiAEADs
B3q18bh/e5WEU655p1nzbMvcchqTzwgpHSubUW7BsIOWmUN5MnsfEQwDALLfjvQA
AWDMJz0jQsJ+/x4V3tY4F14+mTsssW8SY8vFl5h1KVzTzEQ8MF/jgSnFAnebvhaw
JYDqDc60cI4VSiMoDx/kGbRWYlmMbG3A5uO2hmitN+mCoCnciRgq2FEI5LGYgDYu
tqK/rl78V350Zdj5dIwkJ8iFTVPIQaEwd/gVsxTdUYiJK+wNoOqgkht9UPfy2qrQ
HHsbUdpbXdUOlKDPRZ8CAwEAAaMjMCEwDgYDVR0PAQH/BAQDAgKkMA8GA1UdEwEB
/wQFMAMBAf8wDQYJKoZIhvcNAQELBQADggEBAER+0ygpXY0fYkpi15Cljvrw99An
IZNmbV0HwDLF+rq0XvNl2uFX9HTJFCXAskZRR4TN7+64A0oW5Kz3LIjWhSSmRkob
UsuHdSx7cmbyYuZNqkw4oOb8jD/J3pOV19qX+teZTxDi2vAx3UwE3l1DCI1R2KEt
gU/7clRsIrOGknwXJOK832ZewxoHeINvmes3MZ/klwRbsXiLOkz7xgQGWbXrcsfp
XFJrUZJ8jYudroKsw5Pt1IzwMogQ9ypSw5vgsmXkYIcrn6sz5sdMej8Dpf0wNHJo
FDGKyJ2FfNEUcazsoqYv08bRO5qqfEhPtmlDXl/yshqsGRCUMK43oHilsa4=
-----END CERTIFICATE-----
2020-11-04T21:00:15.644178Z info sds SDS gRPC server for workload UDS starts, listening on "./etc/istio/proxy/SDS"
2020-11-04T21:00:15.644222Z info xdsproxy Initializing with upstream address istiod.istio-system.svc:15012 and cluster Kubernetes
2020-11-04T21:00:15.644385Z info xdsproxy adding watcher for certificate ./var/run/secrets/kubernetes.io/serviceaccount/ca.crt
2020-11-04T21:00:15.644315Z info sds Start SDS grpc server
2020-11-04T21:00:15.644600Z info Starting proxy agent
2020-11-04T21:00:15.644661Z info Received new config, creating new Envoy epoch 0
2020-11-04T21:00:15.644644Z info Opening status port 15020
2020-11-04T21:00:15.644824Z info Epoch 0 starting
2020-11-04T21:00:15.652712Z info Envoy command: [-c etc/istio/proxy/envoy-rev0.json --restart-epoch 0 --drain-time-s 45 --parent-shutdown-time-s 60 --service-cluster ratings.bookinfo --service-node sidecar~10.244.0.30~ratings-v1-7466f6f858-ngm9x.bookinfo~bookinfo.svc.cluster.local --local-address-ip-version v4 --bootstrap-version 3 --log-format-prefix-with-location 0 --log-format %Y-%m-%dT%T.%fZ %l envoy %n %v -l warning --component-log-level misc:error --concurrency 2]
2020-11-04T21:00:15.684710Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2020-11-04T21:00:15.684984Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2020-11-04T21:00:15.685367Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2020-11-04T21:00:15.685509Z warning envoy runtime Unable to use runtime singleton for feature envoy.http.headermap.lazy_map_min_size
2020-11-04T21:00:15.711445Z info xdsproxy Envoy ADS stream established
2020-11-04T21:00:15.715022Z warning envoy main there is no configured limit to the number of allowed active connections. Set a limit via the runtime key overload.global_downstream_max_connections
2020-11-04T21:00:15.719983Z info xdsproxy connecting to upstream XDS server: istiod.istio-system.svc:15012
2020-11-04T21:00:15.742274Z info sds resource:default new connection
2020-11-04T21:00:15.742306Z info sds resource:ROOTCA new connection
2020-11-04T21:00:15.742362Z info sds Skipping waiting for gateway secret
2020-11-04T21:00:15.742378Z info sds Skipping waiting for gateway secret
2020-11-04T21:00:15.833695Z info cache Root cert has changed, start rotating root cert for SDS clients
2020-11-04T21:00:15.833838Z info cache GenerateSecret default
2020-11-04T21:00:15.834220Z info sds resource:default pushed key/cert pair to proxy
2020-11-04T21:00:15.943182Z info cache Loaded root cert from certificate ROOTCA
2020-11-04T21:00:15.943429Z info sds resource:ROOTCA pushed root cert to proxy
2020-11-04T21:00:15.988392Z warning envoy filter mTLS PERMISSIVE mode is used, connection can be either plaintext or TLS, and client cert can be omitted. Please consider to upgrade to mTLS STRICT mode for more secure configuration that only allows TLS connection with client cert. See https://istio.io/docs/tasks/security/mtls-migration/
2020-11-04T21:00:15.989310Z warning envoy filter mTLS PERMISSIVE mode is used, connection can be either plaintext or TLS, and client cert can be omitted. Please consider to upgrade to mTLS STRICT mode for more secure configuration that only allows TLS connection with client cert. See https://istio.io/docs/tasks/security/mtls-migration/
2020-11-04T21:00:15.996381Z critical envoy backtrace Caught Segmentation fault, suspect faulting address 0x17
2020-11-04T21:00:15.996413Z critical envoy backtrace Backtrace (use tools/stack_decode.py to get line numbers):
2020-11-04T21:00:15.996422Z critical envoy backtrace Envoy version: 211ca038383b61c9fb600758bf170aea5508800c/1.16.0/Clean/RELEASE/BoringSSL
2020-11-04T21:00:15.996661Z critical envoy backtrace #0: __restore_rt [0x7f0fdb3ec8a0]
2020-11-04T21:00:16.006971Z critical envoy backtrace #1: Envoy::Event::DispatcherImpl::runPostCallbacks() [0x564f361f1336]
2020-11-04T21:00:16.016383Z critical envoy backtrace #2: Envoy::Event::DispatcherImpl::run() [0x564f361f11cc]
2020-11-04T21:00:16.024630Z critical envoy backtrace #3: Envoy::Server::WorkerImpl::threadRoutine() [0x564f361e3c53]
2020-11-04T21:00:16.033115Z critical envoy backtrace #4: Envoy::Thread::ThreadImplPosix::ThreadImplPosix()::{lambda()#1}::__invoke() [0x564f367ff8c3]
2020-11-04T21:00:16.033175Z critical envoy backtrace #5: start_thread [0x7f0fdb3e16db]
2020-11-04T21:00:16.037086Z error Epoch 0 exited with error: signal: segmentation fault
2020-11-04T21:00:16.037273Z info xdsproxy disconnected from XDS server: istiod.istio-system.svc:15012
2020-11-04T21:00:16.037344Z info No more active epochs, terminating
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 34 (31 by maintainers)
With a debug build I get basically the same thing as https://github.com/istio/istio/issues/28620#issuecomment-722658196 although it reproduces much more easily (I haven’t seen the proxy succeed to start up yet). I guess I should’ve paid more attention to the assertion there earlier though. The line is
which points to the assertion here. This function is called by the return value of
dataCallback
which is called byrunOnAllThreads
. The comment indataCallback
points to these comments, pointing to thethis
capture being a problem, as EItanya pointed out…It needs a debug build of the istio-proxy to get the detailed stack trace. So swap the binary with debug build or ASAN build, you should be able to get them.