skupper: Skupper Router v0.4.0 Hard Crash malloc(): unsorted double linked list corrupted
Hello!
I just updated my clusters to use 0.4.0 of the site-controller as well as the new service-controller:0.4.0.
I ran into some very interesting issues when trying to use my services. I am currently testing the HTTP endpoint manually, and I also have a test service running that exercises the TCP proxy. Here are the logs from the router before it crashed.
2020-12-10 07:00:03.850112 +0000 ROUTER_CORE (info) [C77] Connection Closed
2020-12-10 07:00:04.753583 +0000 ROUTER_CORE (info) [C5][L181] Link detached: del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=0 delay10=0 blocked=no
2020-12-10 07:00:05.144506 +0000 HTTP_ADAPTOR (info) [C1] Connection closed
2020-12-10 07:00:05.144563 +0000 HTTP_ADAPTOR (info) [C1] Server not responding - disconnecting...
2020-12-10 07:00:05.802507 +0000 ROUTER_CORE (info) [C5][L186] Link attached: dir=in source={<none> expire:sess} target={nats-cloud-gateway expire:link}
2020-12-10 07:00:05.859764 +0000 ROUTER_CORE (info) [C79] Connection Opened: dir=out host=10.196.3.155:1024 vhost= encrypted=no auth=no user= container_id=TcpAdaptor props=
2020-12-10 07:00:05.859868 +0000 TCP_ADAPTOR (info) [C79] Connecting to: 10.196.3.155:1024
2020-12-10 07:00:05.860130 +0000 HTTP_ADAPTOR (info) Accepting HTTP/1.x connection on 0.0.0.0:1024
2020-12-10 07:00:05.860182 +0000 TCP_ADAPTOR (info) [C79] Connected
2020-12-10 07:00:05.860363 +0000 ROUTER_CORE (info) [C80] Connection Opened: dir=in host=10.196.3.155:52766 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=
2020-12-10 07:00:05.860457 +0000 ROUTER_CORE (info) [C79][L187] Link attached: dir=out source={nats-cloud-gateway expire:link} target={<none> expire:link}
2020-12-10 07:00:05.860482 +0000 ROUTER_CORE (info) [C80][L188] Link attached: dir=out source={(dyn)<none> expire:link} target={<none> expire:link}
2020-12-10 07:00:05.860498 +0000 ROUTER_CORE (info) [C80][L189] Link attached: dir=in source={<none> expire:link} target={cloud-api expire:link}
2020-12-10 07:00:05.860556 +0000 TCP_ADAPTOR (info) [C79] Disconnected
2020-12-10 07:00:05.860623 +0000 ROUTER_CORE (info) [C79][L190] Link attached: dir=in source={<none> expire:link} target={amqp:/_edge/test-edge-skupper-router-7f45bdfb7c-92pww/temp.nQRyzDZbD_AkDXv expire:link}
2020-12-10 07:00:05.860635 +0000 ROUTER_CORE (info) [C79][L190] Link lost: del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=0 delay10=0 blocked=no
2020-12-10 07:00:05.860930 +0000 ROUTER_CORE (info) [C79][L187] Link lost: del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=1 delay10=0 blocked=no
2020-12-10 07:00:05.860950 +0000 ROUTER_CORE (info) [C79] Connection Closed
2020-12-10 07:00:06.763495 +0000 ROUTER_CORE (info) [C5][L186] Link detached: del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=0 delay10=0 blocked=no
2020-12-10 07:00:07.645104 +0000 HTTP_ADAPTOR (info) [C1] Connection closed
2020-12-10 07:00:07.645592 +0000 HTTP_ADAPTOR (info) [C1] Server not responding - disconnecting...
2020-12-10 07:00:07.801700 +0000 ROUTER_CORE (info) [C5][L191] Link attached: dir=in source={<none> expire:sess} target={nats-cloud-gateway expire:link}
2020-12-10 07:00:07.862693 +0000 ROUTER_CORE (info) [C81] Connection Opened: dir=out host=10.196.3.155:1024 vhost= encrypted=no auth=no user= container_id=TcpAdaptor props=
2020-12-10 07:00:07.862758 +0000 TCP_ADAPTOR (info) [C81] Connecting to: 10.196.3.155:1024
2020-12-10 07:00:07.863047 +0000 HTTP_ADAPTOR (info) Accepting HTTP/1.x connection on 0.0.0.0:1024
2020-12-10 07:00:07.863094 +0000 TCP_ADAPTOR (info) [C81] Connected
2020-12-10 07:00:07.863190 +0000 ROUTER_CORE (info) [C82] Connection Opened: dir=in host=10.196.3.155:52780 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=
2020-12-10 07:00:07.863322 +0000 ROUTER_CORE (info) [C81][L192] Link attached: dir=out source={nats-cloud-gateway expire:link} target={<none> expire:link}
2020-12-10 07:00:07.863349 +0000 ROUTER_CORE (info) [C82][L193] Link attached: dir=out source={(dyn)<none> expire:link} target={<none> expire:link}
2020-12-10 07:00:07.863366 +0000 ROUTER_CORE (info) [C82][L194] Link attached: dir=in source={<none> expire:link} target={cloud-api expire:link}
2020-12-10 07:00:07.863408 +0000 TCP_ADAPTOR (info) [C81] Disconnected
2020-12-10 07:00:07.863525 +0000 ROUTER_CORE (info) [C81][L195] Link attached: dir=in source={<none> expire:link} target={amqp:/_edge/test-edge-skupper-router-7f45bdfb7c-92pww/temp.Bxht0NGTkcDBtc_ expire:link}
2020-12-10 07:00:07.863539 +0000 ROUTER_CORE (info) [C81][L195] Link lost: del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=0 delay10=0 blocked=no
2020-12-10 07:00:07.863555 +0000 ROUTER_CORE (info) [C81][L192] Link lost: del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=1 delay10=0 blocked=no
2020-12-10 07:00:07.863567 +0000 ROUTER_CORE (info) [C81] Connection Closed
2020-12-10 07:00:08.768780 +0000 ROUTER_CORE (info) [C5][L191] Link detached: del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=0 delay10=0 blocked=no
2020-12-10 07:00:09.801661 +0000 ROUTER_CORE (info) [C5][L196] Link attached: dir=in source={<none> expire:sess} target={nats-cloud-gateway expire:link}
2020-12-10 07:00:09.858965 +0000 ROUTER_CORE (info) [C83] Connection Opened: dir=out host=10.196.3.155:1024 vhost= encrypted=no auth=no user= container_id=TcpAdaptor props=
2020-12-10 07:00:09.859253 +0000 TCP_ADAPTOR (info) [C83] Connecting to: 10.196.3.155:1024
2020-12-10 07:00:09.859836 +0000 ROUTER_CORE (info) [C83][L197] Link attached: dir=out source={nats-cloud-gateway expire:link} target={<none> expire:link}
2020-12-10 07:00:09.860398 +0000 ROUTER_CORE (info) [C84] Connection Opened: dir=out host=10.196.3.155:1024 vhost= encrypted=no auth=no user= container_id=TcpAdaptor props=
2020-12-10 07:00:09.860772 +0000 TCP_ADAPTOR (info) [C84] Connecting to: 10.196.3.155:1024
2020-12-10 07:00:09.861488 +0000 HTTP_ADAPTOR (info) Accepting HTTP/1.x connection on 0.0.0.0:1024
2020-12-10 07:00:09.861670 +0000 HTTP_ADAPTOR (info) Accepting HTTP/1.x connection on 0.0.0.0:1024
2020-12-10 07:00:09.861841 +0000 TCP_ADAPTOR (info) [C83] Connected
2020-12-10 07:00:09.862037 +0000 TCP_ADAPTOR (info) [C83] Disconnected
2020-12-10 07:00:09.862225 +0000 TCP_ADAPTOR (info) [C84] Connected
2020-12-10 07:00:09.862330 +0000 ROUTER_CORE (info) [C85] Connection Opened: dir=in host=10.196.3.155:52794 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=
2020-12-10 07:00:09.862491 +0000 ROUTER_CORE (info) [C86] Connection Opened: dir=in host=10.196.3.155:52796 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=
2020-12-10 07:00:10.147989 +0000 HTTP_ADAPTOR (info) [C1] Connection closed
2020-12-10 07:00:10.148607 +0000 HTTP_ADAPTOR (info) [C1] Server not responding - disconnecting...
malloc(): unsorted double linked list corrupted
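One way to read a log like the one above is to check which router connections were opened but never closed before the crash. The following is a minimal sketch, not part of Skupper or the router tooling: the function name `unclosed_connections` is hypothetical, and it assumes the log lines keep the exact `ROUTER_CORE (info) [Cnn] Connection Opened` / `Connection Closed` wording shown in this excerpt.

```python
import re

# Patterns match the ROUTER_CORE wording seen in the excerpt above (assumption:
# other router versions may phrase these messages differently).
OPENED = re.compile(r"ROUTER_CORE \(info\) \[(C\d+)\] Connection Opened: dir=(\w+)")
CLOSED = re.compile(r"ROUTER_CORE \(info\) \[(C\d+)\] Connection Closed")


def unclosed_connections(log_text):
    """Return {connection id: direction} for connections opened but not closed."""
    still_open = {}
    for line in log_text.splitlines():
        opened = OPENED.search(line)
        if opened:
            still_open[opened.group(1)] = opened.group(2)
            continue
        closed = CLOSED.search(line)
        if closed:
            # Ignore closes for connections opened before the excerpt started.
            still_open.pop(closed.group(1), None)
    return still_open
```

Run against the excerpt, this kind of check would list the connections that have an open line but no matching ROUTER_CORE close line, which may help narrow down where links are accumulating.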
Here’s how we currently configure skupper:
cloud hub
apiVersion: v1
kind: ConfigMap
metadata:
  name: skupper-site
data:
  cluster-local: "false"
  console: "true"
  console-authentication: internal
  console-password: "barney"
  console-user: "rubble"
  edge: "false"
  name: test-cloud
  router-console: "true"
  service-controller: "true"
  service-sync: "true"
edge
apiVersion: v1
kind: ConfigMap
metadata:
  name: skupper-site
data:
  cluster-local: "false"
  console: "true"
  console-authentication: internal
  console-password: "barney"
  console-user: "rubble"
  edge: "true"
  name: test-edge
  router-console: "true"
  service-controller: "true"
  service-sync: "true"
We have two services exposed:
Services exposed through Skupper:
cloud-api (http port 5443)
nats-cloud-gateway (tcp port 7422)
Before the (cloud hub) router crashed, I hopped on the pod, ran qdstat -l, and noticed there were many links piling up for the HTTP transfer. Here’s an example of them.
Router Links
type dir conn id id peer class addr phs cap pri undel unsett deliv presett psdrop acc rej rel mod delay rate stuck cred blkd
=======================================================================================================================================================================================================================
endpoint out 2 2 mobile nats-cloud-gateway 0 250 0 1 0 8 0 0 0 0 0 0 0 0 0 10 -
endpoint out 3 3 mobile 92f5bd9b-f921-4408-aa22-4ccd3f5f2c6b/skupper-site-query 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 10 -
endpoint in 3 4 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint out 4 5 mobile mc/$skupper-service-sync 0 250 0 0 0 5 0 0 5 0 0 0 0 0 0 10 -
endpoint in 4 6 mobile mc/$skupper-service-sync 0 250 0 0 0 3 0 0 3 0 0 0 0 0 0 250 -
endpoint in 9 15 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
edge-downlink out 9 16 edge test-edge-skupper-router-7f45bdfb7c-92pww 250 0 0 0 1 1 0 0 0 0 0 0 0 0 250 -
endpoint out 9 17 mobile _$qd.edge_addr_tracking 0 250 0 0 0 6 6 0 0 0 0 0 6 0 0 32 -
endpoint out 9 18 mobile d5f5e229-5b7e-4553-97d3-24591e1f9555/skupper-site-query 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint out 9 19 mobile mc/$skupper-service-sync 0 250 0 0 0 2 0 0 2 0 0 0 0 0 0 250 -
endpoint in 9 20 mobile mc/$skupper-service-sync 0 250 0 0 0 2 0 0 2 0 0 0 0 0 0 250 -
endpoint in 9 21 mobile cloud-api 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00:00:10
endpoint in 9 22 mobile $management 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint out 9 23 local temp.TRsrm472AwVpo_a 250 0 0 0 0 0 0 0 0 0 0 0 0 0 100 -
endpoint in 9 24 mobile _$qd.addr_lookup 0 250 0 0 0 15 15 0 0 0 0 0 0 1 0 32 -
endpoint out 9 25 local temp.8e4pdFWcbq2rs05 250 0 0 0 15 15 0 0 0 0 0 0 1 0 250 -
endpoint in 9 29 250 0 0 1 5 0 0 0 0 4 0 0 0 0 250 -
endpoint in 9 30 250 0 0 1 5 0 0 0 0 4 0 0 0 0 250 -
endpoint in 9 31 250 0 0 1 5 0 0 0 0 4 0 0 0 0 250 -
endpoint out 12 34 local temp.OTbglwINSsqSAAG 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 12 35 mobile cloud-api 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00:00:09
endpoint out 9 36 250 0 1 0 5 5 0 0 0 0 0 0 0 0 251 -
endpoint out 15 40 local temp.dfzQtbbcY8XWFx5 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 15 41 mobile cloud-api 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00:00:08
endpoint out 16 42 local temp.sNLLCkrGX97NSYa 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 16 43 mobile cloud-api 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00:00:08
endpoint out 21 51 local temp.4BJYXn5+JRV8m2i 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 21 52 mobile cloud-api 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00:00:07
endpoint out 9 56 250 0 1 0 3 3 0 0 0 0 0 0 0 0 251 -
endpoint out 22 57 local temp.4iYq0pWc7SU+PE2 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 22 58 mobile cloud-api 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00:00:07
endpoint out 24 61 local temp.X5LW6rXYd9F4IEm 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 24 62 mobile cloud-api 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00:00:06
endpoint out 27 67 local temp.pCKON1EaK8AGKhg 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 27 68 mobile cloud-api 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00:00:05
endpoint out 29 71 local temp.MFPWripcNzwrGNE 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 29 72 mobile cloud-api 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00:00:05
endpoint out 32 78 local temp.LFc8+CYtSXCpAIg 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 32 79 mobile cloud-api 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00:00:04
endpoint out 38 89 local temp._xi8ae4bvy7JK9h 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 38 90 mobile cloud-api 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00:00:03
endpoint out 9 93 250 0 0 0 1 1 0 0 0 0 0 0 0 0 250 -
endpoint out 39 94 local temp.GajzcZTJkGLXDke 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 39 95 mobile cloud-api 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00:00:03
endpoint out 40 96 local temp.fpRRu0WvFy1JQo9 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 40 97 mobile cloud-api 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00:00:03
endpoint in 9 102 mobile nats-cloud-gateway 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint out 43 103 mobile nats-cloud-gateway 0 250 0 1 0 0 0 0 0 0 0 0 0 0 0 10 -
endpoint out 44 104 mobile nats-cloud-gateway 0 250 0 1 0 0 0 0 0 0 0 0 0 0 0 10 -
endpoint in 43 105 edge test-edge-skupper-router-7f45bdfb7c-92pww 250 0 0 1 1 0 0 0 0 0 0 0 0 0 10 -
endpoint in 44 106 edge test-edge-skupper-router-7f45bdfb7c-92pww 250 0 0 1 1 0 0 0 0 0 0 0 0 0 10 -
endpoint out 46 107 local temp.LeygFKNf2tvM2Ua 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 46 108 mobile cloud-api 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00:00:01
endpoint out 45 109 local temp.2P3RBIU4kGgdhjJ 250 0 0 0 0 0 0 0 0 0 0 0 0 0 250 -
endpoint in 45 110 mobile cloud-api 0 250 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00:00:01
endpoint in 47 111 mobile $management 0 250 0 0 0 2 0 0 2 0 0 0 0 0 0 250 -
endpoint out 47 112 local temp.4cohluBlyWklnIs 250 0 0 0 1 1 0 0 0 0 0 0 0 0 1 -
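To quantify the pile-up in the listing above, the rows can be grouped by address, counting only links whose last column (blkd) shows a duration instead of "-". This is a minimal sketch under stated assumptions: the helper name `blocked_links_by_address` is hypothetical, the rows are whitespace-separated as pasted here, the peer column is empty, and the class column only takes the values seen in this excerpt.

```python
from collections import Counter

# Link classes that appear in the pasted output (assumption: others may exist).
LINK_CLASSES = {"mobile", "local", "edge"}


def blocked_links_by_address(qdstat_text):
    """Count links per address whose blkd column is not '-'."""
    counts = Counter()
    for line in qdstat_text.splitlines():
        tokens = line.split()
        # Expect: type, dir, conn id, id, class, addr, ..., cred, blkd
        if len(tokens) < 7 or tokens[1] not in ("in", "out"):
            continue  # header, separator, or unrelated line
        if tokens[4] not in LINK_CLASSES:
            continue  # row without class/addr columns (e.g. anonymous links)
        if tokens[-1] != "-":
            counts[tokens[5]] += 1
    return dict(counts)
```

On the listing above, every row with a non-"-" blkd value is an incoming cloud-api link with zero credit, so a count like this makes the growth visible when qdstat -l is sampled repeatedly.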
client version 0.4.0
transport version quay.io/skupper/qdrouterd:0.4 (sha256:037ec89c755a)
controller version quay.io/skupper/service-controller:0.4.0 (sha256:b5c96ec83369)
Be sure to let me know if there’s any other information you’d be interested to see.
About this issue
- State: closed
- Created 4 years ago
- Comments: 21 (10 by maintainers)
Demo went well.
That would work perfectly well, and yes, having them propagate to all services the skupper controller creates seems reasonable as well.
Or perhaps reversing that would offer a simpler solution, i.e. all annotations on the skupper-site configmap would be copied to both the router and the service-controller, but there would be a special annotation, e.g. skupper.io/ignore-router-annotations, which would take a list of keys that should be ignored and not copied. Likewise for the service-controller. That way, in the simple case, all you need to do is add annotations to the skupper-site configmap that initialises the site. Would that work for you? Would it be ok if the annotations by default were applied to all the skupper-created deployments?
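The copy-with-ignore-list idea proposed above could look roughly like the sketch below. It is only an illustration of the proposal, not how Skupper implements it: the function name `annotations_for` is hypothetical, the service-controller key name is invented by analogy with the router one, and the "list of keys" is assumed to be a comma-separated string.

```python
# Key proposed in the comment above.
IGNORE_ROUTER_KEY = "skupper.io/ignore-router-annotations"
# Hypothetical counterpart for the service-controller (name is an assumption).
IGNORE_CONTROLLER_KEY = "skupper.io/ignore-service-controller-annotations"

CONTROL_KEYS = {IGNORE_ROUTER_KEY, IGNORE_CONTROLLER_KEY}


def annotations_for(site_annotations, ignore_key):
    """Return the skupper-site annotations to copy to one deployment.

    Everything is copied by default; keys named in the comma-separated value
    of `ignore_key` are skipped, as are the ignore annotations themselves.
    """
    raw = site_annotations.get(ignore_key, "")
    ignored = {key.strip() for key in raw.split(",") if key.strip()}
    return {
        key: value
        for key, value in site_annotations.items()
        if key not in ignored and key not in CONTROL_KEYS
    }
```

For example, with annotations `prometheus.io/scrape` and `example.com/team` on the configmap and `example.com/team` listed under the router ignore key, the router would receive only `prometheus.io/scrape`, while the service-controller would receive both.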