skupper: Skupper Router v0.4.0 Hard Crash malloc(): unsorted double linked list corrupted

Hello!

I just updated my clusters to use 0.4.0 of the site-controller as well as the new service-controller:0.4.0.

I ran into some interesting issues when trying to use my services. I am currently testing the HTTP endpoint manually, and I also have a test service running that exercises the TCP proxy. Here are the logs from the router before it crashed.

2020-12-10 07:00:03.850112 +0000 ROUTER_CORE (info) [C77] Connection Closed                                                                                                                                                                                                                                                                                                            
2020-12-10 07:00:04.753583 +0000 ROUTER_CORE (info) [C5][L181] Link detached: del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=0 delay10=0 blocked=no                                                                                                                                                                                                                           
2020-12-10 07:00:05.144506 +0000 HTTP_ADAPTOR (info) [C1] Connection closed                                                                                                                                                                                                                                                                                                            
2020-12-10 07:00:05.144563 +0000 HTTP_ADAPTOR (info) [C1] Server not responding - disconnecting...                                                                                                                                                                                                                                                                                     
2020-12-10 07:00:05.802507 +0000 ROUTER_CORE (info) [C5][L186] Link attached: dir=in source={<none> expire:sess} target={nats-cloud-gateway expire:link}                                                                                                                                                                                                                               
2020-12-10 07:00:05.859764 +0000 ROUTER_CORE (info) [C79] Connection Opened: dir=out host=10.196.3.155:1024 vhost= encrypted=no auth=no user= container_id=TcpAdaptor props=                                                                                                                                                                                                           
2020-12-10 07:00:05.859868 +0000 TCP_ADAPTOR (info) [C79] Connecting to: 10.196.3.155:1024                                                                                                                                                                                                                                                                                             
2020-12-10 07:00:05.860130 +0000 HTTP_ADAPTOR (info) Accepting HTTP/1.x connection on 0.0.0.0:1024                                                                                                                                                                                                                                                                                     
2020-12-10 07:00:05.860182 +0000 TCP_ADAPTOR (info) [C79] Connected                                                                                                                                                                                                                                                                                                                    
2020-12-10 07:00:05.860363 +0000 ROUTER_CORE (info) [C80] Connection Opened: dir=in host=10.196.3.155:52766 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=                                                                                                                                                                                                     
2020-12-10 07:00:05.860457 +0000 ROUTER_CORE (info) [C79][L187] Link attached: dir=out source={nats-cloud-gateway expire:link} target={<none> expire:link}                                                                                                                                                                                                                             
2020-12-10 07:00:05.860482 +0000 ROUTER_CORE (info) [C80][L188] Link attached: dir=out source={(dyn)<none> expire:link} target={<none> expire:link}                                                                                                                                                                                                                                    
2020-12-10 07:00:05.860498 +0000 ROUTER_CORE (info) [C80][L189] Link attached: dir=in source={<none> expire:link} target={cloud-api expire:link}                                                                                                                                                                                                                                       
2020-12-10 07:00:05.860556 +0000 TCP_ADAPTOR (info) [C79] Disconnected                                                                                                                                                                                                                                                                                                                 
2020-12-10 07:00:05.860623 +0000 ROUTER_CORE (info) [C79][L190] Link attached: dir=in source={<none> expire:link} target={amqp:/_edge/test-edge-skupper-router-7f45bdfb7c-92pww/temp.nQRyzDZbD_AkDXv expire:link}                                                                                                                                                                     
2020-12-10 07:00:05.860635 +0000 ROUTER_CORE (info) [C79][L190] Link lost: del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=0 delay10=0 blocked=no                                                                                                                                                                                                                              
2020-12-10 07:00:05.860930 +0000 ROUTER_CORE (info) [C79][L187] Link lost: del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=1 delay10=0 blocked=no                                                                                                                                                                                                                              
2020-12-10 07:00:05.860950 +0000 ROUTER_CORE (info) [C79] Connection Closed                                                                                                                                                                                                                                                                                                            
2020-12-10 07:00:06.763495 +0000 ROUTER_CORE (info) [C5][L186] Link detached: del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=0 delay10=0 blocked=no                                                                                                                                                                                                                           
2020-12-10 07:00:07.645104 +0000 HTTP_ADAPTOR (info) [C1] Connection closed                                                                                                                                                                                                                                                                                                            
2020-12-10 07:00:07.645592 +0000 HTTP_ADAPTOR (info) [C1] Server not responding - disconnecting...                                                                                                                                                                                                                                                                                     
2020-12-10 07:00:07.801700 +0000 ROUTER_CORE (info) [C5][L191] Link attached: dir=in source={<none> expire:sess} target={nats-cloud-gateway expire:link}                                                                                                                                                                                                                               
2020-12-10 07:00:07.862693 +0000 ROUTER_CORE (info) [C81] Connection Opened: dir=out host=10.196.3.155:1024 vhost= encrypted=no auth=no user= container_id=TcpAdaptor props=                                                                                                                                                                                                           
2020-12-10 07:00:07.862758 +0000 TCP_ADAPTOR (info) [C81] Connecting to: 10.196.3.155:1024                                                                                                                                                                                                                                                                                             
2020-12-10 07:00:07.863047 +0000 HTTP_ADAPTOR (info) Accepting HTTP/1.x connection on 0.0.0.0:1024                                                                                                                                                                                                                                                                                     
2020-12-10 07:00:07.863094 +0000 TCP_ADAPTOR (info) [C81] Connected                                                                                                                                                                                                                                                                                                                    
2020-12-10 07:00:07.863190 +0000 ROUTER_CORE (info) [C82] Connection Opened: dir=in host=10.196.3.155:52780 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=                                                                                                                                                                                                     
2020-12-10 07:00:07.863322 +0000 ROUTER_CORE (info) [C81][L192] Link attached: dir=out source={nats-cloud-gateway expire:link} target={<none> expire:link}                                                                                                                                                                                                                             
2020-12-10 07:00:07.863349 +0000 ROUTER_CORE (info) [C82][L193] Link attached: dir=out source={(dyn)<none> expire:link} target={<none> expire:link}                                                                                                                                                                                                                                    
2020-12-10 07:00:07.863366 +0000 ROUTER_CORE (info) [C82][L194] Link attached: dir=in source={<none> expire:link} target={cloud-api expire:link}                                                                                                                                                                                                                                       
2020-12-10 07:00:07.863408 +0000 TCP_ADAPTOR (info) [C81] Disconnected                                                                                                                                                                                                                                                                                                                 
2020-12-10 07:00:07.863525 +0000 ROUTER_CORE (info) [C81][L195] Link attached: dir=in source={<none> expire:link} target={amqp:/_edge/test-edge-skupper-router-7f45bdfb7c-92pww/temp.Bxht0NGTkcDBtc_ expire:link}                                                                                                                                                                     
2020-12-10 07:00:07.863539 +0000 ROUTER_CORE (info) [C81][L195] Link lost: del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=0 delay10=0 blocked=no                                                                                                                                                                                                                              
2020-12-10 07:00:07.863555 +0000 ROUTER_CORE (info) [C81][L192] Link lost: del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=1 delay10=0 blocked=no                                                                                                                                                                                                                              
2020-12-10 07:00:07.863567 +0000 ROUTER_CORE (info) [C81] Connection Closed                                                                                                                                                                                                                                                                                                            
2020-12-10 07:00:08.768780 +0000 ROUTER_CORE (info) [C5][L191] Link detached: del=0 presett=0 psdrop=0 acc=0 rej=0 rel=0 mod=0 delay1=0 delay10=0 blocked=no                                                                                                                                                                                                                           
2020-12-10 07:00:09.801661 +0000 ROUTER_CORE (info) [C5][L196] Link attached: dir=in source={<none> expire:sess} target={nats-cloud-gateway expire:link}                                                                                                                                                                                                                               
2020-12-10 07:00:09.858965 +0000 ROUTER_CORE (info) [C83] Connection Opened: dir=out host=10.196.3.155:1024 vhost= encrypted=no auth=no user= container_id=TcpAdaptor props=                                                                                                                                                                                                           
2020-12-10 07:00:09.859253 +0000 TCP_ADAPTOR (info) [C83] Connecting to: 10.196.3.155:1024                                                                                                                                                                                                                                                                                             
2020-12-10 07:00:09.859836 +0000 ROUTER_CORE (info) [C83][L197] Link attached: dir=out source={nats-cloud-gateway expire:link} target={<none> expire:link}                                                                                                                                                                                                                             
2020-12-10 07:00:09.860398 +0000 ROUTER_CORE (info) [C84] Connection Opened: dir=out host=10.196.3.155:1024 vhost= encrypted=no auth=no user= container_id=TcpAdaptor props=                                                                                                                                                                                                           
2020-12-10 07:00:09.860772 +0000 TCP_ADAPTOR (info) [C84] Connecting to: 10.196.3.155:1024                                                                                                                                                                                                                                                                                             
2020-12-10 07:00:09.861488 +0000 HTTP_ADAPTOR (info) Accepting HTTP/1.x connection on 0.0.0.0:1024                                                                                                                                                                                                                                                                                     
2020-12-10 07:00:09.861670 +0000 HTTP_ADAPTOR (info) Accepting HTTP/1.x connection on 0.0.0.0:1024                                                                                                                                                                                                                                                                                     
2020-12-10 07:00:09.861841 +0000 TCP_ADAPTOR (info) [C83] Connected                                                                                                                                                                                                                                                                                                                    
2020-12-10 07:00:09.862037 +0000 TCP_ADAPTOR (info) [C83] Disconnected                                                                                                                                                                                                                                                                                                                 
2020-12-10 07:00:09.862225 +0000 TCP_ADAPTOR (info) [C84] Connected                                                                                                                                                                                                                                                                                                                    
2020-12-10 07:00:09.862330 +0000 ROUTER_CORE (info) [C85] Connection Opened: dir=in host=10.196.3.155:52794 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=                                                                                                                                                                                                     
2020-12-10 07:00:09.862491 +0000 ROUTER_CORE (info) [C86] Connection Opened: dir=in host=10.196.3.155:52796 vhost= encrypted=no auth=no user= container_id=HTTP/1.x Adaptor props=                                                                                                                                                                                                     
2020-12-10 07:00:10.147989 +0000 HTTP_ADAPTOR (info) [C1] Connection closed                                                                                                                                                                                                                                                                                                            
2020-12-10 07:00:10.148607 +0000 HTTP_ADAPTOR (info) [C1] Server not responding - disconnecting...                                                                                                                                                                                                                                                                                     
malloc(): unsorted double linked list corrupted
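
For anyone triaging a log like the one above, a minimal sketch (not part of Skupper or qdrouterd) that lists the ROUTER_CORE connections that were opened but never reported closed before the crash. The function name and the assumed line shape (`<date> <time> <tz> MODULE (level) message`) are inferred from the excerpt only, not taken from any router documentation.

```python
import re

# Assumed line shape, inferred from the excerpt above:
#   "<date> <time> <tz> MODULE (level) message"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) \S+ "
    r"(?P<module>[A-Z_]+) \((?P<level>\w+)\) (?P<message>.*)$"
)

# ROUTER_CORE reports "[C79] Connection Opened: ..." and "[C79] Connection Closed".
CONNECTION_EVENT = re.compile(r"\[(C\d+)\] Connection (Opened|Closed)")


def unclosed_connections(log_text):
    """Return ids (e.g. 'C80') that ROUTER_CORE opened but never closed."""
    opened, closed = [], set()
    for raw in log_text.splitlines():
        line = LOG_LINE.match(raw.strip())
        if line is None or line.group("module") != "ROUTER_CORE":
            continue  # skip adaptor lines and non-log lines such as the malloc error
        event = CONNECTION_EVENT.match(line.group("message"))
        if event is None:
            continue
        if event.group(2) == "Opened":
            opened.append(event.group(1))
        else:
            closed.add(event.group(1))
    # Preserve the order in which the connections were opened.
    return [conn for conn in opened if conn not in closed]
```

Passing the saved router log as a string returns the connection ids in the order they were opened, which makes it easier to see which connections were still open when the router died.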

Here’s how we currently configure skupper:

cloud hub

apiVersion: v1
kind: ConfigMap
metadata:
  name: skupper-site
data:
  cluster-local: "false"
  console: "true"
  console-authentication: internal
  console-password: "barney"
  console-user: "rubble"
  edge: "false"
  name: test-cloud
  router-console: "true"
  service-controller: "true"
  service-sync: "true"

edge

apiVersion: v1
kind: ConfigMap
metadata:
  name: skupper-site
data:
  cluster-local: "false"
  console: "true"
  console-authentication: internal
  console-password: "barney"
  console-user: "rubble"
  edge: "true"
  name: test-edge
  router-console: "true"
  service-controller: "true"
  service-sync: "true"

We have two services exposed:

Services exposed through Skupper:
    cloud-api (http port 5443)
    nats-cloud-gateway (tcp port 7422)

Before the (cloud hub) router crashed, I hopped on the pod, ran qdstat -l, and noticed that many links were piling up for the HTTP transfer. Here’s an example of them.

Router Links
  type           dir  conn id  id   peer  class   addr                                                     phs  cap  pri  undel  unsett  deliv  presett  psdrop  acc  rej  rel  mod  delay  rate  stuck  cred  blkd
  =======================================================================================================================================================================================================================
  endpoint       out  2        2          mobile  nats-cloud-gateway                                       0    250  0    1      0       8      0        0       0    0    0    0    0      0     0      10    -
  endpoint       out  3        3          mobile  92f5bd9b-f921-4408-aa22-4ccd3f5f2c6b/skupper-site-query  0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      10    -
  endpoint       in   3        4                                                                                250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  4        5          mobile  mc/$skupper-service-sync                                 0    250  0    0      0       5      0        0       5    0    0    0    0      0     0      10    -
  endpoint       in   4        6          mobile  mc/$skupper-service-sync                                 0    250  0    0      0       3      0        0       3    0    0    0    0      0     0      250   -
  endpoint       in   9        15                                                                               250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  edge-downlink  out  9        16         edge    test-edge-skupper-router-7f45bdfb7c-92pww                     250  0    0      0       1      1        0       0    0    0    0    0      0     0      250   -
  endpoint       out  9        17         mobile  _$qd.edge_addr_tracking                                  0    250  0    0      0       6      6        0       0    0    0    0    6      0     0      32    -
  endpoint       out  9        18         mobile  d5f5e229-5b7e-4553-97d3-24591e1f9555/skupper-site-query  0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  9        19         mobile  mc/$skupper-service-sync                                 0    250  0    0      0       2      0        0       2    0    0    0    0      0     0      250   -
  endpoint       in   9        20         mobile  mc/$skupper-service-sync                                 0    250  0    0      0       2      0        0       2    0    0    0    0      0     0      250   -
  endpoint       in   9        21         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:10
  endpoint       in   9        22         mobile  $management                                              0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  9        23         local   temp.TRsrm472AwVpo_a                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      100   -
  endpoint       in   9        24         mobile  _$qd.addr_lookup                                         0    250  0    0      0       15     15       0       0    0    0    0    0      1     0      32    -
  endpoint       out  9        25         local   temp.8e4pdFWcbq2rs05                                          250  0    0      0       15     15       0       0    0    0    0    0      1     0      250   -
  endpoint       in   9        29                                                                               250  0    0      1       5      0        0       0    0    4    0    0      0     0      250   -
  endpoint       in   9        30                                                                               250  0    0      1       5      0        0       0    0    4    0    0      0     0      250   -
  endpoint       in   9        31                                                                               250  0    0      1       5      0        0       0    0    4    0    0      0     0      250   -
  endpoint       out  12       34         local   temp.OTbglwINSsqSAAG                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   12       35         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:09
  endpoint       out  9        36                                                                               250  0    1      0       5      5        0       0    0    0    0    0      0     0      251   -
  endpoint       out  15       40         local   temp.dfzQtbbcY8XWFx5                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   15       41         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:08
  endpoint       out  16       42         local   temp.sNLLCkrGX97NSYa                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   16       43         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:08
  endpoint       out  21       51         local   temp.4BJYXn5+JRV8m2i                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   21       52         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:07
  endpoint       out  9        56                                                                               250  0    1      0       3      3        0       0    0    0    0    0      0     0      251   -
  endpoint       out  22       57         local   temp.4iYq0pWc7SU+PE2                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   22       58         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:07
  endpoint       out  24       61         local   temp.X5LW6rXYd9F4IEm                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   24       62         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:06
  endpoint       out  27       67         local   temp.pCKON1EaK8AGKhg                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   27       68         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:05
  endpoint       out  29       71         local   temp.MFPWripcNzwrGNE                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   29       72         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:05
  endpoint       out  32       78         local   temp.LFc8+CYtSXCpAIg                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   32       79         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:04
  endpoint       out  38       89         local   temp._xi8ae4bvy7JK9h                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   38       90         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:03
  endpoint       out  9        93                                                                               250  0    0      0       1      1        0       0    0    0    0    0      0     0      250   -
  endpoint       out  39       94         local   temp.GajzcZTJkGLXDke                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   39       95         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:03
  endpoint       out  40       96         local   temp.fpRRu0WvFy1JQo9                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   40       97         mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:03
  endpoint       in   9        102        mobile  nats-cloud-gateway                                       0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       out  43       103        mobile  nats-cloud-gateway                                       0    250  0    1      0       0      0        0       0    0    0    0    0      0     0      10    -
  endpoint       out  44       104        mobile  nats-cloud-gateway                                       0    250  0    1      0       0      0        0       0    0    0    0    0      0     0      10    -
  endpoint       in   43       105        edge    test-edge-skupper-router-7f45bdfb7c-92pww                     250  0    0      1       1      0        0       0    0    0    0    0      0     0      10    -
  endpoint       in   44       106        edge    test-edge-skupper-router-7f45bdfb7c-92pww                     250  0    0      1       1      0        0       0    0    0    0    0      0     0      10    -
  endpoint       out  46       107        local   temp.LeygFKNf2tvM2Ua                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   46       108        mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:01
  endpoint       out  45       109        local   temp.2P3RBIU4kGgdhjJ                                          250  0    0      0       0      0        0       0    0    0    0    0      0     0      250   -
  endpoint       in   45       110        mobile  cloud-api                                                0    250  0    0      0       0      0        0       0    0    0    0    0      0     0      0     00:00:01
  endpoint       in   47       111        mobile  $management                                              0    250  0    0      0       2      0        0       2    0    0    0    0      0     0      250   -
  endpoint       out  47       112        local   temp.4cohluBlyWklnIs                                          250  0    0      0       1      1        0       0    0    0    0    0      0     0      1     -
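
To put a number on the pile-up, a rough sketch (assuming the `qdstat -l` text has been saved to a string) that counts link rows per address and direction. The function name is hypothetical. Rows are matched by whitespace-separated token rather than by column position, because the peer and addr columns can be blank in this output.

```python
from collections import Counter


def count_links_by_address(qdstat_output, addresses):
    """Count link rows per (address, direction) in `qdstat -l` text."""
    counts = Counter()
    for row in qdstat_output.splitlines():
        tokens = row.split()
        # Data rows start with a link type followed by a direction;
        # this skips the title, header, and separator rows.
        if len(tokens) < 2 or tokens[1] not in ("in", "out"):
            continue
        for address in addresses:
            if address in tokens:  # exact token match, not substring
                counts[(address, tokens[1])] += 1
    return counts
```

Calling `count_links_by_address(text, ["cloud-api", "nats-cloud-gateway"])` on successive snapshots would show whether the number of incoming `cloud-api` links keeps growing.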

client version                 0.4.0
transport version              quay.io/skupper/qdrouterd:0.4 (sha256:037ec89c755a)
controller version             quay.io/skupper/service-controller:0.4.0 (sha256:b5c96ec83369)

Be sure to let me know if there’s any other information you’d be interested to see.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 21 (10 by maintainers)

Most upvoted comments

demo went well.

That would work perfectly well, and yes, having them propagate to all services the skupper controller creates seems reasonable as well.

Perhaps certain annotations on the skupper site could be copied to the router. E.g. there could be an annotation skupper.io/router-annotations that took a list of keys of other annotations to copy if present?
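
A minimal sketch of how such an allow-list could behave, for illustration only. It is not an implemented Skupper feature: the function name is hypothetical, and it assumes the value of skupper.io/router-annotations is a comma-separated list of keys.

```python
def select_router_annotations(site_annotations,
                              list_key="skupper.io/router-annotations"):
    """Return only the site annotations named in the allow-list annotation.

    Assumption: the allow-list value is a comma-separated string of keys.
    Keys that are listed but absent from the site are silently skipped.
    """
    raw = site_annotations.get(list_key, "")
    wanted = [key.strip() for key in raw.split(",") if key.strip()]
    return {key: site_annotations[key]
            for key in wanted if key in site_annotations}
```

With this shape, a site that sets no allow-list copies nothing, so existing deployments would be unaffected.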

Or perhaps reversing that would offer a simpler solution. I.e. all annotations on the skupper-site configmap would be copied to both router and service-controller, but there would be a special annotation, e.g. skupper.io/ignore-router-annotations, which would take a list of keys that should be ignored and not copied. Likewise for the service-controller. That way in the simple case all you need to do is add annotations to the skupper-site configmap that initialises the site. Would that work for you? Would it be ok if the annotations by default were applied to all the skupper created deployments?
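
A rough sketch of the copy-everything-except variant, again purely illustrative and not Skupper's implementation. The function name and the skupper.io/ignore-service-controller-annotations key are assumptions (the comment only names the router key), as is the comma-separated list format. The sketch also chooses not to copy the control annotations themselves.

```python
# Hypothetical control annotations; only the router key appears in the proposal.
IGNORE_KEYS = {
    "router": "skupper.io/ignore-router-annotations",
    "service-controller": "skupper.io/ignore-service-controller-annotations",
}


def annotations_for(component, site_annotations):
    """Return the site annotations to copy onto the given component.

    Everything is copied by default, except keys named in the component's
    ignore-list annotation and the ignore-list annotations themselves.
    """
    raw = site_annotations.get(IGNORE_KEYS[component], "")
    ignored = {key.strip() for key in raw.split(",") if key.strip()}
    control = set(IGNORE_KEYS.values())
    return {key: value for key, value in site_annotations.items()
            if key not in ignored and key not in control}
```

In the simple case, with no ignore-list set, both components receive every annotation placed on the skupper-site configmap.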