cilium: Intermittent reports of dropped flows from hubble.

Hello, we’ve been noticing some unusual behavior with Cilium/Hubble and we’re curious whether anyone else has seen something similar. We’ve written network policies that allow traffic to a particular CIDR block/port, yet we’re seeing intermittent drop events from Hubble with drop reason 133 (policy denied) for traffic that should be allowed (and usually is allowed). We haven’t noticed any service interruption from those drops, though, so we’re unsure whether these are real intermittent drops by Cilium or whether Hubble is falsely reporting them. Has anyone seen anything similar, and do you know whether the cause is more likely false reporting by Hubble or real intermittent drops by Cilium? We’re running Cilium v1.8.6 on Kubernetes 1.18.
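
For reference, this is roughly how we pull these events out of the flow stream with the Hubble CLI. This is just a sketch; flag spellings can differ between Hubble CLI versions, and the pod name is the one from the example drop below:

    # filter flows involving the affected pod that Hubble marked as dropped
    hubble observe --pod studio-live/baas-config-d84bc885f-8s564 --verdict DROPPED -o jsonpb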

Here’s an example of one of the drops:

"attributes": {
    "source": {
        "namespace": "studio-live",
        "identity": 42150,
        "ID": 37,
        "labels": [
            "k8s:app.kubernetes.io/managed-by=spinnaker",
            "k8s:app.kubernetes.io/name=baas",
            "k8s:app=baas-config",
            "k8s:io.cilium.k8s.namespace.labels.field.cattle.io/projectId=p-v9rth",
            "k8s:io.cilium.k8s.policy.cluster=default",
            "k8s:io.cilium.k8s.policy.serviceaccount=default",
            "k8s:io.kubernetes.pod.namespace=studio-live",
            "k8s:studio-context=live"
        ],
        "pod_name": "baas-config-d84bc885f-8s564"
    },
    "IP": {
        "ipVersion": "IPv4",
        "destination": "10.8.2.59",
        "source": "10.8.1.121"
    },
    "l4": {
        "TCP": {
            "destination_port": 26257,
            "source_port": 48794,
            "flags": {
                "PSH": true,
                "ACK": true
            }
        }
    },
    "destination": {
        "identity": 2,
        "labels": [
            "reserved:world"
        ]
    },
    "node_name": "ip-10-8-1-11.us-east-2.compute.internal",
    "traffic_direction": "EGRESS",
    "ethernet": {
        "destination": "2e:cf:00:b3:44:3c",
        "source": "86:4f:19:2d:9b:ef"
    },
    "Type": "L3_L4",
    "event_type": {
        "sub_type": 133,
        "type": 1
    },
    "verdict": "DROPPED",
    "Summary": "TCP Flags: ACK, PSH",
    "time": "2021-03-19T18:08:08.011207109Z",
    "drop_reason": 133
}
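
One detail that stands out to us: the destination in the dropped flow resolves to identity 2 (reserved:world), even though 10.8.2.59 sits inside the 10.0.0.0/8 CIDR our policy allows. A rough cross-check we can run from inside the cilium agent pod on the reporting node (exact output format may vary by Cilium version):

    # how is the destination IP classified in the agent's ipcache right now?
    cilium bpf ipcache list | grep 10.8.2.59
    # watch datapath-level drop notifications directly, independent of Hubble
    cilium monitor --type drop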

And here’s the policy:

apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  annotations:
    artifact.spinnaker.io/location: studio-live
    artifact.spinnaker.io/name: network-config
    artifact.spinnaker.io/type: kubernetes/CiliumNetworkPolicy.cilium.io
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"cilium.io/v2","kind":"CiliumNetworkPolicy", [...]}
    moniker.spinnaker.io/application: baas
    moniker.spinnaker.io/cluster: CiliumNetworkPolicy.cilium.io network-config
  creationTimestamp: "2021-01-24T15:04:42Z"
  generation: 4
  labels:
    app.kubernetes.io/managed-by: spinnaker
    app.kubernetes.io/name: baas
  managedFields:
  - apiVersion: cilium.io/v2
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:artifact.spinnaker.io/location: {}
          f:artifact.spinnaker.io/name: {}
          f:artifact.spinnaker.io/type: {}
          f:kubectl.kubernetes.io/last-applied-configuration: {}
          f:moniker.spinnaker.io/application: {}
          f:moniker.spinnaker.io/cluster: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/managed-by: {}
          f:app.kubernetes.io/name: {}
      f:spec:
        .: {}
        f:egress: {}
        f:endpointSelector:
          .: {}
          f:matchLabels:
            .: {}
            f:app: {}
    manager: kubectl
    operation: Update
    time: "2021-03-06T23:00:22Z"
  name: network-config
  namespace: studio-live
  resourceVersion: "64708762"
  selfLink: /apis/cilium.io/v2/namespaces/studio-live/ciliumnetworkpolicies/network-config
  uid: af4b7347-921f-436d-ab8f-102933da6bf7
spec:
  egress:
  - toCIDRSet:
    - cidr: 10.0.0.0/8
  - toCIDR:
    - 169.254.169.254/32
    - 184.32.0.0/16
  - toFQDNs:
    - matchName: bondtech-docker-local.jfrog.io
    - matchPattern: baas-*.*.*.rds.amazonaws.com
[...]
  - toPorts:
    - ports:
      - port: "53"
        protocol: ANY
      rules:
        dns:
        - matchPattern: '*'
    - ports:
      - port: "8125"
        protocol: UDP
    - ports:
      - port: "6000"
        protocol: TCP
      - port: "7000"
        protocol: TCP
  - toEndpoints:
    - matchLabels:
        app: baas-auth
    - matchLabels:
        k8s:app: cockroachdb
        k8s:io.kubernetes.pod.namespace: crdb-dev3
[...]
  endpointSelector:
    matchLabels:
      app: baas-config
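
To double-check that the agent has actually realized this policy for the affected endpoint, we can also dump the endpoint's computed state from the cilium agent pod on that node. The agent pod name below is a placeholder; endpoint ID 37 comes from the drop shown above:

    # placeholder agent pod name for node ip-10-8-1-11.us-east-2.compute.internal
    kubectl -n kube-system exec -it cilium-xxxxx -- cilium endpoint get 37
    # dump the full policy repository the agent is enforcing
    kubectl -n kube-system exec -it cilium-xxxxx -- cilium policy get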

How to reproduce the issue

  1. Intermittent. This seems to happen randomly, so we do not have a concrete way to reproduce it at will. We asked in the Cilium Slack, and after initial triage with @pchaigno and @aditighag the recommendation was to gather a sysdump and file a bug report for further review (the capture we leave running is sketched just below). Sysdumps: cilium-sysdump-prod2-20210324-012008.zip cilium-sysdump-prod3-20210324-012151.zip
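
The capture itself is roughly this (filter flags depend on the Hubble CLI version; the output file name is arbitrary):

    # long-running capture of drop events to attach to the bug report
    hubble observe --verdict DROPPED --type drop --follow -o jsonpb >> dropped-flows.jsonpb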

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 29 (13 by maintainers)

Most upvoted comments

@joaoubaldo Could you please open a new issue with the same information as above plus a full dump of those Hubble drops (-o jsonpb)? A Cilium sysdump of the cluster would also help.

We are getting the Hubble drop reports, BUT the application is behaving as expected (as if there were no drops), so this is leading us to believe something is not right.

If the drops are intermittent, they might not cause connectivity issues. That seems more likely than Hubble getting packet drop notifications out of nowhere 🙂