antrea: IPFIX start and stop timestamps are wrong/too large

Describe the bug
IPFIX start and stop timestamps are wrong/too large.

To Reproduce
I simply installed a new Kubernetes lab environment and installed Antrea.

Expected
Start and Stop timestamps sent in the IPFIX record should be the actual start and stop times.

Actual behavior
As seen in the PCAP output below, the timestamps are ~85 1/2 years in the future.

StartTime: Feb  7, 2106 07:28:01.000000000 CET
EndTime: Feb  7, 2106 07:28:01.000000000 CET

The hex value is… ff ff ff f1 88 6e 09 00
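As a sanity check, here is a quick decode in Go (illustrative only; it assumes the leading four bytes of that field are interpreted as an unsigned 32-bit dateTimeSeconds value). It shows that the value sits 15 seconds before the 32-bit epoch rollover, i.e. it looks like the unsigned reinterpretation of a small negative number of seconds:

package main

import (
	"encoding/binary"
	"fmt"
	"time"
)

func main() {
	// Leading four bytes of the timestamp field from the capture.
	raw := []byte{0xff, 0xff, 0xff, 0xf1}

	asUint := binary.BigEndian.Uint32(raw) // 4294967281
	asInt := int32(asUint)                 // -15 when reinterpreted as signed

	fmt.Println(asUint, asInt)
	// 4294967281 seconds after the Unix epoch is 2106-02-07 06:28:01 UTC
	// (07:28:01 CET), matching the StartTime/EndTime shown above and sitting
	// 15 seconds before the 32-bit unsigned wraparound.
	fmt.Println(time.Unix(int64(asUint), 0).UTC())
}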

Versions:

  • Antrea v0.10.1
  • Kubernetes 1.19.3 (fresh install)
  • Container runtime: containerd 1.3.7
  • Linux kernel 5.4.0-51-generic (all nodes are Ubuntu 20.04 fresh install)
  • Open vSwitch (installed using apt)
~# modinfo openvswitch
filename:       /lib/modules/5.4.0-51-generic/kernel/net/openvswitch/openvswitch.ko
alias:          net-pf-16-proto-16-family-ovs_ct_limit
alias:          net-pf-16-proto-16-family-ovs_meter
alias:          net-pf-16-proto-16-family-ovs_packet
alias:          net-pf-16-proto-16-family-ovs_flow
alias:          net-pf-16-proto-16-family-ovs_vport
alias:          net-pf-16-proto-16-family-ovs_datapath
license:        GPL
description:    Open vSwitch switching datapath
srcversion:     7E9C4A3E7257126A845CF56
depends:        nf_conntrack,nf_nat,nf_conncount,libcrc32c,nf_defrag_ipv6,nsh
retpoline:      Y
intree:         Y
name:           openvswitch
vermagic:       5.4.0-51-generic SMP mod_unload
sig_id:         PKCS#7
signer:         Build time autogenerated kernel key
sig_key:        56:AA:1E:D4:49:F6:D7:44:76:6B:68:17:21:DC:0C:4D:B1:40:9F:BB
sig_hashalgo:   sha512
signature:      94:5E:58:BE:D0:77:E2:B3:C6:A8:70:F8:70:1D:21:2A:3F:FA:BD:FA:
		0D:79:30:F4:22:53:D9:4F:74:6A:AA:7E:41:BA:87:2F:AB:77:D3:12:
		88:73:BC:C5:06:4D:B0:74:F2:E2:86:31:C5:29:C0:B8:3F:35:47:0C:
		27:B5:92:72:0F:B3:4C:2B:68:2C:EA:FA:A1:B5:A6:60:43:2B:46:41:
		12:B2:05:0A:13:27:A5:88:75:28:F6:1A:36:3F:19:38:23:A9:C5:CD:
		42:78:8B:32:BA:21:4E:09:65:FA:81:28:A6:3A:2F:A9:69:F5:39:B8:
		B9:9A:49:C8:1B:CE:82:97:01:26:BB:55:DA:88:E9:62:7F:83:4A:0F:
		21:DC:3A:BF:97:90:09:3C:00:75:8F:98:33:EA:F8:48:F3:54:FC:89:
		FB:04:0A:6E:02:0A:2A:E0:BE:E7:96:27:17:76:83:7A:E9:EF:E3:33:
		7D:F5:7F:E5:FA:34:B1:33:58:B0:60:00:AA:15:66:0C:CB:E2:62:7F:
		F7:35:9B:16:EF:8A:5C:23:78:16:1F:73:F5:BE:0F:81:47:18:B7:E7:
		BA:F9:C6:99:74:4D:F4:22:E7:CA:C5:7C:8C:0C:7E:59:BA:83:3B:7E:
		03:AF:46:1A:F0:BE:1A:00:CA:13:FF:BA:A4:89:D4:72:3C:F2:99:53:
		9A:51:80:2E:70:B3:4C:B7:36:D5:68:1F:A6:E2:C0:C4:45:52:08:8A:
		0D:33:EC:B9:86:7D:1C:4F:D6:F0:97:AE:0C:BA:82:5C:69:20:88:F4:
		84:FA:7E:CB:6B:64:4E:5E:EE:FF:85:40:F3:36:2F:48:A0:8C:44:F8:
		2B:7D:02:AB:24:BB:39:8B:45:57:12:00:AB:96:FC:45:0A:B4:CB:6E:
		87:C8:EE:96:2A:81:08:BC:7F:0D:48:93:21:F9:DE:20:13:D5:75:B5:
		5E:74:F8:54:75:24:E8:3D:07:80:98:DA:17:34:6A:67:B0:F0:AA:59:
		67:6F:35:3F:B5:52:E9:DC:F7:06:3E:09:B4:BB:32:A4:59:81:33:24:
		00:83:D7:6A:C9:0A:E0:C8:7B:D0:8A:0F:82:9A:E3:70:88:9F:52:FB:
		44:49:3F:B5:64:D2:F5:04:A3:44:2E:23:3E:03:25:4B:19:E0:AB:5D:
		92:36:BC:E0:B4:56:64:FF:FC:B1:4A:0A:3B:EB:4F:FD:36:A1:2F:68:
		69:7D:F4:00:4F:B7:A5:8C:E4:6A:9F:E6:46:83:B8:F0:62:20:2E:A5:
		CF:0E:B1:92:59:5E:9C:B3:8C:B3:C0:30:03:66:95:EE:DC:43:20:F6:
		2E:0D:DF:6A:16:61:F2:02:B6:C0:5A:BF

Additional context

Cisco NetFlow/IPFIX
    Version: 10
    Length: 190
    Timestamp: Oct 20, 2020 17:55:28.000000000 CEST
        ExportTime: 1603209328
    FlowSequence: 0
    Observation Domain Id: 4036327851
    Set 1 [id=256] (1 flows)
        FlowSet Id: (Data) (256)
        FlowSet Length: 174
        [Template Frame: 4]
        Flow 1
            [Duration: 0.000000000 seconds (seconds)]
                StartTime: Feb  7, 2106 07:28:01.000000000 CET
                EndTime: Feb  7, 2106 07:28:01.000000000 CET
            SrcAddr: 10.244.0.2
            DstAddr: 192.168.8.11
            SrcPort: 45684
            DstPort: 6443
            Protocol: TCP (6)
            Permanent Packets: 0
            Permanent Octets: 0
            Packets: 0
            Octets: 0
            Permanent Packets: 0 (Reverse Type 86 PACKETS_TOTAL)
            Permanent Octets: 0 (Reverse Type 85 BYTES_TOTAL)
            Packets: 0 (Reverse Type 2 PKTS)
            Octets: 0 (Reverse Type 1 BYTES)
            Enterprise Private entry: (Arthur18) Type 101: Value (hex bytes): 63 6f 72 65 64 6e 73 2d 66 39 66 64 39 37 39 64 …
            Enterprise Private entry: (Arthur18) Type 100: Value (hex bytes): 6b 75 62 65 2d 73 79 73 74 65 6d
            Enterprise Private entry: (Arthur18) Type 104: Value (hex bytes): 6b 6d 61 73 74 65 72 31
            Enterprise Private entry: (Arthur18) Type 103: Value (hex bytes): 
            Enterprise Private entry: (Arthur18) Type 102: Value (hex bytes): 
            Enterprise Private entry: (Arthur18) Type 105: Value (hex bytes): 
            Enterprise Private entry: (Arthur18) Type 106: Value (hex bytes): 0a 60 00 01
            Enterprise Private entry: (Arthur18) Type 108: Value (hex bytes): 64 65 66 61 75 6c 74 2f 6b 75 62 65 72 6e 65 74 …

And the hex dump…

0000   00 0a 00 be 5f 8f 08 70 00 00 00 00 f0 95 79 ab   ...._..p......y.
0010   01 00 00 ae ff ff ff f1 88 6e 09 00 ff ff ff f1   .........n......
0020   88 6e 09 00 0a f4 00 02 c0 a8 08 0b b2 74 19 2b   .n...........t.+
0030   06 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
0040   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
0050   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
0060   00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00   ................
0070   00 17 63 6f 72 65 64 6e 73 2d 66 39 66 64 39 37   ..coredns-f9fd97
0080   39 64 36 2d 76 67 34 34 78 0b 6b 75 62 65 2d 73   9d6-vg44x.kube-s
0090   79 73 74 65 6d 08 6b 6d 61 73 74 65 72 31 00 00   ystem.kmaster1..
00a0   00 0a 60 00 01 18 64 65 66 61 75 6c 74 2f 6b 75   ..`...default/ku
00b0   62 65 72 6e 65 74 65 73 3a 68 74 74 70 73         bernetes:https

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 28 (18 by maintainers)

Most upvoted comments

Every node in the cluster has the same value…

knode2:~# cat /proc/sys/net/netfilter/nf_conntrack_timestamp
1

@robcowart You are right. I just checked the code: we are getting the stop time from the conntrack table (https://github.com/vmware-tanzu/antrea/blob/master/pkg/agent/flowexporter/exporter/exporter.go#L252). Instead, per the discussion in this issue, we should record the stopTime as the time when the flow record is created (time.Now()).
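A minimal sketch of what that change could look like (the names below are hypothetical, not the actual exporter.go code):

package main

import (
	"fmt"
	"time"
)

// flowRecord is an illustrative stand-in for the exported flow record.
type flowRecord struct {
	flowStartSeconds time.Time
	flowEndSeconds   time.Time
}

// buildRecord stamps the end time with the record-creation time (time.Now())
// instead of copying the stop time reported by the conntrack table, which is
// not meaningful for a connection that is still active.
func buildRecord(connStart time.Time) flowRecord {
	return flowRecord{
		flowStartSeconds: connStart,
		flowEndSeconds:   time.Now(),
	}
}

func main() {
	rec := buildRecord(time.Now().Add(-30 * time.Second))
	fmt.Println(rec.flowStartSeconds, rec.flowEndSeconds)
}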

@zyiou Is it possible for you to add this in one of the PRs you already opened? Thanks.

Most network devices send flow records based on one or more configurable timeout settings. While they may track a flow for a longer period of time, they send flow records more frequently. This is necessary for a number of reasons.

  • Many server-to-server TCP connections can live for weeks, months or longer (e.g. the connection from Kibana or Logstash to Elasticsearch). They essentially never end unless stopped or there is some error. Without a periodic update the user could literally wait months for the information. What good is the rest of their data until the picture is complete? Not much.

  • Now consider that the above long-lived flow was a hacker exfiltrating data from your database. The flow record that could reveal their activity is never sent until the damage is already done, and you lose the opportunity to shut them down. Not good.

  • If all of the information about a flow is included in a single record sent at the end of the flow, how do you distribute it over the life of the flow? All at the beginning? All at the end? Unfortunately most visualization tools, like Kibana, and even many purpose-built flow applications, do not provide the ability to smooth the data into buckets over the life of the flow. Even if they did, each query would still have to look into the future to determine if any flows had start times within the current visualized window. This is very tricky at best, if not close to impossible. However, when the device sends regular updates about its flows, this is much easier.

If I explained those well, you should see that it is very desirable to send information about a long-lived flow over the life of the flow. Most network devices will provide various options to control this. Some are simple:

  • inactive_timeout = 15 - export a flow after it has been inactive for 15 seconds.
  • active_timeout = 60 - export a flow every 60 seconds while it is active.

Others provide more granular control:

  • expiry-interval 10 - the interval between expiry checks, i.e. every 10 seconds it is checked which flows are ready for export
  • icmp 60 - export ICMP flows every 60 seconds
  • tcp-generic 60 - export TCP flows every 60 seconds
  • tcp-fin 20 - export a flow 20 seconds after observing a TCP FIN flag
  • tcp-rst 20 - export a flow 20 seconds after observing a TCP RST flag
  • udp 60 - export UDP flows every 60 seconds
  • flow-generic 60 - applies to any IP protocol not specified by other timeouts
  • max-active-life 604800 - track no flow longer than 1 week (although tracking can pick up again if the session is still active, usually with a new flow ID)

I explain this difference just to show the options that are common. The expectation is that it should be possible to at least provide the former. The only detail I would add is that the inactive and active periods should be tracked per flow, not globally. For example, if the 60-second export period were global, i.e. every 60 seconds all active flows are exported, you could end up with 59 seconds of no data followed by a flood of records from across the whole infrastructure arriving simultaneously. This can easily overwhelm some collection systems, especially Linux systems with default networking kernel parameters. Tracking the inactive and active timeouts per flow better distributes the load on the collecting system and provides more timely and accurate reporting. In 2020 I would say that it is generally accepted that 1 minute is a good compromise between data granularity and overall volume of records generated, with short-lived flows being exported quickly due to the inactive timeout.
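To make the per-flow idea concrete, here is a minimal sketch of evaluating the inactive and active timeouts per flow (illustrative only; the names are hypothetical and this is not the Antrea flow exporter implementation):

package main

import (
	"fmt"
	"time"
)

const (
	inactiveTimeout = 15 * time.Second // export (and expire) a flow idle for 15 seconds
	activeTimeout   = 60 * time.Second // export every 60 seconds while a flow stays active
)

// flowState tracks export timing for a single flow.
type flowState struct {
	lastSeen     time.Time // last time a packet for this flow was observed
	lastExported time.Time // last time a record for this flow was sent
}

// shouldExport is evaluated per flow, so export times are naturally spread
// out instead of all flows being flushed on one global tick.
func shouldExport(f flowState, now time.Time) bool {
	if now.Sub(f.lastSeen) >= inactiveTimeout {
		return true // idle flow: export the final record
	}
	if now.Sub(f.lastExported) >= activeTimeout {
		return true // long-lived active flow: export a periodic update
	}
	return false
}

func main() {
	now := time.Now()
	f := flowState{
		lastSeen:     now.Add(-2 * time.Second),
		lastExported: now.Add(-70 * time.Second),
	}
	fmt.Println(shouldExport(f, now)) // true: the active timeout has been exceeded
}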

Now that we have established that records should be sent periodically over the lifetime of a network flow, the next question is which fields, or “information elements” (IEs), should be included. While there are both delta and total IEs for things like bytes and packets, almost every vendor sends only the delta values, where the delta is the quantity since the previous record for the flow was exported (or since the flow started, if sending the initial record).

Sending only delta values is usually not an issue, as it is easy enough at query time to sum the deltas when a total is needed. I have seen a few examples of vendors sending the total values as well, but it isn’t really important. What should be avoided is sending only total values. That makes it very challenging to work with the data later, especially in combination with records from other sources that provide only delta values. There is nothing wrong with including the total values, but I would simply remove them and gain a bit of efficiency.
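For illustration, recovering a total from deltas at query time is just a running sum (the values below are hypothetical):

package main

import "fmt"

func main() {
	// Hypothetical octetDeltaCount values from the periodic records of one flow.
	deltas := []uint64{1200, 880, 0, 4096}

	var totalOctets uint64
	for _, d := range deltas {
		totalOctets += d
	}
	fmt.Println(totalOctets) // 6176: the flow total, recovered from deltas alone
}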

Regarding the start and end timestamps, I have to admit that the RFCs aren’t really clear here. My feeling is that for active flows flowEndSeconds should be set to the time that the last measurement was taken. However, I am going to spend some time going through my collection of PCAPs from different vendors and try to determine if there is any consensus best practice. I will get back to you on this.

The last thing I will mention is UDP vs. TCP. I noticed in the Antrea repo examples that the flowexporter was configured to use TCP. However, I set mine up to use UDP. Like many other network-related data sources (SNMP, syslog, etc.), NetFlow was designed to be sent via UDP. The logic is that establishing a TCP session has much more overhead and requires much more network traffic (especially in the reverse direction for ACKs and such) than UDP. During an outage, it is generally undesirable that the overhead of management traffic attempting to re-establish sessions competes with applications trying to do the same. This is so embedded in the networking mindset that many devices, as well as collectors, do not even support TCP. In fact, IPFIX is the only flow standard to even specify optional support for TCP. Since the Antrea exporter supports both, this isn’t an issue. However, you should be aware that if people follow the documented example config exactly, some of them might encounter issues because their collector doesn’t support TCP.

Hopefully this was helpful. Let me know if there are any questions that I can help answer.