bcc: userspace program attached to eBPF SOCKET_FILTER does not receive every skb
I’ve followed the http_filter example, and am trying to count the bytes transferred to/from my webserver when files are downloaded via HTTP.
I successfully catch the packets at the start of the HTTP conversation (entirety of inbound HTTP request, and most packets comprising my outbound HTTP response). But frequently there are packets toward the end of my outbound HTTP response, that my packet filter doesn’t log.
I used http-parse-complete.c
@ 3d9b687
And I made my own, simple http-parse-seq.py
, which does just one thing: print the sequence number of every packet that it sees.
Expected: Every sequence number I see in Wireshark/tcpdump should be logged by my bcc program also.
Actual: later packets are often absent from my bcc program’s logs, yet are visible in Wireshark/tcpdump.
Repro:
# make a directory which our webserver will mount
mkdir -p htdocs
# let's get a large photo for our webserver to serve
curl -o htdocs/docker.jpg 'https://reterwebber.files.wordpress.com/2014/09/docker-wallpaper-grey.jpg'
# launch apache webserver serving our image at http://localhost:8080/docker.jpg
docker run -i --rm --name httpd -p 8080:80 -v "$PWD"/htdocs:/usr/local/apache2/htdocs:ro,delegated httpd
#== (meanwhile, in another terminal)
# download the bcc script
curl -o http-parse-complete.c 'https://raw.githubusercontent.com/iovisor/bcc/3d9b687/examples/networking/http_filter/http-parse-complete.c'
curl -o http-parse-seq.py 'https://gist.githubusercontent.com/Birch-san/e9f21b2dc996c4405cd423527a633b2d/raw/b91b16c5ae0705d4bc098d342c10bd0ed84f6e18/http-parse-seq.py'
# run the bcc script (logs sequence numbers of all HTTP packets inbound/outbound on docker0 interface)
sudo ./http-parse-seq.py -i docker0
#== (meanwhile, in another terminal)
# watch packets on docker0 interface
sudo tcpdump ip -Snnli docker0 -B 65536
#== (meanwhile, in another terminal)
# download photo from our website
curl -Is 'http://localhost:8080/docker.jpg'
I observed that my bcc script output the following sequence numbers:
2002486161
3240310653
3240317893
3240325133
3240335269
3240339613
3240361333
3240383053
3240404773
3240426493
3240427941
3240451109
3240474277
3240475725
3240493101
3240517717
Whereas tcpdump output the following:
15:55:12.840095 IP 172.17.0.1.54296 > 172.17.0.2.80: Flags [S], seq 2002486160, win 29200, options [mss 1460,sackOK,TS val 2846391946 ecr 0,nop,wscale 7], length 0
15:55:12.840118 IP 172.17.0.2.80 > 172.17.0.1.54296: Flags [S.], seq 3240310652, ack 2002486161, win 28960, options [mss 1460,sackOK,TS val 809068667 ecr 2846391946,nop,wscale 7], length 0
15:55:12.840132 IP 172.17.0.1.54296 > 172.17.0.2.80: Flags [.], ack 3240310653, win 229, options [nop,nop,TS val 2846391946 ecr 809068667], length 0
15:55:12.840541 IP 172.17.0.1.54296 > 172.17.0.2.80: Flags [P.], seq 2002486161:2002486253, ack 3240310653, win 229, options [nop,nop,TS val 2846391947 ecr 809068667], length 92: HTTP: GET /docker.jpg?s=1 HTTP/1.1
15:55:12.840560 IP 172.17.0.2.80 > 172.17.0.1.54296: Flags [.], ack 2002486253, win 227, options [nop,nop,TS val 809068668 ecr 2846391947], length 0
15:55:12.842392 IP 172.17.0.2.80 > 172.17.0.1.54296: Flags [.], seq 3240310653:3240317893, ack 2002486253, win 227, options [nop,nop,TS val 809068670 ecr 2846391947], length 7240: HTTP: HTTP/1.1 200 OK
15:55:12.842426 IP 172.17.0.1.54296 > 172.17.0.2.80: Flags [.], ack 3240317893, win 342, options [nop,nop,TS val 2846391949 ecr 809068670], length 0
15:55:12.842437 IP 172.17.0.2.80 > 172.17.0.1.54296: Flags [.], seq 3240317893:3240325133, ack 2002486253, win 227, options [nop,nop,TS val 809068670 ecr 2846391947], length 7240: HTTP
15:55:12.842444 IP 172.17.0.1.54296 > 172.17.0.2.80: Flags [.], ack 3240325133, win 455, options [nop,nop,TS val 2846391949 ecr 809068670], length 0
15:55:12.842452 IP 172.17.0.2.80 > 172.17.0.1.54296: Flags [.], seq 3240325133:3240335269, ack 2002486253, win 227, options [nop,nop,TS val 809068670 ecr 2846391949], length 10136: HTTP
15:55:12.842458 IP 172.17.0.1.54296 > 172.17.0.2.80: Flags [.], ack 3240335269, win 613, options [nop,nop,TS val 2846391949 ecr 809068670], length 0
15:55:12.842462 IP 172.17.0.2.80 > 172.17.0.1.54296: Flags [P.], seq 3240335269:3240339613, ack 2002486253, win 227, options [nop,nop,TS val 809068670 ecr 2846391949], length 4344: HTTP
15:55:12.842466 IP 172.17.0.1.54296 > 172.17.0.2.80: Flags [.], ack 3240339613, win 681, options [nop,nop,TS val 2846391949 ecr 809068670], length 0
15:55:12.842482 IP 172.17.0.2.80 > 172.17.0.1.54296: Flags [.], seq 3240339613:3240361333, ack 2002486253, win 227, options [nop,nop,TS val 809068670 ecr 2846391949], length 21720: HTTP
15:55:12.842490 IP 172.17.0.1.54296 > 172.17.0.2.80: Flags [.], ack 3240361333, win 1020, options [nop,nop,TS val 2846391949 ecr 809068670], length 0
15:55:12.842494 IP 172.17.0.2.80 > 172.17.0.1.54296: Flags [P.], seq 3240361333:3240383053, ack 2002486253, win 227, options [nop,nop,TS val 809068670 ecr 2846391949], length 21720: HTTP
15:55:12.842501 IP 172.17.0.1.54296 > 172.17.0.2.80: Flags [.], ack 3240383053, win 1146, options [nop,nop,TS val 2846391949 ecr 809068670], length 0
15:55:12.843435 IP 172.17.0.1.54296 > 172.17.0.2.80: Flags [F.], seq 2002486253, ack 3240383053, win 1360, options [nop,nop,TS val 2846391950 ecr 809068670], length 0
15:55:12.843605 IP 172.17.0.2.80 > 172.17.0.1.54296: Flags [.], seq 3240383053:3240404773, ack 2002486253, win 227, options [nop,nop,TS val 809068671 ecr 2846391949], length 21720: HTTP
15:55:12.843626 IP 172.17.0.1.54296 > 172.17.0.2.80: Flags [.], ack 3240404773, win 1367, options [nop,nop,TS val 2846391950 ecr 809068671], length 0
15:55:12.843634 IP 172.17.0.2.80 > 172.17.0.1.54296: Flags [.], seq 3240404773:3240426493, ack 2002486253, win 227, options [nop,nop,TS val 809068671 ecr 2846391949], length 21720: HTTP
15:55:12.843643 IP 172.17.0.1.54296 > 172.17.0.2.80: Flags [.], ack 3240426493, win 1277, options [nop,nop,TS val 2846391950 ecr 809068671], length 0
15:55:12.843647 IP 172.17.0.2.80 > 172.17.0.1.54296: Flags [.], seq 3240426493:3240427941, ack 2002486253, win 227, options [nop,nop,TS val 809068671 ecr 2846391949], length 1448: HTTP
15:55:12.843652 IP 172.17.0.1.54296 > 172.17.0.2.80: Flags [.], ack 3240427941, win 1267, options [nop,nop,TS val 2846391950 ecr 809068671], length 0
15:55:12.843662 IP 172.17.0.2.80 > 172.17.0.1.54296: Flags [.], seq 3240427941:3240451109, ack 2002486254, win 227, options [nop,nop,TS val 809068671 ecr 2846391950], length 23168: HTTP
15:55:12.843674 IP 172.17.0.1.54296 > 172.17.0.2.80: Flags [.], ack 3240451109, win 1171, options [nop,nop,TS val 2846391950 ecr 809068671], length 0
15:55:12.843679 IP 172.17.0.2.80 > 172.17.0.1.54296: Flags [.], seq 3240451109:3240474277, ack 2002486254, win 227, options [nop,nop,TS val 809068671 ecr 2846391950], length 23168: HTTP
15:55:12.843687 IP 172.17.0.1.54296 > 172.17.0.2.80: Flags [.], ack 3240474277, win 1076, options [nop,nop,TS val 2846391950 ecr 809068671], length 0
15:55:12.843691 IP 172.17.0.2.80 > 172.17.0.1.54296: Flags [.], seq 3240474277:3240475725, ack 2002486254, win 227, options [nop,nop,TS val 809068671 ecr 2846391950], length 1448: HTTP
15:55:12.843696 IP 172.17.0.1.54296 > 172.17.0.2.80: Flags [.], ack 3240475725, win 1065, options [nop,nop,TS val 2846391950 ecr 809068671], length 0
15:55:12.843700 IP 172.17.0.2.80 > 172.17.0.1.54296: Flags [P.], seq 3240475725:3240493101, ack 2002486254, win 227, options [nop,nop,TS val 809068671 ecr 2846391950], length 17376: HTTP
15:55:12.843707 IP 172.17.0.1.54296 > 172.17.0.2.80: Flags [.], ack 3240493101, win 992, options [nop,nop,TS val 2846391950 ecr 809068671], length 0
15:55:12.843737 IP 172.17.0.2.80 > 172.17.0.1.54296: Flags [.], seq 3240493101:3240517717, ack 2002486254, win 227, options [nop,nop,TS val 809068671 ecr 2846391950], length 24616: HTTP
15:55:12.843746 IP 172.17.0.1.54296 > 172.17.0.2.80: Flags [.], ack 3240517717, win 893, options [nop,nop,TS val 2846391950 ecr 809068671], length 0
15:55:12.843756 IP 172.17.0.2.80 > 172.17.0.1.54296: Flags [.], seq 3240517717:3240542333, ack 2002486254, win 227, options [nop,nop,TS val 809068671 ecr 2846391950], length 24616: HTTP
15:55:12.843765 IP 172.17.0.1.54296 > 172.17.0.2.80: Flags [.], ack 3240542333, win 791, options [nop,nop,TS val 2846391950 ecr 809068671], length 0
15:55:12.843768 IP 172.17.0.2.80 > 172.17.0.1.54296: Flags [.], seq 3240542333:3240543781, ack 2002486254, win 227, options [nop,nop,TS val 809068671 ecr 2846391950], length 1448: HTTP
15:55:12.843775 IP 172.17.0.2.80 > 172.17.0.1.54296: Flags [.], seq 3240543781:3240569845, ack 2002486254, win 227, options [nop,nop,TS val 809068671 ecr 2846391950], length 26064: HTTP
15:55:12.843787 IP 172.17.0.2.80 > 172.17.0.1.54296: Flags [P.], seq 3240569845:3240577797, ack 2002486254, win 227, options [nop,nop,TS val 809068671 ecr 2846391950], length 7952: HTTP
15:55:12.843951 IP 172.17.0.2.80 > 172.17.0.1.54296: Flags [F.], seq 3240577797, ack 2002486254, win 227, options [nop,nop,TS val 809068671 ecr 2846391950], length 0
15:55:12.843962 IP 172.17.0.1.54296 > 172.17.0.2.80: Flags [.], ack 3240577798, win 632, options [nop,nop,TS val 2846391950 ecr 809068671], length 0
You can filter+parse the tcpdump to just show sequence numbers like so: awk '$8~/seq/{gsub(/,$/,"",$9);print $9}' tcpdump.txt > tcpdump-seq.txt
So, the final packet that my bcc script sees, has a sequence number 3240517717
. Yet tcpdump sees three additional HTTP packets outbound from my webserver after that: 3240517717, 3240542333, 3240543781, 3240569845
.
Am I doing something wrong? Or is it expected that http_filter would not see all HTTP packets on the interface?
Versions:
Ubuntu 18.04 LTS
bcc-tools: 0.6.0-1
/etc/apt/sources.list.d/iovisor.list:
deb https://repo.iovisor.org/apt/artful artful main
I wasn’t sure what the “max packet length on the docker0 interface” was, so I used the MTU of loopback (65536) as a generous upper bound.
I attach some packet captures, including tcpdump, wireshark (pcap, pcapng) and the output of my program.
EDIT: I’ve now updated to latest, but no improvement:
Ubuntu 18.04 LTS
libbcc: 0.6.1-1
python-bcc: 0.6.1-1
bcc-tools: 0.6.1-1
/etc/apt/sources.list.d/iovisor.list:
deb https://repo.iovisor.org/apt/bionic bionic main
About this issue
- Original URL
- State: open
- Created 6 years ago
- Comments: 20 (3 by maintainers)
I don’t find that error particularly suspicious. my own simple
http-parse-seq.py
doesn’t make any attempt to delete entries from the bpf_sessions map (hence I get no such error message), and is still a victim to packet loss.it’s hard to understand the range of circumstances in which
http-parse-complete.py
can throw that particular message. there’s only one way to insert a session into the map (http-parse-complete.c#L144
), whereashttp-parse-complete.py
has many ways to delete sessions from the map (including 30-sec timeout). so I’m more inclined to believe that a double-delete occurred (due to an edge case in very complex code) or that the map key was calculated incorrectly in the Python program. I don’t think it’s likely that things are disappearing from the map, or that it’s BPF’s fault.certainly, if we accept that the kernel discards skbs, then there are only these ways to explain why the client application still receives every skb: