dotcom-rendering: Fragmented packets causing SSL Handshake to hang on IPv6

As reported to userhelp, a user who is unable to fall back to IPv4 is unable to complete an SSL handshake with Fastly.

Unable to reproduce locally, but the command that hangs is:

openssl s_client -6 -host '2a04:4e42::367' -port 443
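A probe of the path MTU toward that address might show where large packets start dying. This wasn't run as part of the report, so it's just a suggestion; tracepath is from Linux iputils:

tracepath -6 2a04:4e42::367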

That address comes from our dual-stack DNS record:

> host www.theguardian.com
www.theguardian.com is an alias for dualstack.guardian.map.fastly.net.
dualstack.guardian.map.fastly.net has address 151.101.65.111
dualstack.guardian.map.fastly.net has address 151.101.129.111
dualstack.guardian.map.fastly.net has address 151.101.193.111
dualstack.guardian.map.fastly.net has address 151.101.1.111
dualstack.guardian.map.fastly.net has IPv6 address 2a04:4e42:200::367
dualstack.guardian.map.fastly.net has IPv6 address 2a04:4e42:400::367
dualstack.guardian.map.fastly.net has IPv6 address 2a04:4e42:600::367
dualstack.guardian.map.fastly.net has IPv6 address 2a04:4e42::367
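Not from the original report, but a quick way to check whether all four IPv6 endpoints behave the same would be to loop the failing handshake over each address. This sketch assumes GNU coreutils timeout and an OpenSSL new enough to accept a bracketed IPv6 literal in -connect:

for addr in 2a04:4e42::367 2a04:4e42:200::367 2a04:4e42:400::367 2a04:4e42:600::367; do
  echo "== $addr =="
  # cap each attempt at 10s so a hung handshake doesn't stall the loop
  timeout 10 openssl s_client -connect "[$addr]:443" -servername www.theguardian.com </dev/null | head -n 3
done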

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 29 (11 by maintainers)

Most upvoted comments

I’m the original reporter of the problem to userhelp. Thanks for tracking this. Here’s the packet capture (gzip’ed pcapng file) showing what happened. theguardian.pcapng.gz

@nbriggs Excellent investigation!

I think I’m able to replicate this now, to a degree, by using ping with large payloads.

~$ sudo ifconfig eno1 mtu 1500 up
~$ ping6 -s 1500 -c 3 www.theguardian.com
PING www.theguardian.com(2a04:4e42:4::367 (2a04:4e42:4::367)) 1500 data bytes

--- www.theguardian.com ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2032ms

~$ ping6 -s 1300 -c 3 www.theguardian.com
PING www.theguardian.com(2a04:4e42:4::367 (2a04:4e42:4::367)) 1300 data bytes
1308 bytes from 2a04:4e42:4::367 (2a04:4e42:4::367): icmp_seq=1 ttl=54 time=0.737 ms
1308 bytes from 2a04:4e42:4::367 (2a04:4e42:4::367): icmp_seq=2 ttl=54 time=0.737 ms
1308 bytes from 2a04:4e42:4::367 (2a04:4e42:4::367): icmp_seq=3 ttl=54 time=0.790 ms

--- www.theguardian.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2019ms
rtt min/avg/max/mdev = 0.737/0.754/0.790/0.025 ms

~$ ping6 -s 1500 -c 3 www.apple.com
PING www.apple.com(g2a02-26f0-00a1-0589-0000-0000-0000-1aca.deploy.static.akamaitechnologies.com (2a02:26f0:a1:589::1aca)) 1500 data bytes
1508 bytes from g2a02-26f0-00a1-0589-0000-0000-0000-1aca.deploy.static.akamaitechnologies.com (2a02:26f0:a1:589::1aca): icmp_seq=1 ttl=56 time=173 ms
1508 bytes from g2a02-26f0-00a1-0589-0000-0000-0000-1aca.deploy.static.akamaitechnologies.com (2a02:26f0:a1:589::1aca): icmp_seq=2 ttl=56 time=295 ms
1508 bytes from g2a02-26f0-00a1-0589-0000-0000-0000-1aca.deploy.static.akamaitechnologies.com (2a02:26f0:a1:589::1aca): icmp_seq=3 ttl=56 time=376 ms

--- www.apple.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 172.926/281.450/376.070/83.513 ms

Your idea of packet fragmentation seems quite likely to be the cause. Thanks!
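For anyone else bisecting this, here's a sketch of pinning down the path MTU without changing the interface MTU, assuming Linux iputils ping, where -M do forbids fragmentation; the on-wire packet is the payload plus 8 bytes of ICMPv6 header and 40 bytes of IPv6 header:

~$ ping -6 -M do -s 1432 -c 3 www.theguardian.com   # 1480-byte packets on the wire
~$ ping -6 -M do -s 1440 -c 3 www.theguardian.com   # 1488-byte packets on the wire

A size that fails with -M do but succeeds without it points at a hop whose MTU is smaller than that packet.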

I’ll raise the issue with Fastly. I’m not sure whether I can make the ticket public, but if not I’ll relay any useful information here!

Yes… as I mentioned, my ISP provides IPv6 access through a 6RD deployment (https://en.wikipedia.org/wiki/IPv6_rapid_deployment), which is based on a 6to4 tunnel.
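If that's the cause, the arithmetic (mine, not from the thread) lines up: 6RD encapsulates every IPv6 packet in an IPv4 header (protocol 41), so

  1500 bytes  Ethernet MTU on the ISP's IPv4 side
  - 20 bytes  outer IPv4 header added by the 6RD encapsulation
  ----------
  1480 bytes  largest IPv6 packet that fits without fragmenting the outer IPv4 packet

and because IPv6 routers never fragment in transit, anything larger must be bounced back to the sender as an ICMPv6 Packet Too Big. If those messages are dropped somewhere, large TLS records such as the certificate chain simply black-hole, which matches the hung handshake. (The working figure of 1472 seen below is another 8 bytes under 1480; that could be further tunnel overhead, but that part is speculation.)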

@AshCorr – I wonder if Fastly is engaging in a hack similar to the one Cloudflare used to. https://blog.cloudflare.com/increasing-ipv6-mtu/
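One way to check whether Fastly clamps the TCP MSS the way that post describes would be to watch the SYN/SYN-ACK exchange. This is a sketch, not something run in this thread; en0 is the interface from the transcript below, and the raw-offset SYN match assumes no IPv6 extension headers:

~$ sudo tcpdump -ni en0 'host 2a04:4e42::367 and port 443 and (ip6[53] & 0x02 != 0)'

Then run curl -6 https://www.theguardian.com/ in another terminal and read the mss value tcpdump prints in the SYN-ACK's TCP options.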

An MTU <= 1472 on my Ethernet interface seems to be fine; anything higher and responses become slow or never arrive:

briggs@macrobiotic /tmp % sudo ifconfig en0 mtu 1474               
briggs@macrobiotic /tmp % time curl -6 https://www.theguardian.com/
curl -6 https://www.theguardian.com/  0.02s user 0.01s system 0% cpu 3.940 total
briggs@macrobiotic /tmp % time curl -6 https://www.theguardian.com/
^C
curl -6 https://www.theguardian.com/  0.02s user 0.01s system 0% cpu 18.990 total
briggs@macrobiotic /tmp % time curl -6 https://www.theguardian.com/
curl -6 https://www.theguardian.com/  0.02s user 0.01s system 0% cpu 4.007 total
briggs@macrobiotic /tmp % time curl -6 https://www.theguardian.com/
curl -6 https://www.theguardian.com/  0.02s user 0.01s system 0% cpu 3.957 total
briggs@macrobiotic /tmp % sudo ifconfig en0 mtu 1472
briggs@macrobiotic /tmp % time curl -6 https://www.theguardian.com/
curl -6 https://www.theguardian.com/  0.02s user 0.01s system 15% cpu 0.190 total
briggs@macrobiotic /tmp % time curl -6 https://www.theguardian.com/
curl -6 https://www.theguardian.com/  0.02s user 0.01s system 19% cpu 0.147 total
briggs@macrobiotic /tmp % time curl -6 https://www.theguardian.com/
curl -6 https://www.theguardian.com/  0.02s user 0.01s system 19% cpu 0.146 total
briggs@macrobiotic /tmp % time curl -6 https://www.theguardian.com/
curl -6 https://www.theguardian.com/  0.02s user 0.01s system 19% cpu 0.144 total
briggs@macrobiotic /tmp % time curl -6 https://www.theguardian.com/
curl -6 https://www.theguardian.com/  0.02s user 0.01s system 19% cpu 0.146 total
briggs@macrobiotic /tmp % time curl -6 https://www.theguardian.com/
curl -6 https://www.theguardian.com/  0.02s user 0.01s system 19% cpu 0.145 total

I’m running some more experiments, and I’m beginning to wonder if the problem is somewhere in my ISP’s (sonic.com) IPv6 network.

I just tried the same client via another ISP (comcast.net) and haven’t been able to reproduce the failure (100/100 successes at a 2 s interval), yet on Sonic’s network it took only 5 attempts to hit a failure. Attached is the packet capture of those 5 attempts. Even the successful connections look pretty ugly, and the roughly 4 s delay I experience when it does work is obvious in the trace.

I’m not sure what the next step in debugging this is – perhaps see if I can move the termination of the 6RD gateway off the ISP’s router onto something else (rough sketch after the capture below), in case it’s the router’s crap 6RD implementation.

sonic5-theguardian.pcapng.gz
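For the record, a rough sketch of what terminating 6RD on a separate Linux box could look like. All values here are hypothetical: the 192.0.2.x addresses are placeholders for the WAN address and the ISP's border relay, the 6RD prefix must come from the ISP, and the derived address follows RFC 5969 (6RD prefix bits followed by the WAN IPv4 bits):

# 192.0.2.10 = our IPv4 WAN address, 192.0.2.1 = ISP 6RD border relay (placeholders)
sudo ip tunnel add tun6rd mode sit local 192.0.2.10 ttl 64
sudo ip tunnel 6rd dev tun6rd 6rd-prefix 2602:240::/28   # example ISP-assigned 6RD prefix
sudo ip link set tun6rd mtu 1480 up                      # 1500 minus the 20-byte IPv4 encap header
sudo ip addr add 2602:24c:0:20a0::1/60 dev tun6rd        # /28 prefix + 32 WAN bits = our /60
sudo ip -6 route add default via ::192.0.2.1 dev tun6rd  # route everything via the border relay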