dotcom-rendering: Fragmented packets causing SSL Handshake to hang on IPv6

As reported to userhelp, a user who is unable to fall back to IPv4 is unable to complete an SSL handshake with Fastly.

Unable to reproduce locally, but the command that hangs is:

openssl s_client -6 -host '2a04:4e42::367' -port 443
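A probe of the path MTU toward that address might show where large packets start dying. This wasn't run as part of the report, so it's just a suggestion; tracepath is from Linux iputils:

tracepath -6 2a04:4e42::367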

That address comes from our dual-stack DNS record:

> host www.theguardian.com
www.theguardian.com is an alias for dualstack.guardian.map.fastly.net.
dualstack.guardian.map.fastly.net has address 151.101.65.111
dualstack.guardian.map.fastly.net has address 151.101.129.111
dualstack.guardian.map.fastly.net has address 151.101.193.111
dualstack.guardian.map.fastly.net has address 151.101.1.111
dualstack.guardian.map.fastly.net has IPv6 address 2a04:4e42:200::367
dualstack.guardian.map.fastly.net has IPv6 address 2a04:4e42:400::367
dualstack.guardian.map.fastly.net has IPv6 address 2a04:4e42:600::367
dualstack.guardian.map.fastly.net has IPv6 address 2a04:4e42::367
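Not from the original report, but a quick way to check whether all four IPv6 endpoints behave the same would be to loop the failing handshake over each address. This sketch assumes GNU coreutils timeout and an OpenSSL new enough to accept a bracketed IPv6 literal in -connect:

for addr in 2a04:4e42::367 2a04:4e42:200::367 2a04:4e42:400::367 2a04:4e42:600::367; do
  echo "== $addr =="
  # cap each attempt at 10s so a hung handshake doesn't stall the loop
  timeout 10 openssl s_client -connect "[$addr]:443" -servername www.theguardian.com </dev/null | head -n 3
done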

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 29 (11 by maintainers)

Most upvoted comments

I’m the original reporter of the problem to userhelp. Thanks for tracking this. Here’s the packet capture (gzip’ed pcapng file) showing what happened. theguardian.pcapng.gz

@nbriggs Excellent investigation!

I think I’m able to replicate this now, to a degree, by using ping with large payloads.

~$ sudo ifconfig eno1 mtu 1500 up
~$ ping6 -s 1500 -c 3 www.theguardian.com
PING www.theguardian.com(2a04:4e42:4::367 (2a04:4e42:4::367)) 1500 data bytes

--- www.theguardian.com ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2032ms

~$ ping6 -s 1300 -c 3 www.theguardian.com
PING www.theguardian.com(2a04:4e42:4::367 (2a04:4e42:4::367)) 1300 data bytes
1308 bytes from 2a04:4e42:4::367 (2a04:4e42:4::367): icmp_seq=1 ttl=54 time=0.737 ms
1308 bytes from 2a04:4e42:4::367 (2a04:4e42:4::367): icmp_seq=2 ttl=54 time=0.737 ms
1308 bytes from 2a04:4e42:4::367 (2a04:4e42:4::367): icmp_seq=3 ttl=54 time=0.790 ms

--- www.theguardian.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2019ms
rtt min/avg/max/mdev = 0.737/0.754/0.790/0.025 ms

~$ ping6 -s 1500 -c 3 www.apple.com
PING www.apple.com(g2a02-26f0-00a1-0589-0000-0000-0000-1aca.deploy.static.akamaitechnologies.com (2a02:26f0:a1:589::1aca)) 1500 data bytes
1508 bytes from g2a02-26f0-00a1-0589-0000-0000-0000-1aca.deploy.static.akamaitechnologies.com (2a02:26f0:a1:589::1aca): icmp_seq=1 ttl=56 time=173 ms
1508 bytes from g2a02-26f0-00a1-0589-0000-0000-0000-1aca.deploy.static.akamaitechnologies.com (2a02:26f0:a1:589::1aca): icmp_seq=2 ttl=56 time=295 ms
1508 bytes from g2a02-26f0-00a1-0589-0000-0000-0000-1aca.deploy.static.akamaitechnologies.com (2a02:26f0:a1:589::1aca): icmp_seq=3 ttl=56 time=376 ms

--- www.apple.com ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 172.926/281.450/376.070/83.513 ms

Your idea of packet fragmentation seems quite likely to be the cause. Thanks!
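For anyone else bisecting this, here's a sketch of pinning down the path MTU without changing the interface MTU, assuming Linux iputils ping, where -M do forbids fragmentation; the on-wire packet is the payload plus 8 bytes of ICMPv6 header and 40 bytes of IPv6 header:

~$ ping -6 -M do -s 1432 -c 3 www.theguardian.com   # 1480-byte packets on the wire
~$ ping -6 -M do -s 1440 -c 3 www.theguardian.com   # 1488-byte packets on the wire

A size that fails with -M do but succeeds without it points at a hop whose MTU is smaller than that packet.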

I’ll raise the issue with Fastly. I’m not sure whether I can make the ticket public, but if not I’ll relay any useful information here!

Yes… as I mentioned, my ISP provides IPv6 access through a 6RD deployment (https://en.wikipedia.org/wiki/IPv6_rapid_deployment), which is based on a 6to4 tunnel.
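If that's the cause, the arithmetic (mine, not from the thread) lines up: 6RD encapsulates every IPv6 packet in an IPv4 header (protocol 41), so

  1500 bytes  Ethernet MTU on the ISP's IPv4 side
  - 20 bytes  outer IPv4 header added by the 6RD encapsulation
  ----------
  1480 bytes  largest IPv6 packet that fits without fragmenting the outer IPv4 packet

and because IPv6 routers never fragment in transit, anything larger must be bounced back to the sender as an ICMPv6 Packet Too Big. If those messages are dropped somewhere, large TLS records such as the certificate chain simply black-hole, which matches the hung handshake. (The working figure of 1472 seen below is another 8 bytes under 1480; that could be further tunnel overhead, but that part is speculation.)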

@AshCorr – I wonder if Fastly is engaging in a hack similar to the one Cloudflare used to. https://blog.cloudflare.com/increasing-ipv6-mtu/
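One way to check whether Fastly clamps the TCP MSS the way that post describes would be to watch the SYN/SYN-ACK exchange. This is a sketch, not something run in this thread; en0 is the interface from the transcript below, and the raw-offset SYN match assumes no IPv6 extension headers:

~$ sudo tcpdump -ni en0 'host 2a04:4e42::367 and port 443 and (ip6[53] & 0x02 != 0)'

Then run curl -6 https://www.theguardian.com/ in another terminal and read the mss value tcpdump prints in the SYN-ACK's TCP options.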

An MTU <= 1472 on my Ethernet interface seems to be fine; anything higher and responses become slow or never arrive:

briggs@macrobiotic /tmp % sudo ifconfig en0 mtu 1474               
briggs@macrobiotic /tmp % time curl -6 https://www.theguardian.com/
curl -6 https://www.theguardian.com/  0.02s user 0.01s system 0% cpu 3.940 total
briggs@macrobiotic /tmp % time curl -6 https://www.theguardian.com/
^C
curl -6 https://www.theguardian.com/  0.02s user 0.01s system 0% cpu 18.990 total
briggs@macrobiotic /tmp % time curl -6 https://www.theguardian.com/
curl -6 https://www.theguardian.com/  0.02s user 0.01s system 0% cpu 4.007 total
briggs@macrobiotic /tmp % time curl -6 https://www.theguardian.com/
curl -6 https://www.theguardian.com/  0.02s user 0.01s system 0% cpu 3.957 total
briggs@macrobiotic /tmp % sudo ifconfig en0 mtu 1472
briggs@macrobiotic /tmp % time curl -6 https://www.theguardian.com/
curl -6 https://www.theguardian.com/  0.02s user 0.01s system 15% cpu 0.190 total
briggs@macrobiotic /tmp % time curl -6 https://www.theguardian.com/
curl -6 https://www.theguardian.com/  0.02s user 0.01s system 19% cpu 0.147 total
briggs@macrobiotic /tmp % time curl -6 https://www.theguardian.com/
curl -6 https://www.theguardian.com/  0.02s user 0.01s system 19% cpu 0.146 total
briggs@macrobiotic /tmp % time curl -6 https://www.theguardian.com/
curl -6 https://www.theguardian.com/  0.02s user 0.01s system 19% cpu 0.144 total
briggs@macrobiotic /tmp % time curl -6 https://www.theguardian.com/
curl -6 https://www.theguardian.com/  0.02s user 0.01s system 19% cpu 0.146 total
briggs@macrobiotic /tmp % time curl -6 https://www.theguardian.com/
curl -6 https://www.theguardian.com/  0.02s user 0.01s system 19% cpu 0.145 total

I’m running some more experiments, and I’m beginning to wonder if the problem is somewhere in my ISP’s (sonic.com) IPv6 network.

I just tried the same client via another ISP (comcast.net) and haven’t been able to reproduce the failure (100/100 successes at a 2 s interval), yet on Sonic’s network it took only 5 attempts to hit a failure. Attached is the packet capture of those 5 attempts. Even the successful connections look pretty ugly, and the roughly 4 s delay I experience when it does work is obvious in the trace.

I’m not sure what the next step in debugging this is – perhaps see if I can move the termination of the 6RD gateway off the ISP’s router onto something else (rough sketch after the capture below), in case it’s the router’s crap 6RD implementation.

sonic5-theguardian.pcapng.gz
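For the record, a rough sketch of what terminating 6RD on a separate Linux box could look like. All values here are hypothetical: the 192.0.2.x addresses are placeholders for the WAN address and the ISP's border relay, the 6RD prefix must come from the ISP, and the derived address follows RFC 5969 (6RD prefix bits followed by the WAN IPv4 bits):

# 192.0.2.10 = our IPv4 WAN address, 192.0.2.1 = ISP 6RD border relay (placeholders)
sudo ip tunnel add tun6rd mode sit local 192.0.2.10 ttl 64
sudo ip tunnel 6rd dev tun6rd 6rd-prefix 2602:240::/28   # example ISP-assigned 6RD prefix
sudo ip link set tun6rd mtu 1480 up                      # 1500 minus the 20-byte IPv4 encap header
sudo ip addr add 2602:24c:0:20a0::1/60 dev tun6rd        # /28 prefix + 32 WAN bits = our /60
sudo ip -6 route add default via ::192.0.2.1 dev tun6rd  # route everything via the border relay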