tailscale: Derper TLS handshake error in China: remote error: tls: internal error

What is the issue?

It is very strange: I built a custom derper server, but it always breaks at random after running for a while, usually 1 to 6 days depending on usage. Heavy load breaks it sooner. Even with the ‘-verify-clients’ option enabled, derper may break within one hour.

I used Go version 1.17.1, and we built the derper server following the official manual.

When it breaks, syslog shows dozens of errors like: Mar 7 11:59:52 localhost derper[268829]: 2022/03/07 11:59:52 http: TLS handshake error from 111.12.xx.xx:51574: remote error: tls: internal error

After about 5-10 minutes, it starts to generate tons of errors like: Mar 7 12:05:00 localhost derper[268829]: 2022/03/07 21:52:56 http: TLS handshake error from 42.90.xx.xx:34461: write tcp 172.xx.xx.xx:12341->42.90.xx.xx:34461: write: connection reset by peer

When the derper server is broken, tailscale netcheck still marks it as available, but it cannot actually be used, so I lost connection with many devices.

Steps to reproduce

  1. Use letsencrypt to generate a cert.
  2. Build a derper server using that cert and run it with the options: -stun -a :12345 -certmode manual -verify-clients --certdir /xxx/derper-certs -http-port -1 -hostname xx.xxx.com
  3. Wait hours or days until it breaks.

Are there any recent changes that introduced the issue?

No response

OS

Linux

OS version

centos8

Tailscale version

derper@v1.1.1-0.20220225000201 & tailscale@1.22.0

Bug report

derper has no bugreport option

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 33 (9 by maintainers)

Most upvoted comments

I’ve submitted a fix. Please rebuild your derper binaries and try again.

It seems the cause is that the VPS hosted in mainland China has not completed ICP filing, so its TLS connections are blocked.

not with non-standard ports.

@frankli0324 can you share how you configure nginx to reverse proxy derper server?

I only reverse-proxied the HTTP probe server, not the STUN server, so it’s as simple as:

        proxy_pass http://127.0.0.1:10080;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

The Go security team points out that the cmd/derper code’s manual cert mode is at fault. It’s returning the same *tls.Certificate value on each call, but the GetCertificate wrapper around it appends to it on each call:

https://github.com/tailscale/tailscale/blob/9996d94b3c281e537bc3ef51a694c43ebee79c2c/cmd/derper/derper.go#L234-L241

Not only is that a data race, but eventually the cert gets so big that it can’t marshal anymore.

The normal LetsEncrypt mode doesn’t have this problem because its GetCertificate call (the one we wrap) always returns a new *tls.Certificate value.

So the fix for that panic is to change the manualCertManager’s TLSConfig method so that its GetCertificate returns a unique *tls.Certificate value on each call. Looks like this has been a problem since it was added in d8c5d00ecbaf2352f50ae3f26f795621a6e7972f. @SilverBut, interested in fixing?
