tailscale: Derper TLS handshake error in China: remote error: tls: internal error
What is the issue?
It is very strange: I built a custom derper server, but it always breaks at random after running for a while, usually within 1 to 6 days depending on usage. Heavy load breaks it sooner. Even with the ‘-verify-clients’ option enabled, derper can break within an hour.
The Go version I used is 1.17.1, and the derper server was built following the official manual.
When it breaks, syslog shows dozens of error lines like:
Mar  7 11:59:52 localhost derper[268829]: 2022/03/07 11:59:52 http: TLS handshake error from 111.12.xx.xx:51574: remote error: tls: internal error
After about 5 to 10 minutes, it starts generating a flood of errors like:
Mar  7 12:05:00 localhost derper[268829]: 2022/03/07 21:52:56 http: TLS handshake error from 42.90.xx.xx:34461: write tcp 172.xx.xx.xx:12341->42.90.xx.xx:34461: write: connection reset by peer
When the derper server is broken, tailscale netcheck still reports it as available, but it cannot actually be used, so I lose connectivity to many devices.
Steps to reproduce
Use Let's Encrypt to generate a certificate.
Build a derper server with that certificate and run it with the options: -stun -a :12345 -certmode manual -verify-clients --certdir /xxx/derper-certs -http-port -1 -hostname xx.xxx.com
Wait hours or days until it breaks.
Are there any recent changes that introduced the issue?
No response
OS
Linux
OS version
centos8
Tailscale version
derper@v1.1.1-0.20220225000201 & tailscale@1.22.0
Bug report
derper has no bugreport option
About this issue
- State: closed
- Created 2 years ago
- Comments: 33 (9 by maintainers)
Commits related to this issue
- cmd/derper: fix data race & server panic in manual cert mode Fixes #4082 Change-Id: I400a64001c3c58899bb570b759b08e745abc0be1 Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com> — committed to tailscale/tailscale by bradfitz 2 years ago
- cmd/derper: fix data race & server panic in manual cert mode (Thanks for debugging, Roland!) Fixes #4082 Change-Id: I400a64001c3c58899bb570b759b08e745abc0be1 Signed-off-by: Brad Fitzpatrick <bradfi... — committed to tailscale/tailscale by bradfitz 2 years ago
I’ve submitted a fix. Please rebuild your derper binaries and try again.
Not with non-standard ports.
I just reverse-proxied the HTTP probe server, not the STUN server, so it’s as simple as
The Go security team points out that the cmd/derper code’s manual cert mode is at fault. It’s returning the same *tls.Certificate value on each call, but the GetCertificate wrapper around it appends to it on each call: https://github.com/tailscale/tailscale/blob/9996d94b3c281e537bc3ef51a694c43ebee79c2c/cmd/derper/derper.go#L234-L241
Not only is that a data race, but eventually the cert gets so big that it can’t marshal anymore.
The normal Let's Encrypt mode doesn’t have this problem: its GetCertificate call (the one we wrap) always returns a new *tls.Certificate value.
So the fix for that panic is to change the manualCertManager’s TLSConfig method to return a GetCertificate that’s unique each time. Looks like this has been a problem since it was added in d8c5d00ecbaf2352f50ae3f26f795621a6e7972f. @SilverBut, interested in fixing?
It seems the cause is that the VPS in mainland China has not completed ICP filing (备案), so its TLS connections are being blocked.