caddy-l4: SNI matching after proxy_protocol handler does not work
Hi, thanks for this awesome project, it makes very cool configurations possible. But I think I found a bug in the way the proxy_protocol handler works. It seems not to be possible to do sni matching after the proxy_protocol handler was executed. I would expect the following config to work, but instead I see this in the logs:
debug layer4.handlers.proxy_protocol received the PROXY header {"remote": "127.0.0.1:37055", "local": "127.0.0.1:10443"}
debug layer4.matchers.tls matched {"remote": "127.0.0.1:37055", "server_name": "localhost"}
error layer4 handling connection {"error": "tls: first record does not look like a TLS handshake"}
debug layer4 connection stats {"remote": "127.0.0.1:37055", "read": 230, "written": 0, "duration": 0.0081304}
caddy.json
{
"admin": {
"disabled": true
},
"logging": {
"logs": {
"default": {"level":"DEBUG", "encoder": {"format":"console"}}
}
},
"apps": {
"tls": {
"certificates": {
"automate": ["localhost"]
},
"automation": {
"policies": [{
"subjects": ["localhost"],
"issuers": [{
"module": "internal"
}]
}]
}
},
"layer4": {
"servers": {
"https": {
"listen": ["0.0.0.0:10443"],
"routes": [
{
"match": [
{"proxy_protocol": {}}
],
"handle": [
{
"handler": "proxy_protocol",
"timeout": "2s",
"allow": ["127.0.0.1/32"]
}
]
},
{
"match": [
{"tls": {"sni": ["localhost"]}}
],
"handle": [
{"handler": "tls"},
{
"handler": "proxy",
"upstreams": [{"dial": ["127.0.0.1:10080"]}]
}
]
}
]
}
}
},
"http": {
"servers": {
"backend": {
"allow_h2c": true,
"listen": ["127.0.0.1:10080"],
"routes": [
{
"handle": [{
"handler": "static_response",
"status_code": "200",
"body": "Hello World\n",
"headers": {
"Content-Type": ["text/plain"]
}
}]
}
]
}
}
}
}
}
You should be able to run this config yourself since it only uses localhost.
For testing I used curl -v --insecure --haproxy-protocol https://localhost:10443
When the match key for the second route is completely removed, so it matches all requests the above curl works and prints Hello World
. But in my real config I can not use a match all route since I have to distribute requests to different servers.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 27 (8 by maintainers)
Commits related to this issue
- Fix: cx.buf containing invalid content (#56) * fix: cx.buf not being emptied once we read past it & are not recording (#55) * fix: Wrap not passing bufReader & losing byte statistics This fixes... — committed to mholt/caddy-l4 by ydylla 2 years ago
- connection: Read from the buffer correctly This should finally close #55. — committed to mholt/caddy-l4 by RussellLuo 2 years ago
- connection: Read from the buffer correctly (#59) This should finally close #55. — committed to mholt/caddy-l4 by RussellLuo 2 years ago
- proxyprotocol: Rewrite proxy protocol matcher (#58) * refactor: rewrite proxy matcher Do not use a bufio.Reader to make the fix for #55 more reliable * test: add tests for proxy protocol handle... — committed to mholt/caddy-l4 by ydylla 2 years ago
Good point. After some sleep I also do not think that the cx.buf.Reset in Warp is the correct fix.
Resetting should happen after we read more bytes than were present in the buffer and we are not recording these bytes. Because that means the content of cx.buf is now definitely invalid since there would be a gap in the data otherwise.
It could be fixed like so:
With this fix all my example configs (caddy.json, tls.json, two-matchers.json) work.
But I still find it strange that Wrap does pass cx.buf which may contain content from previous matchers but drops cx.bufReader. This is effectively also a buffer reset but without cleaning up cx.buf since the next Read will never read from cx.buf since cx.bufReader is nil. I think Wrap should also pass all other fields like so:
This would catch the edge case that a handler did not read all content from cx.buf meaning cx.bufReader is not nil and the following matchers or handlers continue to read at the correct position. It also does not reset the byte statistics.
But at this point why not just do
cx.Conn = conn;
in Wrap so no future internal field is forgotten. Apparently this was the way it was done before 5f9948fe76ae19009258465fd66da1390aa5ecce which caused other problems (see issue #18). I do not fully understand why it does not work, but I can verify it does not work. But copying everything like in the above diff works fine.I will update my PR with the new proposed fix.
I have created a PR. I tried to add as much context as possible in case this comes up again 😄
Thanks guys I learned a lot today.
Maybe the even better fix would be calling
cx.buf.Reset()
inConnection.Wrap
Since the following config without proxy protocol and only with multiple matchers after tls termination has the same unclean buffer problem.
tls.json
With the current master and tls.json,
curl -v --insecure https://localhost:10443
produces:With
cx.buf.Reset()
both of my configs work as expected.@mholt Is the tls.json config from above one that should work, or am I misusing something here? The example.com route obviously does not work and only exists to make the use case more clear.
Thanks @RussellLuo your fix works 🥳
I agree my Seek attempt was not ideal because it would require more knowledge form handlers and matchers. With your fix my
cx.buf.Reset()
line is also rather useless because now we never reach it sincecx.buf.Len()
will always be 0 at that point.To verify I used this reproducer script and a second caddy config together with the caddy.json from above (with two caddys it is even closer to my real setup):
reproducer.py
forwarder.json
Running the script on 28dab695a01c44157fc3cb17b3dcc1835b7c2eca without your fix will result in a line like this:
request 13 failed: <urlopen error EOF occurred in violation of protocol (_ssl.c:1125)>
indicating the tls handshake failed. With your fix everything works fine (no output from the script).I think you can open a PR with that 👍
@mholt I think my refactored proxy protocol matcher from #58 is nevertheless an improvement. It is easier to understand because the whole reading one byte at a time to skip the null byte stuff is missing. It is not needed when not using a
bufio.Reader
. I also moved the matcher into the l4proxyprotocol package.I also have another question. During writing tests for a rewritten but now obsolete proxy proto handler, I noticed that it seems impossible to create a caddy context with a Logger attached to it. For example when doing it this way:
a call to Provision that tries to use the logger like the existing proxy proto handler does https://github.com/mholt/caddy-l4/blob/28dab695a01c44157fc3cb17b3dcc1835b7c2eca/modules/l4proxyprotocol/handler.go#L65 will fail because
ctx.cfg.Logging
is nil. Do you know any way around this? It would be sad if the usage of a logger prevents writing tests 😃Thanks for your analysis @RussellLuo, it motivated me too also look into the code.
I think the wrapping of
cx.buf
can not be removed, it is essential for rewind to work. See the comment here: https://github.com/mholt/caddy-l4/blob/c46c253c62e31bec451035cadd22a2349700e2cc/layer4/connection.go#L66 You can also test this by adding another route with a tls matcher like so:two-matchers.json
This config in combination with you fix will cause caddy to hang.
But you were right the buffer is unclean. But it looks like the real culprit is the
cx.Wrap
call in the proxy protocol handler https://github.com/mholt/caddy-l4/blob/c46c253c62e31bec451035cadd22a2349700e2cc/modules/l4proxyprotocol/handler.go#L164 which passescx.buf
directly. Thus, for the next matcherPROX
is still in the buffer, although the proxy protocol header was already consumed. Which also explains why removing the matcher in my original config works, because thencx.buf
is ignored because no other record & rewind happens.A possible fix for this could be a
WrapDiscard
function that clearscx.buf
before wrapping, and using that in the proxy protocol handler.@ydylla Thanks for verifying the fix! I’ll try to open a PR.
I encountered the same problem while writing tests for my custom module
ratelimit
, my workaround was to define a separate functionprovision()
:And then use
provision()
in the test code.@ydylla @RussellLuo Ah, dang, of course it’d be a little trickier than previously thought. 😦 I don’t have much in way of help or insights right now, but I want to say that I really appreciate the thoughtful consideration going into this while I’m busy on other things. Thank you for helping make the project better and for the great contributions that will help a lot of other people!
Kind of! I guess I haven’t named them. If you think of it that way, there might be 2 sets of 2 modes actually:
These two sets are orthogonal; i.e. you can have all the combinations: recording and reading from buffer, or recording and reading directly from stream, or not recording and reading from buffer, or not recording and reading directly from stream.
So yeah, basically, recording happens when we are running a matcher and need to rewind. We don’t know or care where the bytes come from (buffer or stream), but we just need to pretend as if the reading of those bytes “never happened”, i.e. keep the stream in a pristine state for future handlers.
Reading from the buffer happens when the buffer is non-empty. Once empty, we discard the buffer (I think I do that merely as a signal that it is empty, or that we don’t need to read from it – can’t recall if there is any other significance to setting it to
nil
or not.) and then read directly from the stream.Hope that helps. Also have a lot of hope for your fix, Luo!
Thanks for continuing to work on this!
Ideally, I think we’d better put as less constraints as possible on implementers. Basically, according to my understanding, I think there are two consuming modes: matcher and handler. In matcher mode, we must read from
cx.bufReader
first (without discard any byte), and then record anything read from underlying connection intocx.buf
. In handler mode, we must also read fromcx.bufReader
and actually consume these bytes, and then do no recording.Based on the above analysis, I think this might be a possible solution:
I have tested all the example configs ((caddy.json, tls.json, two-matchers.json) and I think it works.
@ydylla Could you please give this a try?
Hi, sadly this is still not really fixed. Locally all the example configs work fine for me (nearly always), but with my real config on an Ubuntu server there is a high rate of connections that still fail.
It’s the
bufio.NewReader
in the proxy protocol matcher which sometimes reads the full proxy protocol header in its internal buffer (by default 4096 bytes). If this happens the reads from the proxy proto handler do not trigger the over read and thuscx.buf
is not reset, causing the following tls matcher to produce an invalidcx.buf
again and the tls handshake to fail.Using buffered readers on top of
cx.buf
makes managing/remembering the current true consumed byte offset difficult. I experimented a bit with implementingio.Seeker
for thelayer4.Connection
but it’s not really usable yet. I did this because the http matcher for example also uses a buffered reader and we can probably not avoid using them. My hope was that matchers & handlers could then useSeek
on the connection, to place the offset at the correct position again after they are done. With the use of a byte counting reader andbufio.Reader.Buffered()
it is possible to calculate the amount of consumed bytes (https://go.dev/play/p/Rxwcwhai4Gl). With that known all handlers or matchers that used a buffered reader could then callSeek
on the connection to go back to the correct position.The quick fix I am currently using is a rewritten proxy proto matcher which does not use a buffered reader at all. Instead, it reads exactly 12 bytes. With that my fix from #56 is always triggered for the particular config I am using.
The rewritten proxy proto matcher could also help with other configs. I will open a PR with it, maybe you want to merge it.
Sorry I think my explanation for the above screenshot was not good, maybe short text form is better 😄
proxy proto matcher Writes
PROX
tocx.buf
proxy proto handler Reads buffer and consumes header
cx.buf
still containsPROX
tls matcher Does not read
cx.buf
sincecx.bufReader
is nil. Writes tls handshake stuff tocx.buf
but it still containsPROX
because Wrap passed the reference. Nowcx.buf
containsPROX
+ tls handshake 💥I don’t think there exists any scenario where this behavior would be correct.
Right - matchers should not consume the buffer, but the handlers should.
Yes all configs in this issue work with that. I also think this is the correct behavior for the tls and proxy_protocol handlers (the only ones that use Wrap right now). The resetting happens after they consumed their bytes of the connection, so they still saw the buffer content.
In other words they unwrapped a layer and after that the matcher buffer should be clean for the next matchers.
Here is a screenshot of the broken buffer that a tls matcher produces which is called after a proxy_protocol matcher and handler were executed, because
cx.buf
was not empty. ThePROX
at the beginning is exactly what the proxy_protocol matcher matched aka wrote into the buffer, followed by the tls syn. Notice the missing remainder of the proxy protocol header, it was correctly consumed. Matching after a consuming handler should start with an emptycx.buf
, not with the remaining content of a previous matcher.After some analysis, I’m afraid there’s something wrong with the current rewinding mechanism.
Analysis
With the original JSON config @ydylla provided, by adding a log in Connection.rewind as below:
I got the following logs after running the
cURL
command:Note that in the first rewinding,
50 52 4f 58
is the string “PROX” in hexadecimal format, which was read (also recorded incx.buf
) and checked by theproxy_protocol
matcher here:https://github.com/mholt/caddy-l4/blob/c46c253c62e31bec451035cadd22a2349700e2cc/modules/l4proxy/matcher.go#L49-L56
As we can see, in the second rewinding — triggered by the
tls
matcher — the prior bytes50 52 4f 58
(i.e. “PROX”) still remained incx.buf
. Obviously,50 52 4f 58 16 03 01 00 c1 01
is exactly the byte sequence that caused the error"tls: first record does not look like a TLS handshake"
.Possible solution
I think the root cause is that in the current implementation:
https://github.com/mholt/caddy-l4/blob/c46c253c62e31bec451035cadd22a2349700e2cc/layer4/connection.go#L139
reading from
cx.bufReader
does not advance thecx.buf.off
.By changing
rewind
as follows (omitting the log):everything works as expected:
It’s weird that this problem will go away if removing either matcher (
proxy_protocol
ortls
). For example, with the following config:then everything works properly:
I still need time to figure out what happened… (maybe in the next few days)