caddy: caddyserver-reverseproxy + ffmpeg streaming causes invalid files/connection state

Hello,

first of all, great project, love the caddy server - love what you guys are doing, keep up the good work! 👍

Now for my problem:

I am using Caddy as a reverse proxy in front of a Python application that receives ffmpeg’s streaming data and processes it (the processing itself is out of scope for this issue). As long as the client closes its connection to Caddy cleanly, we have no issues at all.

Now we are trying some low-latency streaming, which requires running ffmpeg with the parameter -http_persistent 0, and this causes problems with Caddy. With -http_persistent 0, the connection simply drops out without a clean “exit”.
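
For reference, the client side of the reproduction looks roughly like this (a sketch only; the PoC’s ll_hls.sh is the authoritative version, and the input file, codecs, and segment length here are placeholders):

```sh
# Push LL-HLS output to the proxy over HTTP. The relevant flag is
# -http_persistent 0, which disables persistent HTTP connections so every
# playlist/segment upload uses (and then drops) its own connection.
ffmpeg -re -i input.mp4 -c:v libx264 -c:a aac \
  -f hls -hls_time 1 -hls_segment_type fmp4 \
  -method PUT -http_persistent 0 \
  http://localhost:8888/example.m3u8
```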

With Caddy in between as a proxy, whenever the connection drops out there is roughly a 50:50 chance that it causes an issue on the Python side (OSErrors).

If we remove Caddy and stream directly to the Python server, it works fine. This is not limited to uWSGI/Python either: we see the same issue with plain old nginx using WebDAV to accept the files that ffmpeg PUTs.

I have created a PoC project, containing a minimum viable client for this issue: https://github.com/kuhnchris/caddyserver-ffmpeg-ll-hls-poc

In the default configuration (docker compose up -d && cd ffmpeg && sh ll_hls.sh), Caddy sometimes works without any hiccups, and then randomly throws this on the Python (and nginx) side:

poc-caddy-1  | {"level":"debug","ts":1659176760.2804174,"logger":"http.handlers.reverse_proxy","msg":"upstream roundtrip","upstream":"uwsgi:8002","duration":0.001713273,"request":{"remote_ip":"172.18.0.1","remote_port":"51190","proto":"HTTP/1.1","method":"PUT","host":"localhost:8888","uri":"/example.m3u8","headers":{"User-Agent":["Lavf/58.76.100"],"X-Forwarded-Proto":["http"],"X-Forwarded-Host":["localhost:8888"],"Icy-Metadata":["1"],"Accept":["*/*"],"X-Forwarded-For":["172.18.0.1"],"X-Server-Name":["test"]}},"error":"context canceled"}
poc-caddy-1  | {"level":"debug","ts":1659176760.2806652,"logger":"http.handlers.reverse_proxy","msg":"upstream roundtrip","upstream":"uwsgi:8002","duration":0.00282887,"request":{"remote_ip":"172.18.0.1","remote_port":"51188","proto":"HTTP/1.1","method":"PUT","host":"localhost:8888","uri":"/example190.m4s","headers":{"X-Server-Name":["test"],"Accept":["*/*"],"X-Forwarded-Proto":["http"],"X-Forwarded-Host":["localhost:8888"],"Icy-Metadata":["1"],"X-Forwarded-For":["172.18.0.1"],"User-Agent":["Lavf/58.76.100"]}},"error":"context canceled"}

i.e. “context canceled”. It only happens sometimes, and the reverse_proxy target does not seem to receive everything, so it feels like a race condition.

poc-uwsgi-1  | [pid: 9|app: 0|req: 61/379] 172.18.0.3 () {38 vars in 498 bytes} [Sat Jul 30 10:25:56 2022] PUT /example188.m4s => generated 0 bytes in 1 msecs (HTTP/1.1 500) 0 headers in 0 bytes (0 switches on core 4)
poc-uwsgi-1  | Traceback (most recent call last):
poc-uwsgi-1  |   File "/root/./test_handler.py", line 12, in dispatch
poc-uwsgi-1  |     while _ := uwsgi.chunked_read(30):
poc-uwsgi-1  | OSError: unable to receive chunked part
poc-uwsgi-1  | [pid: 8|app: 0|req: 320/380] 172.18.0.3 () {38 vars in 494 bytes} [Sat Jul 30 10:25:58 2022] PUT /example.m3u8 => generated 0 bytes in 0 msecs (HTTP/1.1 500) 0 headers in 0 bytes (0 switches on core 4)
poc-uwsgi-1  | [pid: 8|app: 0|req: 320/381] 172.18.0.3 () {38 vars in 498 bytes} [Sat Jul 30 10:25:58 2022] PUT /example189.m4s => generated 2 bytes in 1 msecs (HTTP/1.1 200) 2 headers in 64 bytes (1 switches on core 2)
poc-uwsgi-1  | Traceback (most recent call last):
poc-uwsgi-1  |   File "/root/./test_handler.py", line 12, in dispatch
poc-uwsgi-1  |     while _ := uwsgi.chunked_read(30):
poc-uwsgi-1  | OSError: unable to receive chunked part

On the nginx side it looks as follows:

poc-nginx-1  | 172.18.0.3 - - [30/Jul/2022:10:32:44 +0000] "PUT /example.m3u8 HTTP/1.1" 204 0 "-" "Lavf/58.76.100"
poc-nginx-1  | 172.18.0.3 - - [30/Jul/2022:10:32:44 +0000] "PUT /example392.m4s HTTP/1.1" 201 0 "-" "Lavf/58.76.100"
poc-nginx-1  | 172.18.0.3 - - [30/Jul/2022:10:32:46 +0000] "PUT /example393.m4s HTTP/1.1" 201 0 "-" "Lavf/58.76.100"
poc-nginx-1  | 172.18.0.3 - - [30/Jul/2022:10:32:46 +0000] "PUT /example.m3u8 HTTP/1.1" 204 0 "-" "Lavf/58.76.100"
poc-nginx-1  | 172.18.0.3 - - [30/Jul/2022:10:32:48 +0000] "PUT /example.m3u8 HTTP/1.1" 400 0 "-" "Lavf/58.76.100"
poc-nginx-1  | 172.18.0.3 - - [30/Jul/2022:10:32:48 +0000] "PUT /example394.m4s HTTP/1.1" 400 0 "-" "Lavf/58.76.100"
poc-nginx-1  | 172.18.0.3 - - [30/Jul/2022:10:32:50 +0000] "PUT /example.m3u8 HTTP/1.1" 400 0 "-" "Lavf/58.76.100"
poc-nginx-1  | 172.18.0.3 - - [30/Jul/2022:10:32:50 +0000] "PUT /example395.m4s HTTP/1.1" 201 0 "-" "Lavf/58.76.100"
...
poc-nginx-1  | 172.18.0.3 - - [30/Jul/2022:10:33:30 +0000] "PUT /example.m3u8 HTTP/1.1" 204 0 "-" "Lavf/58.76.100"
poc-nginx-1  | 172.18.0.3 - - [30/Jul/2022:10:33:32 +0000] "PUT /example.m3u8 HTTP/1.1" 204 0 "-" "Lavf/58.76.100"
poc-nginx-1  | 172.18.0.3 - - [30/Jul/2022:10:33:32 +0000] "PUT /example416.m4s HTTP/1.1" 201 0 "-" "Lavf/58.76.100"
poc-nginx-1  | 172.18.0.3 - - [30/Jul/2022:10:33:34 +0000] "PUT /example.m3u8 HTTP/1.1" 400 0 "-" "Lavf/58.76.100"

The 400s correspond to the broken requests; the 204s and 201s are OK. This happens pretty randomly. I tried looking around for the aforementioned “context canceled” error and found two spots in reverse_proxy.go, but I didn’t really see any way to work around this issue.

Does anyone have any idea how we could get this to work without Caddy randomly causing exceptions/errors when the (client) connection is closed?

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 26 (14 by maintainers)

Most upvoted comments

Hurray! OK, I think this fix may be a viable solution then (once I clean it up slightly). Just before we pass the request to the standard library, we replace its context with one that doesn’t get canceled when the client closes the connection.
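
For illustration, here is a minimal sketch of that idea using the standard library’s httputil.ReverseProxy; this is not the actual Caddy patch, and the listen address and upstream are borrowed from the PoC logs:

```go
package main

import (
	"context"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"time"
)

// neverDone is never closed, so contexts built on it never report cancellation.
var neverDone = make(chan struct{})

// detachedContext keeps the parent's values but hides its cancellation,
// so the proxied upstream request survives the client hanging up early.
type detachedContext struct{ parent context.Context }

func (d detachedContext) Deadline() (time.Time, bool) { return time.Time{}, false }
func (d detachedContext) Done() <-chan struct{}       { return neverDone }
func (d detachedContext) Err() error                  { return nil }
func (d detachedContext) Value(key any) any           { return d.parent.Value(key) }

func main() {
	upstream, err := url.Parse("http://uwsgi:8002") // upstream address from the PoC logs
	if err != nil {
		log.Fatal(err)
	}
	proxy := httputil.NewSingleHostReverseProxy(upstream)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Swap the request's context just before handing it to the
		// standard library, as described in the comment above.
		r = r.WithContext(detachedContext{parent: r.Context()})
		proxy.ServeHTTP(w, r)
	})
	log.Fatal(http.ListenAndServe(":8888", nil))
}
```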

OK, the test that still showed the issue was missing “flush_interval: -1” in the config; that’s why it looked like the pre-fix behavior. With this fix we actually managed to get sub-5-second low-latency streaming, so it does fix the issue in question and solves it for our usage. 😃
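
For context, that option in a Caddyfile looks something like the following sketch (site address and upstream assumed from the PoC logs; the PoC’s actual config is authoritative):

```
localhost:8888 {
	reverse_proxy uwsgi:8002 {
		# -1 tells Caddy to flush response data to the client
		# immediately instead of buffering it.
		flush_interval -1
	}
}
```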

Sorry, I have to walk that back; something is fishy. I just tried it again with the mini-uwsgi example and added output to the files, and it seems to write everything out fine, so something else is going on. I am having a co-worker investigate this and will get back to you guys real quick, sorry about the confusion!

I’ll ping this thread again if something comes up.

Great, thank you for the simpler reproducer! I need to find time to work on this, as I’m currently deep in the mud on another complex issue. I’ll return to this, though if someone debugs it sooner, that would definitely be faster!