actix-web: keep alive timer causing truncated responses when using rustls
Expected Behavior
Response is not truncated
Current Behavior
The JSON response is truncated when more than ~1 MB of JSON text is returned. I get back a variable number of bytes, ranging from ~700 KB to ~950 KB. This suggests a race condition of some sort.
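One way to confirm the truncation independently of any particular HTTP client is to compare the `Content-Length` header with the number of body bytes that actually arrived. The sketch below is only an illustration: `check_body` and `BodyCheck` are hypothetical names, and it assumes a raw HTTP/1.1 response captured as bytes, sent with a `Content-Length` header rather than chunked transfer encoding.

```rust
/// Outcome of comparing the declared and the received body size.
#[derive(Debug, PartialEq)]
enum BodyCheck {
    Complete,
    Truncated { expected: usize, received: usize },
    NoContentLength,
}

/// Parses a raw HTTP/1.1 response and compares `Content-Length` with the
/// number of body bytes present after the header block.
/// (Hypothetical helper; assumes the response is not chunked.)
fn check_body(raw: &[u8]) -> BodyCheck {
    // The header block ends at the first blank line (CRLF CRLF).
    let split = match raw.windows(4).position(|w| w == b"\r\n\r\n") {
        Some(i) => i,
        None => return BodyCheck::NoContentLength,
    };
    let head = String::from_utf8_lossy(&raw[..split]);
    let body_len = raw.len() - (split + 4);

    // Skip the status line; header names are case-insensitive.
    let expected = head.lines().skip(1).find_map(|line| {
        let (name, value) = line.split_once(':')?;
        if name.trim().eq_ignore_ascii_case("content-length") {
            value.trim().parse::<usize>().ok()
        } else {
            None
        }
    });

    match expected {
        Some(n) if body_len >= n => BodyCheck::Complete,
        Some(n) => BodyCheck::Truncated { expected: n, received: body_len },
        None => BodyCheck::NoContentLength,
    }
}

fn main() {
    let full = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello";
    let cut = b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhel";
    println!("{:?}", check_body(full));
    println!("{:?}", check_body(cut));
}
```

A `Truncated` result whose `received` value changes from run to run would match the variable byte counts described above.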
Possible Solution
Unknown
Steps to Reproduce (for bugs)
    async fn view_mystuff(state: ActixAppState, _req: HttpRequest,
            queryinfo: web::Query<MyQuery>, json: web::Json<MyInternalStruct>) -> HttpResponse {
        let resp = match ...call internal thing to get data... {
            Err(e) => {
                return build_404_error(&format!("Error fetching sample for features given: {}", e));
            },
            Ok(r) => {
                r
            }
        };
        HttpResponse::Ok().json(resp) // <--- truncates
    }

    let srv = HttpServer::new(move || {
        App::new()
            .wrap(NormalizeSlash)
            .wrap(middleware::Logger::default())
            .wrap(Cors::default())
            .app_data(app_state.clone())
            .app_data(web::PayloadConfig::new(1000000 * 250))
            .app_data(web::Json::<MyInternalStruct>::configure(|cfg| {
                cfg.limit(1000000 * 25).error_handler(json_error_handler)
            }))
            .service(web::resource("/features/").route(web::put().to(view_mystuff)))
            .default_service(
                web::route().to(view_upload)
            )
    })
    .bind(address_http).expect("failed to start server on given address:httpport")
    .bind_rustls(address_https, config).expect("failed to start server on given address:httpsport")
    .shutdown_timeout(5)
    .run();

JSON serialization works: the following prints the entire body to the log, but the body is not properly sent to the client.

    let j = serde_json::to_string(&resp).expect("basdf");
    info!("{}", j);
    HttpResponse::Ok().body(j) // <--- truncates

Context
Your Environment
- Rust Version (i.e., output of rustc -V): 1.43
- Actix Web Version:
    actix-web = {version = "2", features=["rustls"]}
    actix-rt = "1"
    actix-files = "^0.2"
    actix-cors = "^0.2"
    futures = "^0.3"
    async-std = "^1.5"
About this issue
- State: closed
- Created 4 years ago
- Comments: 24 (23 by maintainers)
I have a reproduction case using openssl and rustls bindings on the same HttpServer and yes, it does appear that the keep-alive timer is not being extended as data is sent into the socket.
Requests to the openssl port last as long as they need to for the transfer to happen, but requests to the rustls port last exactly the keep-alive time every time, causing truncation when the response takes longer than that to receive.
Looking into a fix now.
Edit: The effect seems different (in a good way?) on the master branch: I am still getting “Keep-alive timeout, close connection” in the logs, yet there is no truncation when going over the keep-alive time.
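The behaviour described in this comment can be modelled with a small simulation. This is only a sketch on a fake millisecond clock, not actix-web's actual dispatcher code: `KeepAlive`, `send_body`, and all the numbers are hypothetical. It compares a deadline that is fixed when the connection starts with one that is pushed back after every successful write.

```rust
/// Simulated keep-alive timer; all times are milliseconds on a fake clock.
/// (Hypothetical model, not the actix-web implementation.)
struct KeepAlive {
    timeout_ms: u64,
    deadline_ms: u64,
    extend_on_write: bool,
}

impl KeepAlive {
    fn new(now_ms: u64, timeout_ms: u64, extend_on_write: bool) -> Self {
        KeepAlive {
            timeout_ms,
            deadline_ms: now_ms + timeout_ms,
            extend_on_write,
        }
    }

    /// Called after each successful write to the socket.
    fn on_write(&mut self, now_ms: u64) {
        if self.extend_on_write {
            self.deadline_ms = now_ms + self.timeout_ms;
        }
    }

    fn expired(&self, now_ms: u64) -> bool {
        now_ms >= self.deadline_ms
    }
}

/// Writes `total` bytes in `chunk`-sized pieces, each taking `ms_per_chunk`.
/// Returns how many bytes were written before the timer closed the connection.
fn send_body(total: usize, chunk: usize, ms_per_chunk: u64, mut timer: KeepAlive) -> usize {
    let mut now_ms = 0u64;
    let mut sent = 0usize;
    while sent < total {
        if timer.expired(now_ms) {
            break; // connection closed mid-response: the client sees a truncated body
        }
        sent += chunk.min(total - sent);
        now_ms += ms_per_chunk;
        timer.on_write(now_ms);
    }
    sent
}

fn main() {
    let total = 1_000_000; // roughly a 1 MB body
    let fixed = send_body(total, 10_000, 100, KeepAlive::new(0, 5_000, false));
    let extended = send_body(total, 10_000, 100, KeepAlive::new(0, 5_000, true));
    println!("fixed deadline:    {} of {} bytes", fixed, total);
    println!("extended deadline: {} of {} bytes", extended, total);
}
```

With these illustrative numbers the fixed deadline closes the connection after 500,000 of 1,000,000 bytes, while the extended deadline lets the whole body through. In the model, a slower client (a larger `ms_per_chunk`) receives fewer bytes before the cutoff.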
I still maintain there is a serious bug. Simply switching to openssl and this behavior is not observed.
AHAH I figured out the issue! Python requests (and Postman) make an HTTP/1.1 request, while curl follows the upgrade to HTTP/2. If you force curl to HTTP/1.1, the same truncation is observed:
curl -k -vs --http1.1 --location --request PUT 'https://blah/features/' --header 'Content-Type: application/json' --data-raw '...the json...'
Versus:
curl -k -vs --location --request PUT 'https://blah/features/' --header 'Content-Type: application/json' --data-raw '...the json...'
I will obviously try this as a workaround (will get back to you tonight on that). But from my understanding, that timeout should not take effect until the request is fully sent and the server is idle. I don’t see why it would truncate a response that is still in the process of being sent.
I understand you can’t fix the bug if it’s not reproducible, but this is inherently a very difficult bug for anyone to reproduce, as it appears to disappear depending on multiple factors. The code you shared still doesn’t replicate the same environment: there’s a missing app_data bind, which I know can cause problems with how configurations are read (app_datas at the route level do NOT work when that kind of app_data is used, for example: https://github.com/actix/actix-web/issues/1469). Additionally, I have two binds, one on http and one on https.
I will try to narrow this down to a limited repro case when I have free time, but it is very difficult to replicate. It DOES reliably occur on the application I am testing in production, depending on the HTTP/1.1 vs HTTP/2 thing.
Are you using the rustls feature and the versions I have specified? The example given by robjtede does NOT replicate the complexity of the code I shared.
So if it works on master, with the “Keep-alive timeout, close connection” in the logs, is this issue still a release stopper?
Now I have an issue with keep-alive. The connection is closed in the middle of one of the requests in 50% of my e2e test suite runs. I will investigate it further. I am pretty sure something is wrong with the keep-alive logic.
Switching to openssl fixed all my issues without messing with keep-alive at all.