surf: curl client does not receive the full body

It seems like the curl-based client does not wait until all the data has been received, which can cause JSON deserialization to fail when the response is large.

The code is roughly as follows. The URL is not the real one, since the actual server is a private GitLab instance:

    let response: Vec<Project> = surf::get("http://gitlab.com/api/v4/projects")
        .recv_json()
        .await
        .unwrap();

The received body is cut off abruptly and ends like this:

..."star_count":100,"forks_count":2,"last_activity_at":"2019-07-02T08:41:28.976Z","namespace":{"id":999,"name":"r

Querying the same URL with the curl command-line tool returns the whole body.
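A quick way to confirm that the body is being cut short, rather than the server sending malformed JSON, is to compare the bytes the client hands back against the `Content-Length` header. Below is a minimal std-only sketch of that check. `check_complete` and `BodyCheck` are hypothetical names, not part of Surf, and the check assumes the server sends a `Content-Length` (a chunked response would need a different test):

```rust
/// Outcome of comparing a received body with the length the server advertised.
#[derive(Debug, PartialEq)]
enum BodyCheck {
    Complete,
    Truncated { expected: usize, received: usize },
    NoContentLength,
}

/// Hypothetical helper: `headers` are the response's (name, value) pairs and
/// `body` is whatever the HTTP client actually returned.
fn check_complete(headers: &[(&str, &str)], body: &[u8]) -> BodyCheck {
    // Header names are case-insensitive, so compare them that way.
    let declared = headers
        .iter()
        .find(|(name, _)| name.eq_ignore_ascii_case("content-length"))
        .and_then(|(_, value)| value.trim().parse::<usize>().ok());

    match declared {
        None => BodyCheck::NoContentLength,
        Some(expected) if body.len() < expected => BodyCheck::Truncated {
            expected,
            received: body.len(),
        },
        // At least as many bytes as declared arrived.
        Some(_) => BodyCheck::Complete,
    }
}

fn main() {
    let headers = [("Content-Type", "application/json"), ("Content-Length", "40")];
    // A body that stops mid-string, like the one in the report (33 bytes).
    let body = br#"[{"namespace":{"id":999,"name":"r"#;
    // prints: Truncated { expected: 40, received: 33 }
    println!("{:?}", check_complete(&headers, body));
}
```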

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 3
  • Comments: 18 (7 by maintainers)

Most upvoted comments

The upstream Isahc bug has been fixed, so updating the version Surf uses should fix the problem.

Published v1.0.2, which should fix this! Let us know if it works out!

Going to go ahead and close this issue, assuming it’s fixed; happy to reopen if more work is needed!

Also haunted by this issue. When will the next version be published? This looks like a critical fix to me.

I am hitting the same issue. Here is a one-liner to reproduce it 😃.

    git clone --branch surf-bug https://github.com/Byron/github-star-counter && cd github-star-counter && cargo run -- --log-level INFO seanmonstar

Looks fixed for me. It’s the first time I’ve managed to retrieve a 300 KB file 😃. Thanks!

@yoshuawuyts I think this assessment is most accurate:

Okay, it seems chttp/isahc stops reading bytes because the client is terminated.

Your chttp::Client is being dropped after you receive the response headers, but in the middle of transferring the response stream. This happens because awaiting a Request consumes self and returns a Response once the response headers are received. Since the Request is dropped, the HttpClient is also dropped.

https://github.com/rustasync/surf/blob/d7d9d054aad0a9bc46a7d177e133770c25451c6a/src/request.rs#L575-L582

Since all we have left on the stack is a http::Response<Body> after executing the request, the HttpClient is no longer alive to finish the transfer of the response body.

There are some interesting questions here about how the lifetime of an HttpClient should interact with a Body, but either way, Isahc clients should probably keep themselves alive in the background until all active transfers either finish or are cancelled (dropping a chttp::Body is enough to signal cancellation). I opened a bug on Isahc here: https://github.com/sagebind/isahc/issues/64
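The ownership problem described above can be modeled without Surf or Isahc. The std-only sketch below is a deliberate simplification: it swaps async I/O for a synchronous pull loop, and every type in it (`Client`, `Transfer`, `Body`, `Handle`) is hypothetical rather than Isahc's real API. With a `Weak` handle, only the client keeps the transfer alive, so dropping the client right after the headers arrive truncates the body. With an owned `Rc` handle, the transfer lives until it finishes or the body is dropped, and dropping the body early is what cancels it:

```rust
use std::cell::{Cell, RefCell};
use std::rc::{Rc, Weak};

const CHUNK: usize = 16 * 1024;

/// Hypothetical in-flight transfer: bytes the server has not delivered yet.
struct Transfer {
    remaining: usize,
    cancelled: Rc<Cell<bool>>,
}

impl Transfer {
    /// Pretend to pull the next chunk off the socket; None once it is drained.
    fn next_chunk(&mut self) -> Option<Vec<u8>> {
        let n = self.remaining.min(CHUNK);
        self.remaining -= n;
        (n > 0).then(|| vec![b'x'; n])
    }
}

impl Drop for Transfer {
    fn drop(&mut self) {
        // Torn down with bytes still outstanding means the transfer was cancelled.
        self.cancelled.set(self.remaining > 0);
    }
}

/// How a body refers to the transfer that feeds it.
enum Handle {
    Borrowed(Weak<RefCell<Transfer>>), // buggy: only the client owns the transfer
    Owned(Rc<RefCell<Transfer>>),      // fixed: the body keeps the transfer alive
}

struct Body {
    buffered: Vec<u8>,
    handle: Handle,
}

impl Body {
    fn read_to_end(mut self) -> Vec<u8> {
        loop {
            let transfer = match &self.handle {
                Handle::Owned(rc) => Rc::clone(rc),
                // Client gone: upgrade() fails and reading ends like an early EOF.
                Handle::Borrowed(weak) => match weak.upgrade() {
                    Some(rc) => rc,
                    None => break,
                },
            };
            let next = transfer.borrow_mut().next_chunk();
            match next {
                Some(chunk) => self.buffered.extend(chunk),
                None => break,
            }
        }
        self.buffered
    }
}

/// Hypothetical client; `owned` holds the transfers it keeps alive (buggy mode only).
struct Client {
    owned: Vec<Rc<RefCell<Transfer>>>,
}

impl Client {
    fn get(&mut self, total: usize, fixed: bool, cancelled: Rc<Cell<bool>>) -> Body {
        let transfer = Rc::new(RefCell::new(Transfer { remaining: total, cancelled }));
        // The first chunk arrives together with the response headers.
        let buffered = transfer.borrow_mut().next_chunk().unwrap_or_default();
        let handle = if fixed {
            Handle::Owned(transfer)
        } else {
            let weak = Rc::downgrade(&transfer);
            self.owned.push(transfer);
            Handle::Borrowed(weak)
        };
        Body { buffered, handle }
    }
}

/// Like awaiting a Request that owns its client: the client is dropped on return.
fn fetch(total: usize, fixed: bool, cancelled: Rc<Cell<bool>>) -> Body {
    let mut client = Client { owned: Vec::new() };
    client.get(total, fixed, cancelled)
}

fn main() {
    let total = 10 * CHUNK;
    for fixed in [false, true] {
        let cancelled = Rc::new(Cell::new(false));
        let received = fetch(total, fixed, Rc::clone(&cancelled)).read_to_end().len();
        println!("fixed={fixed}: received {received} of {total} bytes, cancelled={}", cancelled.get());
    }
    // With the fix, dropping the body early is what cancels the transfer.
    let cancelled = Rc::new(Cell::new(false));
    drop(fetch(total, true, Rc::clone(&cancelled)));
    println!("body dropped early: cancelled={}", cancelled.get());
}
```

In the `fixed=false` case only the chunk that arrived with the headers survives, mirroring the truncated JSON in the report; in the `fixed=true` case the full body is read even though the client is already gone.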

Okay, I re-ran it with hyper again (using #34), and this time it does work! This is with the default runtime (though I suspect the runtime is not the problem right now).

Repro code

main.rs

    #![feature(async_await)]

    #[runtime::main]
    async fn main() {
        femme::start(log::LevelFilter::Info).unwrap();
        let string = surf::get("http://localhost:8080")
            .recv_string()
            .await
            .unwrap();
        dbg!(&string);
        println!("received {} bytes", string.len());
    }

index.js

    var http = require('http')

    http.createServer((req, res) => {
      var body = []
      for (let i = 0, j = 100000; i < j; i++) {
        body.push({ message: 'hello world' })
      }
      res.end(JSON.stringify(body))
    }).listen(8080, () => console.log('listening on port 8080'))
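As a sanity check for this repro, the size of a complete body can be worked out ahead of time. `JSON.stringify` emits no whitespace, so each `{"message":"hello world"}` object is 25 bytes, and 100,000 of them joined by 99,999 commas inside `[` and `]` come to 2,600,001 bytes. The std-only sketch below builds the same text (`expected_body` is a hypothetical helper, not part of the repro) so it can be compared with the `received {} bytes` line printed by main.rs:

```rust
/// Builds the same JSON text the index.js server sends: no whitespace,
/// which is what JSON.stringify produces for this array of objects.
fn expected_body(count: usize) -> String {
    let item = r#"{"message":"hello world"}"#;
    format!("[{}]", vec![item; count].join(","))
}

fn main() {
    let body = expected_body(100_000);
    // A complete response is exactly this long; anything shorter was cut off.
    println!("expected {} bytes", body.len()); // prints: expected 2600001 bytes
}
```

Since the payload is pure ASCII, `String::len()` in Rust (bytes) and the character count on the JavaScript side agree.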


Repro

https://github.com/yoshuawuyts/repro-surf-36


cc @sagebind, do you have any idea what might be going wrong here?