swift-openapi-generator: How to deal with TooManyIterationsError since Swift OpenAPI URLSession 0.3.1?

Hello,

The library Swift OpenAPI URLSession, since its version 0.3.1, contains the commit 686df72 which has the URLSession.bidirectionalStreamingRequest() method return a HTTPBody created with the .single iteration behavior.

This behavior is still present in 1.0.0-alpha.1.

Since my application has upgraded to Swift OpenAPI URLSession 0.3.1, this creates an issue with a ClientMiddleware of mine, which consumes the body in order to log it.

The reason why I log the body is because it is very useful for debugging the server, the openapi.yml, and generally the decoding of success and error responses. Sometimes this log is the only way to have strict verbatim of what’s happening, especially for requests that can’t be replaid.

So this logging middleware has started failing:

struct MyLogger: ClientMiddleware {
    public func intercept(
        _ request: HTTPRequest,
        body requestBody: HTTPBody?,
        baseURL: URL,
        operationID: String,
        next: (HTTPRequest, HTTPBody?, URL) async throws -> (HTTPResponse, HTTPBody?)
    ) async throws -> (HTTPResponse, HTTPBody?)
    {
        let (response, responseBody) = try await next(request, requestBody, baseURL)
        
        // NEW in v0.3.1: ❌ OpenAPIRuntime.HTTPBody attempted to create a
        // second iterator, but the underlying sequence is only safe to be
        // iterated once.
        if let responseBody,
            let bodyData = try? await Data(collecting: responseBody, upTo: 1024 * 1024)
        {
            // log the data with other relevant information
        }
        
        return (response, responseBody)
    }
}

What are my options, if I want to restore body logging?

About this issue

Original URL
State: closed
Created 7 months ago
Comments: 15

Most upvoted comments

Hurry up, because 1.0.0 is almost there 😅

It’s unfortunate that it’s too easy to accidentally buffer the body. Sure you can switch over the iteration kind, but you’re not forced to.

I won’t oppose this point of view, but I may try to explain mine.

When I first wrote my logger middleware, the response body was not async yet (v0.2.0). I was surprised that a middleware could not process the HTTP response until the whole data has been downloaded, long after the status code and headers were received, but I wasn’t worried:

let response = try await next(request, baseURL)
let bodyData = response.body // Data

Indeed, when I upgraded to v0.3.0, the response body turned async as expected. This time I was surprised that this stream could be awaited twice (once by my middleware, and once by the generated OpenAPI code). Anyway, this worked, so I didn’t bother more than a few seconds:

let (response, responseBody) = try await next(request, requestBody, baseURL)
if let responseBody {
    let bodyData = try await Data(collecting: responseBody, upTo: ...)
}

At last, with v0.3.1, I got the TooManyIterationsError “OpenAPIRuntime.HTTPBody attempted to create a second iterator”.

I first struggled to fix it, because I hadn’t learnt about HTTPBody constructors yet. Hence this issue.

But I want to stress out that it’s only with this error that everything made sense at last. All my initial surprises were fixed.

This is why I’m not sure it is a good idea to make HTTPBody an enum that screams its iterationBehavior with its cases (if I understood well the suggestion). It looks like an implementation detail to me. Efficient HTTP handling requires single-pass sequences, so all middlewares in the chain should be ready to deal with them. OpenAPI generator should help them doing it right.

Even if HTTPBody was turned in to an enum, no case would prevent me from buffering the response body. I suppose it would not become forbidden to log responses from a single-pass sequence, right?

So the initial problem would still be there, which is that nothing prevents a middleware from returning a response body that was consumed:

public func intercept(
    _ request: HTTPRequest,
    body requestBody: HTTPBody?,
    baseURL: URL,
    operationID: String,
    next: (HTTPRequest, HTTPBody?, URL) async throws -> (HTTPResponse, HTTPBody?))
async throws -> (HTTPResponse, HTTPBody?)
{
    let (response, responseBody) = try await next(request, requestBody, baseURL)
    // <- Here consume `responseBody` by buffering the response
    return (response, responseBody) // <- 😱 Be ready for TooManyIterationsError
}

That’s why a HTTPResponse: ~Copyable, as suggested by @czechboy0, may well be a better solution. It would make it impossible to return the same responseBody after consuming it:

public func intercept(
    _ request: HTTPRequest,
    body requestBody: HTTPBody?,
    baseURL: URL,
    operationID: String,
    next: (HTTPRequest, HTTPBody?, URL) async throws -> (HTTPResponse, HTTPBody?))
async throws -> (HTTPResponse, HTTPBody?)
{
    let (response, responseBody) = try await next(request, requestBody, baseURL)
    // <- Here consume `responseBody` by buffering the response
    return (response, responseBody) // <- ❌ compiler error
}

I’m not expert of non-copyable types though, so I’m unable yet to foresee how difficult it would become to implement StreamingBodyLoggingMiddleware, or a semi-buffered variant of it that yields valid UTF-8 chunks, on top of this new picky HTTPResponse. I mean that preventing invalid code is good, but maybe not so good if it makes it very difficult to deal with expectable valid use cases.

groue on Dec 5, 2023

Exactly, those are the options.

Thanks for confirming! 🙂

Yet this is not correct sample code, as long as StreamingBodyLoggingMiddleware does not provide the buffering that can guarantee valid UTF8 chunks.

Correct, this would be a problem if you decoded the data as UTF-8 Strings and then tried to concatenate them, you might end up with broken text. But if you’re just logging them as UTF-8 using “best effort”, then you might get a logged partial unicode character, you’re right. It really depends on the use case whether that’s a problem or not. The important thing is that the chunk comes into the user-provided closure unmodified, so it’s ultimately up to them how they log the chunk.

I agree with you that everything starts from robust low-level components anyway, regardless of eventual conveniences that are added later.

groue on Dec 4, 2023

Another question (not specific to StreamingBodyLoggingMiddleware): is there any “request identifier” that is stable throughout the whole processing of a given request, that would allow other logging middlewares (e.g. one that logs curl commands built from HTTPRequest) to log some common context?

In the logging sample below, this common context is #1 and #2:
▶️ getStuff #1: $ curl ... // RequestLoggingMiddleware
▶️ postFoo #2: $ curl ...  // RequestLoggingMiddleware
✅ getStuff #1: 200.       // ResponseCodeLoggingMiddleware
⤵️ #1 { "stuffId": ... }   // StreamingBodyLoggingMiddleware
✅ postFoo #2: 201         // ResponseCodeLoggingMiddleware
...
I just noticed the operationID argument to ClientMiddleware.intercept, but it “only” contains the OpenAPI operation id (the identifier of the server endpoint).

You could create a RequestIdInjectingMiddleware that adds a header to the request, and other middlewares can look for it there. But otherwise no, the runtime library code doesn’t insert any unique identifier through other channels.

czechboy0 on Dec 4, 2023

For logging chunks, and preventing buffering, we’re also considering this: https://github.com/apple/swift-openapi-runtime/pull/87

czechboy0 on Dec 4, 2023

I’m sorry, this was a wrong alert.

For whoever has the same issue, make sure you return a new HTTPBody instance built from the collected data. This one can be iterated as many times as needed:

struct MyLogger: ClientMiddleware {
    public func intercept(
        _ request: HTTPRequest,
        body requestBody: HTTPBody?,
        baseURL: URL,
        operationID: String,
        next: (HTTPRequest, HTTPBody?, URL) async throws -> (HTTPResponse, HTTPBody?)
    ) async throws -> (HTTPResponse, HTTPBody?)
    {
        let (response, responseBody) = try await next(request, requestBody, baseURL)
        
        // Collect the full body because we log it. Beware, though,
        // because the HTTPBody documentation says:
        //
        // > Important: You must provide the maximum number of bytes you can buffer in
        // > memory. If more bytes are available, the method throws the
        // > `TooManyBytesError` to stop the process running out of memory.
        // > While discouraged, you can provide `upTo: .max` to read all the
        // > available bytes, without a limit.
        //
        // So make sure this middleware is not used in production.
        let bodyData: Data?
        if let responseBody {
            bodyData = try await Data(collecting: responseBody, upTo: .max)
        } else {
            bodyData = nil
        }
        
        if let bodyData {
            // log the data with other relevant information
        }
        
        // Return a new HTTPBody from the collected data.
        return (response, bodyData.map { HTTPBody($0) })
    }
}

groue on Dec 4, 2023