apollo-kotlin: Retrying a subscription does not renew the id and may cause an error on the server because the id is already used

Version

4.0.0-alpha.2

Summary

I have noticed few crashes of my app when using subscription. It looks like SDK have problem with parsing errors. I am not sure exactly why. I have contacted my backend team but maybe letting you know can resolve that issue faster.

My setup of ApolloClient looks like this

ApolloClient.Builder()
        .networkTransport(
            HttpNetworkTransport.Builder()
                .serverUrl(BuildConfig.BASE_URL)
                .okHttpClient(okHttpClient)
                .build()
        )
        .subscriptionNetworkTransport(
            WebSocketNetworkTransport.Builder()
                .protocol(GraphQLWsProtocol.Factory())
                .serverUrl(BuildConfig.BASE_URL)
                .okHttpClient(okHttpClient)
                .reopenWhen { throwable, attempt ->
                    if (attempt < 13) {
                        delay(2.0.pow(attempt.toDouble()).toLong())
                    } else {
                        delay(5000L)
                    }
                    true
                }
                .build()
        )
        .build()

and subscription like this

 apolloClient.subscription(BoostCompetitionSubscription(roomId))
            .toFlow()
            .mapNotNull { response ->
                val exception = response.exception
                if (exception is ApolloGraphQLException) {
                    Timber.e(BoostCompetitionException(roomId, exception.errors, exception))
                }
                try {
                    response.data?.boostCompetitions?.firstOrNull()?.toDomain()
                } catch (ex: Exception) {
                    Timber.e(BoostCompetitionParsingException(roomId, ex.message, ex))
                    null
                }
            }

Just before the crash reopenWhen set on WebScoketNetworkTransport was fired once with attempt 0

Is there something I could do to prevent app from crashing?

Steps to reproduce the behavior

No response

Logs

com.apollographql.apollo3.exception.JsonDataException: Expected BEGIN_ARRAY but was BEGIN_OBJECT at path errors
    at com.apollographql.apollo3.api.json.MapJsonReader.peek
    at com.apollographql.apollo3.api.json.MapJsonReader.beginArray
    at com.apollographql.apollo3.api.json.MapJsonReader.beginArray
    at com.apollographql.apollo3.api.json.MapJsonReader.beginArray
    at com.apollographql.apollo3.api.json.MapJsonReader.beginArray
    at com.apollographql.apollo3.api.internal.ResponseParser.readErrors
    at com.apollographql.apollo3.api.internal.ResponseParser.readErrors
    at com.apollographql.apollo3.api.internal.ResponseParser.parse
    at com.apollographql.apollo3.api.Operations.parseJsonResponse
    at com.apollographql.apollo3.network.ws.WebSocketNetworkTransport$execute$$inlined$map$1$2.emit
    at com.apollographql.apollo3.network.ws.WebSocketNetworkTransport$execute$$inlined$map$1$2.emit$bridge
    at kotlinx.coroutines.flow.internal.SafeCollectorKt$emitFun$1.invoke
    at kotlinx.coroutines.flow.internal.SafeCollectorKt$emitFun$1.invoke
    at kotlinx.coroutines.flow.internal.SafeCollectorKt$emitFun$1.invoke
    at kotlinx.coroutines.flow.internal.SafeCollector.emit
    at kotlinx.coroutines.flow.internal.SafeCollector.emit
    at kotlinx.coroutines.flow.internal.SafeCollector.emit
    at com.apollographql.apollo3.network.ws.WebSocketNetworkTransport$execute$3.invokeSuspend
    at androidx.compose.material.AnchoredDraggableKt$snapTo$2.invokeSuspend$bridge
    at com.apollographql.apollo3.network.ws.WebSocketNetworkTransport$execute$3.invoke
    at com.apollographql.apollo3.network.ws.WebSocketNetworkTransport$execute$3.invoke
    at androidx.compose.material.AnchoredDraggableKt$snapTo$2.invoke$bridge(SourceFile:0)
    at com.apollographql.apollo3.internal.FlowsKt$transformWhile$1$invokeSuspend$$inlined$collectWhile$1.emit
    at com.apollographql.apollo3.internal.FlowsKt$transformWhile$1$invokeSuspend$$inlined$collectWhile$2.emit
    at androidx.compose.ui.platform.WindowRecomposer_androidKt$createLifecycleAwareWindowRecomposer$2$onStateChanged$1$1$1$1.emit$bridge
    at com.apollographql.apollo3.network.ws.WebSocketNetworkTransport$execute$$inlined$filter$1$2.emit
    at androidx.compose.foundation.text.selection.SelectionMagnifierKt$rememberAnimatedMagnifierPosition$1$2.emit$bridge
    at kotlinx.coroutines.flow.SubscribedFlowCollector.emit
    at kotlinx.coroutines.flow.SharedFlowImpl.collect$suspendImpl
    at kotlinx.coroutines.flow.SharedFlowImpl$collect$1.invokeSuspend
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith
    at kotlinx.coroutines.DispatchedTask.run
    at kotlinx.coroutines.DispatchedTask.run
    at kotlinx.coroutines.internal.LimitedDispatcher.run
    at kotlinx.coroutines.scheduling.TaskImpl.run
    at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker
    at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run

About this issue

  • Original URL
  • State: open
  • Created 10 months ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

Oooh my…, the KDoc for ApolloClient.Builder is severaly lacking. Follow up issue here.

  • ApolloCall.retryOnError(Boolean) uses a boolean that lets individual operation decide if they want to be retried
  • ApolloClient.Builder.retryOnError((ApolloRequest) -> Boolean) uses a function that determines the default value if no retryOnError has been set on the ApolloCall. Usually you’ll do { it.operation is Subscription } to only retry subscriptions (but you might want to retry queries too)

The actual logic is done in RetryOnErrorInterceptor. It leverages coroutines to retry the Flow whenever an error happens. If no error happens then no retry is done.

Hope this helps, since this is all pretty new there’s not a lot of docs but I’ll get working on this (edit: KDoc PR here).

Hey @martinbonnin Thanks for update, I was looking for those changes. No worries about delay. We get this SDK for free so appreciate any feedback and support 😄

I have updated to beta5 now and added retryOnError. Will see how that works. Regarding incubating, I wouldn’t mind to try that as well but would appreciate some hints on it. We use it on prod so don’t want to take too much risk with experimentation🙈

Here is how we setup apollo client currently:

   ApolloClient.Builder()
        .networkTransport(
            HttpNetworkTransport.Builder()
                .serverUrl(applicationInfo.hasuraUrl)
                .httpEngine(KtorHttpEngine(httpClient))
                .build()
        )
        .retryOnError { it.operation is Subscription }
        .subscriptionNetworkTransport(
            WebSocketNetworkTransport.Builder()
                .webSocketEngine(KtorWebSocketEngine(httpClient))
                .protocol(GraphQLWsProtocol.Factory())
                .serverUrl(applicationInfo.hasuraUrl)
                /**
                 * ApolloClient on logout stop subscriptions and wait by default 60 sec
                 * until connection is closed. Below function change that to 5 sec
                 * so another user has no chance to login and re-use previous connection.
                 */
                .idleTimeoutMillis(5000)
                .reopenWhen { throwable, attempt ->
                    log.i(
                        throwable = throwable,
                        messageString = "Reopen when attempt $attempt"
                    )
                    if (throwable is ApolloWebSocketClosedException) {
                        log.w(
                            throwable = throwable,
                            messageString = "Web socket closed"
                        )
                    }
                    /**
                     * Power of 13 attempts give ~5 sec delay.
                     * If it's higher attempt we keep 5 sec delay max.
                     */
                    exponentialDelay(attempt)
                }
                .build()
        )
        .build()

Things that I would need to address:

  1. webSocketEngine cannot take KtorWebSocketEngine. Looks like there is just one implementation of WebSocketEngine inside incubating which is AppleWebSocketEngine
  2. protocol replaced with wsProtocol and there is no Factory for GraphQLWsProtocol but I guess it suppose to be created now with constructor?
  3. There is no reopenWhen which was called on network disconnections. Is it not needed anymore? Is every disconnection now passed to each subscription or is it handled silently by SDK?

Additionally, you can see that I am reducing idle timeout. I discovered that re-login user kept using previous connection. Do you know if there is a better way to invalidate connection?

Thanks!

@damianpetla apologies for the super late response here. This ended up being quite the rabbit hole…

The tlrd; of that rabbit hole is that retrying at the WebSocket like was previously done has major limitations as you found out:

  1. The WebSocket doesn’t know about operation ids and cannot update them when retrying.
  2. In general, subscriptions can happen over other transports than websockets. Multipart HTTP being another example. So retrying at the WebSocket layer does not help there.

All in all, 4.0.0-beta.5 adds a retry option on the ApolloClient itself:

    ApolloClient.Builder()
        .serverUrl(sampleServer.graphqlUrl())
        .retryOnError { it.operation is Subscription }
        .subscriptionNetworkTransport(
            WebSocketNetworkTransport.Builder()
                .serverUrl(sampleServer.subscriptionsUrl())
                .build()
        )
        .build()

This code also uses an incubating WebSocketNetworkTransport that should make it a lot easier to debug things moving forward:

dependencies {
  implementation("com.apollographql.apollo3:apollo-websocket-network-transport-incubating")
}

For more details, you can also check the integration test here

The current plan is to replace the default implementation of WebSocketNetworkTransport in the next version if everything looks good with the incubating one. If you’re still around and have the time to try it, I’d love to hear what you think. Again, deep apologies for the delay on this issue and thanks for raising it 🙏

thanks @martinbonnin can’t do that unfortunately because we use KKM and I have Apollo in shared module exposing Flow and two collectors in android module. I will use single flow, not a big deal. Just wanted to raise this as it might be a potential improvement to Apollo allowing multiple collectors.

do I get it right that this will prevent crash?

Yes, instead of a crash, you will have an ApolloResponse with null data and non-null exception instead

when can I expect next version?

We typically release ~1 a month these days but there is no fixed release schedule. You can already try in the SNAPSHOTs though