quickstart-unity: Firestore failing to SetAsync() or log the call indefinitely after unknown triggering event

[REQUIRED] Please fill in the following fields:

  • Unity editor version: 2019.3.8, 2019.4, 2020.2
  • Firebase Unity SDK version: 6.15.2 and 7.0.2
  • Source you installed the SDK: Unity Package Manager
  • Problematic Firebase Component: Firestore
  • Other Firebase Components in use: Core, Auth, Storage, FCM, Crashlytics, Analytics
  • Additional SDKs you are using: NA
  • Platform you are using the Unity editor on: Mac
  • Platform you are targeting: iOS (various from iPhone 6 Plus to 12 Pro and iPad Pro)
  • Scripting Runtime: IL2CPP

[REQUIRED] Please describe the issue here:

(Please list the full steps to reproduce the issue. Include device logs, Unity logs, and stack traces if available.)

I am doing a simple SetAsync() call:

await docRef.SetAsync( dictionaryObj, SetOptions.MergeAll )
  .ContinueWithOnMainThread(task =>
    {
      Debug.Log(”We’ve arrived!”);
    });

In the following log, the first few SetAsync() calls work as expected, and eventually one fails absolutely silently, meaning:

  • The data is not created/updated in Firestore.
  • The ContinueWithOnMainThread() is never called at all, so there’s no isFaulted message to help debug.
  • Despite Firebase logging set to Firebase.LogLevel.Verbose there are no logs about any interactions in Firebase locally or remotely. As if the call never happened. The code is definitely reaching a Debug.Log() on the line directly preceding the SetAsync() call.

Other things to note:

  • After hitting this error, subsequent SetAsync() or GetSnapShotAsync() calls always fail silently whether targeting the same or a different docRef. Not sure if there’s any way to “reboot” firestore once it’s been initialized.
  • The app doesn’t crash or freeze, or throw any errors / warnings and the rest of the app continues to operate normally.
  • I don’t believe it has to do with the network. This call is always made immediately after an awaited Storage PutFileAsync() and GetDownloadUrlAsync() because the SetAsync() is essentially storing that retrieved URL. Those Storage calls never fail. It’s always the SetAsync the does.
  • This is happening when building to any iOS device. I can’t get it to occur running in the Editor (Mac). Haven’t tried Android.
  • It occurs only after the app has been running a bit. It never happens if the app is launched and the Upload / SetAsync sequence is run right away.

Given that verbose Firebase logging doesn’t show that a Firebase call is being attempted, I’m not sure how to debug where the failure is happening. Super unfortunate 😦

Steps to reproduce:

Have you been able to reproduce this issue with just the Firebase Unity quickstarts (this GitHub project)? Nope

What’s the issue repro rate? (eg 100%, 1/5 etc) It is quite regular… it happens 0% when the app is first launched, but quite consistently if the app has been doing its thing (this is an ARFoundation app that uses URP’s post processing and utilizes the VideoPlayer at times. I’m also using the NAudio.LAME lib to convert recorded audio to MP3 (the file being uploaded to Storage). There’s decent processing happening, certainly.

What happened? How can we make the problem occur? This could be a description, log/console output, etc.

I’m honestly not sure how to create a simplified, reliable repro environment of this. Is there a way to turn on even more granular logging for Firestore to see at least at what point it fails?

Relevant Code:

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 32 (1 by maintainers)

Most upvoted comments

@jonahgoldsaito Yes, it appears that the “fix” will need to be made in the OS. Yes, there is no equivalent to setting webOperationQueue.qualityOfService in the Firestore Pod. And nothing comes to mind as to why the terminate workaround does not work on your tester’s 12 Pro.

I’m going to use our Google-internal process to log a bug against Apple and I’ll update back here once I have new information. This is going to require that I reproduce the issue so this whole process will take some unknown amount of time. And even if Apple acknowledges the bug, it would require an iOS update to fix. So there is no good short-term solution.

The best idea that I have is to disable the use of libdispatch in the Firestore Pod code that you’ve been patching. This specific asynchronous dispatching via dispatch_async_f() was supposed to be a performance optimization to allow parsing data received from the server in parallel. So there will be some performance penalty. Actually… looking at the logs that you sent me all usages of this specific dispatch queue are starting at most 1 asynchronous task; therefore, this “performance optimization” using libdispatch is actually a bit of a performance hit due to the overheard of multithreading. If this is how your real application works then you may not realize any performance hit at all by removing the usage of libdispatch.

To test out this workaround, please patch the following 2 files with those from this gist: https://gist.github.com/dconeybe/e1801caabef562f1eb7ee6937f2b83df

  • Pods/FirebaseFirestore/Firestore/core/src/local/leveldb_remote_document_cache.h
  • Pods/FirebaseFirestore/Firestore/core/src/local/leveldb_remote_document_cache.cc

Hopefully, this will unblock you. Note, however, that since the true nature of this bug is not understood, it could creep up in other dispatch queues as well.

I found another bug report of a user experiencing the same issue when they use ARKit in iOS 14.2: https://developer.apple.com/forums/thread/671645.

Wow that’s very interesting. My iPhone has iOS 14.1, so perhaps is not plagued by this issue and would explain why I had no success reproducing. Moreover, my test app does not use ARFoundation or UnityWebRequest.

Something that stood out to me from that Unity bug thread was a speculation that there is an OS bug at the root. As I’ve been pouring over the Firestore code in https://github.com/firebase/firebase-ios-sdk based on the “zzyzx” logs that you provided me, the only explanation that I can come up with is that the call to dispatch_async_f() in ExecutorLibdispatch::Execute() works… until it doesn’t; namely, at some point the task specified to dispatch_async_f() is not called back, causing the call to BackgroundQueue::AwaitAll() to block forever, causing SetAsync() and all future Firestore operations to never complete.

Although that Unity bug report doesn’t mention dispatch_async_f(), it has a similar symptom with the result of a web request never being sent. Perhaps UnityWebRequest is using dispatch_async_f() (or some other related function) under the hood?

In any case, to confirm whether or not there is a bug in dispatch_async_f(), can you patch the files one more time and capture the logs for me? This time I’ve attached UnityIssue918PatchFilesV3.zip that you can just unzip in the “Pods” directory.

Hi @jonahgoldsaito. Thank you for providing those logs. Unfortunately, I am even more perplexed! The logs show that for some reason the task that is enqueued with the executor here in leveldb_remote_document_cache.cc is never executed, resulting in the subsequent call to tasks.AwaitAll() to block indefinitely, awaiting its completion. This causes the AsyncQueue to no longer proceed, causing the behavior you are noting where SetAsync() and GetSnapshotAsync() never complete.

This can only happen in one of two scenarios: (1) The executor’s threads are all busy and/or deadlocked or (2) the executor silently rejects the task and never executes it. I ruled out (1) because the thread dump that you provided in an earlier comment showed no executor threads doing work. So that leaves (2), which could only happen if the executor’s Dispose() method is invoked, which I think would only happen if Firestore was shutting down (but I could be wrong).

If you don’t mind, I’d like to test out scenario 2. To do this, you need to patch the following 4 files from the gist https://gist.github.com/dconeybe/06c88b1a708d61875d1e3a4d847ee1b2, reproduce the issue, and capture the logs. Note that the leveldb_remote_document_cache.cc from this gist is slightly different than the gist that I posted earlier, so please be sure to patch it as well.

  • Pods/FirebaseFirestore/Firestore/core/src/local/leveldb_remote_document_cache.cc
  • Pods/FirebaseFirestore/Firestore/core/src/util/executor_libdispatch.mm
  • Pods/FirebaseFirestore/Firestore/core/src/util/executor_libdispatch.h
  • Pods/FirebaseFirestore/Firestore/core/src/util/executor.h

Thanks for your help and patience with this investigation!

Since I can’t reproduce, I’m going to ask you to do something for me and capture the resulting logs. With those logs it may help me reproduce, which will enable proper investigation on my end.

Steps:

  1. Find the Pods directory for the iOS project that was exported from Unity.
  2. Make a backup of the file FirebaseFirestore/Firestore/core/src/local/leveldb_remote_document_cache.cc under the Pods directory.
  3. Replace the file FirebaseFirestore/Firestore/core/src/local/leveldb_remote_document_cache.cc under the Pods directory with the version here: https://gist.github.com/dconeybe/b3415ee47af4fdb703b4264026b9d2cf
  4. Reproduce the bug where the Task returned from SetAsync() never completes.
  5. Capture all log lines containing the text “zzyzx” and paste them in a gist.
  6. (Optional) Restore the backed up leveldb_remote_document_cache.cc.

The modified leveldb_remote_document_cache.cc just adds a bunch of logging using LOG_WARN, where each log line contains “zzyzx”.

Another idea to “reset” Firestore is to call FirebaseFirestore.TerminateAsync() (https://firebase.google.com/docs/reference/unity/class/firebase/firestore/firebase-firestore#terminateasync) and then get a new FirebaseFirestore instance. It may also block indefinitely but I’d try this before my previous suggestion of disposing the FirebaseApp.

Hi @jonahgoldsaito. I have been unable to reproduce this issue. I tried writing an iOS app that emulates the steps carried out in your description but I never saw the task returned from SetAsync() not completing.

You also mentioned that “Any Firestore call beyond this point appears to always fail in the same way” which makes me wonder if there is some sort of deadlock going on in Firestore that is blocking all future asynchronous operations. Could you capture the stack traces of all threads after reproducing the issue? The stack traces may end up pointing to somewhere in the code that is deadlocked, to give me something to investigate.

To capture the stack trace, do the following:

  1. Cause the situation where SetAsync() is failing.
  2. Attach Xcode.
  3. Pause the debugger.
  4. Run the following lldb command: thread backtrace all
  5. Past the output of that command into a gist and post the link here.

Thanks for the logs. I don’t see anything obvious that jumps out to me. I’m writing an app that attempts to reproduce this issue and I’ll reply back with the outcome of that experiment. If you have any success in writing a minimal app to reproduce then please share it as it will help with debugging immensely.