firebase-ios-sdk: Slow Firestore queries on large collection
[REQUIRED] Step 2: Describe your environment
- Xcode version: Xcode 10
- Firebase SDK version: 5.8.1
- Firebase Component: Auth, Core, Database, Firestore, Messaging, Storage
- Component version: 5.8.1 (0.13.3 for Firestore)
[REQUIRED] Step 3: Describe the problem
I am facing exactly the same problems as outlined in this issue (problems #1 and #2). I don't think a follow-up issue was ever created, so here goes:
• Querying large collections (3,000-5,000 docs) is fairly slow (~3-4 seconds).
• With persistence enabled, sorting (by timestamp) and limiting (say, to the 300 most recent docs) makes the query even slower, because the local cache queries the whole collection before sorting and limiting it… defeating the purpose of sorting and limiting 🤓
The issue is linked to the locally persisted cache, since turning persistence off takes care of the problem. That being said, persistence is a key reason I am using Firestore. I would like to keep it and avoid the operational headache of maintaining two DBs for each collection (one with all docs and a second, read-only one with a subset of the latest docs).
99% of my users are not yet facing this slowdown because their collections are not big enough, but can I expect a fix in the short term (within the next 30-60 days), or should I start building an alternative solution? I believe the fix would be to let the local cache index its documents/collections (or have it download and use the indexes the server builds for the locally cached collections).
Steps to reproduce:
Step 1: Create a large collection of 3,000-5,000 documents.
Step 2: Attach a snapshot listener to it, with and without sorting and limiting the resulting document snapshots.
Happy to share my own profiling too, but I believe the initial post by KabukiAdam outlines the issue well.
Relevant Code:
```swift
// Note: despite its name, `channelRef` is a Query (ordered and limited), not a CollectionReference.
let channelRef = UserDataService.instance.FbDB.collection("dmChannels").document(channelId).collection("messages")
    .order(by: "timeStamp", descending: true).limit(to: 300)

let messageListener = channelRef.addSnapshotListener(includeMetadataChanges: false) { [weak self] (snapshot, error) in
    guard self != nil else { return }
    if let error = error {
        print("Could not download messages")
        print(error)
        completion(false) // `completion` is the enclosing function's callback
    } else {
        // Do something with resulting docs
    }
}
```
About this issue
- State: open
- Created 6 years ago
- Reactions: 18
- Comments: 65 (20 by maintainers)
@AchidFarooq Thanks for the added info! Unfortunately this is probably expected at present. The problem is that the SDK doesn’t implement client-side indexing and so when you perform a query and have lots of documents cached locally for the collection in question, the client has to read each document in the collection to determine if it matches the query. This takes a long time and the time taken is proportional to the number of documents in the cache.
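To make the cost model above concrete, here is a toy sketch in plain Swift. It is NOT Firestore's implementation: `CachedDoc`, `ToyCache` and `ToyTimestampIndex` are hypothetical types. The cache is grouped only by collection path, so an ordered, limited query has to read every cached document before sorting and limiting (the behaviour described in the original report), while a hypothetical client-side index on `timeStamp` only touches the entries it returns.

```swift
// Toy model of a local document cache, NOT Firestore's real implementation.
// Hypothetical types: documents are grouped only by collection path.
struct CachedDoc {
    let id: String
    let timeStamp: Int
}

struct ToyCache {
    // The only "index": collection path -> every cached document in it.
    var byCollection: [String: [CachedDoc]] = [:]

    // With no field index, every document in the collection must be read and
    // checked against the query; sorting and limiting happen afterwards, in memory.
    func scanQuery(collection: String, limit: Int,
                   matching predicate: (CachedDoc) -> Bool = { _ in true })
        -> (results: [CachedDoc], docsRead: Int) {
        let all = byCollection[collection] ?? []
        var matches: [CachedDoc] = []
        for doc in all where predicate(doc) {
            matches.append(doc)
        }
        let newestFirst = matches.sorted { $0.timeStamp > $1.timeStamp }
        return (Array(newestFirst.prefix(limit)), all.count)
    }
}

// Hypothetical client-side index on timeStamp (newest first): an ordered,
// limited query only needs to read the first `limit` entries.
struct ToyTimestampIndex {
    private(set) var newestFirst: [CachedDoc] = []

    mutating func insert(_ doc: CachedDoc) {
        // Linear insertion keeps the array sorted; good enough for a sketch.
        let position = newestFirst.firstIndex(where: { $0.timeStamp < doc.timeStamp }) ?? newestFirst.count
        newestFirst.insert(doc, at: position)
    }

    func query(limit: Int) -> (results: [CachedDoc], docsRead: Int) {
        let results = Array(newestFirst.prefix(limit))
        return (results, results.count)
    }
}

// 5,000 cached messages; ask for the 300 most recent, as in the original report.
var cache = ToyCache()
var index = ToyTimestampIndex()
for t in 0..<5_000 {
    let doc = CachedDoc(id: "msg\(t)", timeStamp: t)
    cache.byCollection["messages", default: []].append(doc)
    index.insert(doc)
}
let scan = cache.scanQuery(collection: "messages", limit: 300)
let indexed = index.query(limit: 300)
print("scan read \(scan.docsRead) docs, index read \(indexed.docsRead) docs")
// prints: scan read 5000 docs, index read 300 docs
```

Both paths return the same 300 documents; only the number of documents read differs, and in the scan case it grows with the size of the cached collection.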
The suggested workaround until we’re able to tackle the client-side indexing feature is to keep the cache size low or turn persistence off entirely (depending on app requirements).
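Both workarounds are set through `FirestoreSettings`. A minimal sketch, assuming an SDK release that exposes `isPersistenceEnabled` and `cacheSizeBytes` (newer releases deprecate both in favour of `cacheSettings`); the 5 MB figure is only an example value, and the 1 MB minimum is the one reported later in this thread. Settings have to be applied before the Firestore instance is used for anything else.

```swift
import FirebaseFirestore

// Option 1: keep persistence, but cap the on-disk cache (in bytes).
// The cap is enforced by garbage collection, so an existing large cache
// may not shrink immediately.
func configureSmallCache() {
    let settings = FirestoreSettings()
    settings.isPersistenceEnabled = true
    settings.cacheSizeBytes = 5 * 1_024 * 1_024 // ~5 MB, example value only
    Firestore.firestore().settings = settings
}

// Option 2: turn persistence off entirely (memory-only cache).
func configureNoPersistence() {
    let settings = FirestoreSettings()
    settings.isPersistenceEnabled = false
    Firestore.firestore().settings = settings
}
```

Call one of these right after `FirebaseApp.configure()` and before any query or listener is created.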
Hi all, here are some new test results; maybe they can help. I suggest turning on Firestore logging, which helps a lot in seeing what Firestore does in the background.
The biggest thing we found is that running exactly the same query with persistence DISABLED makes Firestore super fast.
On iOS, call this before `FirebaseApp.configure()`, which initializes Firestore:
```swift
FirebaseConfiguration.shared.setLoggerLevel(.max)
```
Test parameters
Results of the test:
I have read every related issue I could find online, and I'm sorry to say the problem still exists. We have been using Firestore in our iOS app since October 2017, but sadly we regret it. One of Firestore's key features should be the ability to fetch a large collection of documents in a reasonable amount of time. Now we even have to disable persistence, and even then it's still slow.
Adding a snapshot listener to a collection with 4,000 documents returns data in approx. 9 to 30 seconds.
In the beginning, when we started using Firestore for iOS, our reassurance was that it was still in beta. But these problems still exist. Does anybody have a solution yet?
Here is an example: we are trying to get 3 documents from a 4,000-document collection with the same query from an iOS app.
****** Call with addSnapShotListener - return time -> 9.354734063148499 s.
****** Call with getDocuments - return time -> 9.92848801612854 s.
Offline:
****** OFFLINE: Call with addSnapShotListener - return time -> 9.441628098487854 s.
****** OFFLINE: Call with getDocuments - return time -> 10.107746958732605 s.
Our internet speed averages 340 Mbps.
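Timings like the ones above can be collected with a small wrapper. This is a sketch, not the commenter's actual harness: `timeQuery` is a hypothetical helper, and the example query's collection (`items`) and field (`status`) are placeholders. It also logs `metadata.isFromCache`, which helps tell whether the time was spent on the local cache or on the network.

```swift
import Foundation
import FirebaseFirestore

// Hypothetical helper: times one getDocuments() round trip and reports
// whether the snapshot was served from the local cache.
func timeQuery(_ query: Query, label: String) {
    let start = Date()
    query.getDocuments { snapshot, error in
        let elapsed = Date().timeIntervalSince(start)
        if let error = error {
            print("\(label) failed after \(elapsed) s: \(error)")
            return
        }
        let count = snapshot?.documents.count ?? 0
        let fromCache = snapshot?.metadata.isFromCache ?? false
        print("\(label): \(count) docs in \(elapsed) s (fromCache: \(fromCache))")
    }
}

// Example usage with placeholder names:
// let query = Firestore.firestore().collection("items")
//     .whereField("status", isEqualTo: "open").limit(to: 3)
// timeQuery(query, label: "getDocuments")
```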
No update, unfortunately.
Have you tried setting `cacheSizeBytes` to something small (like a few MB)? If so, and it didn't help, I'd like to know more.

We are facing similar issues in our app too. Fetching just 2 or 3 documents from a collection of more than 10,000 documents with a Firestore query is very slow; it has sometimes taken more than 10 seconds. @fschaus Did archiving help with the speed?
@mikelehen was the fix you mentioned in https://github.com/firebase/firebase-ios-sdk/issues/1868#issuecomment-552496117 ever released?
We're using the Flutter SDK, have LOTS of documents, and are running into performance issues with the local cache. Local indexing and more control over the local store would probably solve a lot of our issues. Beyond the performance issues, we can't selectively delete items from the store, so everything stays in place and makes things slower and slower. I'd love to be able to call something on the SDK to delete anything that has already been synced.
The problem described by this issue is that very large single collections negatively impact the client’s query performance. This is still true. It’s also something we’re working on addressing which is why the issue is still open.
@nikhilag Sorry, I forgot to post an update here. No, it did not help. We saw no improvement after changing our region from the US to the EU (Frankfurt); the speed issue on large collections was the same. We used both addSnapshotListener and getDocuments. At the moment we get a lot of complaints from our users about the speed, but we have tried everything. So we're just waiting for the Firestore team to come up with a magic solution or an idea for working around this issue.
Thanks @wilhuff ! I’ve been testing this on our staging build and so far it has worked remarkably well!
One issue we've noticed, however, is that for users whose cache was previously large, setting `cacheSizeBytes` to something low does not immediately clear the cache, which can lead to frustratingly long loading times.
Is there any way we can force the cache to clear? Right now the only option we have is to delete and reinstall the app, which is obviously not ideal. Can you also confirm that garbage collection will eventually work, even if it takes a few days? Or could there be situations in which the cache never truly clears?
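Later SDK releases (not the 5.x release in the original report) added `terminate()` and `clearPersistence()`, which together can wipe the local cache without reinstalling. A hedged sketch, assuming such a release: `wipeLocalFirestoreCache` is a hypothetical helper name, and `clearPersistence()` only succeeds on an instance that is not running, hence the `terminate()` first.

```swift
import FirebaseFirestore

// Hypothetical helper: stop the current Firestore instance, then delete its
// on-disk cache. Requires an SDK release that provides terminate/clearPersistence.
func wipeLocalFirestoreCache(completion: @escaping (Error?) -> Void) {
    let db = Firestore.firestore()
    db.terminate { error in
        if let error = error {
            completion(error)
            return
        }
        // clearPersistence fails while the instance is running, so it is
        // called only after terminate has completed.
        db.clearPersistence { error in
            // Afterwards, obtain a fresh instance via Firestore.firestore().
            completion(error)
        }
    }
}
```

Because a terminated instance can no longer run queries, this fits best at a point such as sign-out or a one-time migration for users with an oversized cache.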
Thanks again for your help on this!
@rob-keepsafe If you’re seeing requests fail to ever finish or return incorrect results (which I think is what you are saying when you refer to “querying to break” and “local cache seems easily broken … and never seems to correct itself”), please open new issues with the details on what you’re seeing so that we can investigate. That doesn’t sound expected.
This github issue is specifically tracking the fact that very large single collections in the cache negatively impact the client’s query performance. But the queries should still always complete with correct results.
The good news is we have made some progress on query performance. We have a change that should be released for Android very soon and later for JavaScript and iOS which should make queries that you’ve executed previously return results significantly faster, even if the cached collection is very large. We’ll update this thread once it’s available and would welcome feedback.
@marcglasberg We have thousands of production apps happily running on Firebase and Cloud Firestore. In general, the best thing to do is just to test your app thoroughly before releasing, perhaps being mindful of any operations you can do in the app that might increase the cache size (e.g. you may not want to expose functionality to download a large collection for offline viewing, since this would slow performance for all queries against it).
There are some concerning architectural issues if we can't paginate through 1,000 total records of something (photos, messages, restaurants, <insert some other common use case here>), 50 at a time. You don't really need to be "at scale", because 1,000 isn't that many of something to be querying through, even for small apps 🙁
This seems to go beyond just indexing; creating a "large" (> 1k) collection and then deleting its documents just causes querying to break and never respond without reinstalling.
The local cache seems easily broken by common CRUD actions, regardless of whether the client is online or not, and never seems to correct itself on subsequent syncs with the server. Unfortunately, the local cache is not just an object store but the front end of the querying capability, so when it breaks, the entire app grinds to a halt.
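For reference, the "50 at a time" pattern is usually built with query cursors. A sketch with hypothetical names (`PhotoPager`, a `photos` collection ordered by `createdAt`): `limit(to:)` bounds each page and `start(afterDocument:)` continues from the last document of the previous page. As other comments in this thread point out, with a large cached collection the client may still scan the whole collection locally for each page, so this does not by itself avoid the slowdown.

```swift
import FirebaseFirestore

// Hypothetical pager over a "photos" collection, newest first.
final class PhotoPager {
    private let baseQuery: Query
    private var lastSnapshot: DocumentSnapshot?

    init(db: Firestore = Firestore.firestore(), pageSize: Int = 50) {
        baseQuery = db.collection("photos")
            .order(by: "createdAt", descending: true)
            .limit(to: pageSize)
    }

    // Fetches the next page, starting after the last document already returned.
    func nextPage(completion: @escaping ([QueryDocumentSnapshot]) -> Void) {
        var query = baseQuery
        if let last = lastSnapshot {
            query = query.start(afterDocument: last)
        }
        query.getDocuments { [weak self] snapshot, error in
            guard let docs = snapshot?.documents, error == nil else {
                completion([])
                return
            }
            if let last = docs.last {
                self?.lastSnapshot = last
            }
            completion(docs)
        }
    }
}
```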
I thought I'd give you an update on where things stand in our app. For now, we have started using many more direct lookups by document ID to work around the lack of client-side indexing.
Before, if we had an ID and wanted to look up the matching document, we would perform a query with the `whereField` method. This took very long. Now we store many more relations inside documents so we can fetch the specific document directly. The only downside is that we still have the problem when we need to get multiple documents. But with this change we re-enabled persistence, which is also a plus.
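The change described above looks roughly like this. A sketch with placeholder names (`items` collection, `itemId` field): the query version filters a collection by a field, which against a large local cache may mean evaluating every cached document in that collection, while the direct version addresses one document by its path, which matches the speed-up the commenter reports.

```swift
import FirebaseFirestore

// Before: find a document whose (hypothetical) "itemId" field matches, via a query.
func findByQuery(db: Firestore, id: String,
                 completion: @escaping (DocumentSnapshot?) -> Void) {
    db.collection("items")
        .whereField("itemId", isEqualTo: id)
        .limit(to: 1)
        .getDocuments { snapshot, _ in
            completion(snapshot?.documents.first)
        }
}

// After: store the ID as the document ID and read that single document directly.
func findByDocumentId(db: Firestore, id: String,
                      completion: @escaping (DocumentSnapshot?) -> Void) {
    db.collection("items").document(id)
        .getDocument { snapshot, _ in
            completion(snapshot)
        }
}
```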
On Android, I ended up writing the offline workflow myself because of the poor performance of Firestore's offline workflow. Since I know my schema, I can do a much better job of storing the data in SQLite. It took a couple of weeks, but it was worth it. Now Firestore's offline workflow is disabled, and my queries are much faster when fetching data I never needed offline anyway. I also have total control over which data is always available offline, instead of relying on the SDK, which was caching ALL the data and at times evicting the useful data from the cache. Obviously this approach has cons, since I had to write my own workflow for saving data offline, but it wasn't too bad. I think offline support is a hard problem for the Firebase team to solve in a general way that works for everyone. Maybe in the future they will introduce support for indexed data and more control over which queries should work offline.
I’ve opened an issue to reduce the cache size: https://github.com/flutter/flutter/issues/35648
@mikelehen You say a solution is to keep the cache size low. However, the allowed minimum is 1 MB. Could you please at least fix this by letting us set the cache to be really small, say, 50 KB?
We can’t really turn the cache off, because it then reads all documents in the snapshot and charges for them. Say the cache is off and I have a listener which will get 100 messages per day. I’ll pay 1 read when the first message arrives, 2 reads when the second message arrives (since it will read both messages), 3 reads for the third and so on. After 100 messages I will have paid 1+2+3+4+…+100=5050 reads, instead of just 100 reads if the cache is on.
In this situation I need the cache to hold only 100 messages. But if each message is 500 bytes, about 2,000 messages fit in 1 MB, and queries will already be slow.
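The arithmetic in the two comments above, written out under the commenter's own model (an assumption, not documented billing behaviour): with the cache off, the n-th new message re-reads all n messages; with it on, each new message costs one read. The helper names are hypothetical.

```swift
// Commenter's model: with the cache off, message k triggers k reads,
// so n messages cost 1 + 2 + ... + n = n(n + 1) / 2 reads.
func readsWithoutCache(messages n: Int) -> Int {
    n * (n + 1) / 2
}

// With the cache on, each new message costs a single read.
func readsWithCache(messages n: Int) -> Int {
    n
}

// How many fixed-size messages fit in a cache of the given size
// (defaults: the 1 MB minimum as 1,048,576 bytes, 500-byte messages).
func messagesThatFit(cacheBytes: Int = 1_048_576, bytesPerMessage: Int = 500) -> Int {
    cacheBytes / bytesPerMessage
}

print(readsWithoutCache(messages: 100)) // 5050
print(readsWithCache(messages: 100))    // 100
print(messagesThatFit())                // 2097
```

The "2,000 messages" figure is the same calculation with 1 MB rounded to 1,000,000 bytes.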
Thanks @heumn, we are experiencing this too. We have sent a new build to our users with persistence off and indeed added a cache. The only remaining problem is how to tell our users that the app is not usable offline, even though we sold it to them with an offline feature.
I have also been in contact with the Firebase support team by email, and they suggested checking which region our Firestore database is located in. When we started the project, the only option was the US (we are located in the EU, in Amsterdam). He suggested switching to the Frankfurt region and testing whether that makes a difference.
We will test that and see whether it improves things. Will let you know.
I must say the Google support team is very fast to respond and very understanding, which makes things a little more bearable. 😉
@bgetsug The latter. The number of documents per collection in the offline cache determines the performance.
Sorry, I was out on vacation.
The Firestore client will perform better when fewer documents are in a collection. It is generally not a problem to add more collections, but making collections “bigger” may decrease query performance. We currently only index on the collection path, and as such, keeping the list of entries for each collection path low increases performance.
It does not matter how deeply nested the collections are.
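One way to act on this advice is to split a single ever-growing collection into smaller buckets. A sketch of a hypothetical schema (not something the Firestore team prescribes here): messages go into per-month subcollections under each channel, so no single collection path accumulates every message. `monthlyMessagesPath` is an illustrative helper name.

```swift
import Foundation

// Hypothetical schema: instead of one ever-growing
// dmChannels/{channelId}/messages collection, write each message into a
// per-month subcollection so no single collection path gets very large.
func monthlyMessagesPath(channelId: String, date: Date) -> String {
    var calendar = Calendar(identifier: .gregorian)
    calendar.timeZone = TimeZone(secondsFromGMT: 0)! // bucket by UTC month
    let parts = calendar.dateComponents([.year, .month], from: date)
    let year = parts.year ?? 0
    let month = parts.month ?? 0
    let monthString = month < 10 ? "0\(month)" : "\(month)"
    // Segments alternate collection/document, so this is a valid collection path.
    return "dmChannels/\(channelId)/months/\(year)-\(monthString)/messages"
}

let path = monthlyMessagesPath(channelId: "abc", date: Date(timeIntervalSince1970: 0))
print(path) // dmChannels/abc/months/1970-01/messages
```

The resulting string can be passed to `Firestore.firestore().collection(path)`. The trade-off is that a query such as "the 300 most recent messages" may span several months and need one query per bucket.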
@jamesdixon I’ll let @schmidt-sebastian have the final word but I think the answer is “neither”. We want to make the cache fast even when it is large, not put the burden of fine-grained cache management on the developer.
@marcglasberg I would just recommend creating a minimum viable performance testing app that runs through your typical user stories and tests where the performance degrades. For us, it unfortunately happened very quickly (a user has >1,000 items that can be simply sorted), even with pagination. This is because all querying/sorting/filtering is done in-memory as pointed out above.
If it works for you now and continues to work at the collection size you’re seeing at scale, excellent 👍
@heumn Same for you, glad it's working 👍 During our initial performance tests, it didn't. I'd rather people perform their own tests based on their unique schema and user stories than blindly trust it and build on it for years, only to hit issues after they launch. As you can see from this thread, a number of others are "rant"ing as well, and seeing the same 40 to 50 second delays we saw ¯\_(ツ)_/¯
The collection wasn’t “huge” either; all the sample apps show a restaurants search app, so you can easily imagine trying to query 1,000 restaurants in a city and hit the same performance issues. We were under the impression it would “just work” based off all those examples, but were disappointed when it didn’t.
In real-world scenarios, users don't keep their messaging app open all the time and don't wait for all 500 messages to arrive. They open it, reply, and close the app. Snapshot listeners change, and this second part kicks in very often.
Heck, even within the same session, users often go back and forth between screens.
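A typical screen-scoped listener shows why this happens so often. A sketch with hypothetical names (`ChannelViewController`, `render`): the listener is attached when the screen appears and removed when it disappears, so every visit attaches a new listener, and each new listener runs its query against the local cache again. With a large cached collection, that is the slow step repeating on each navigation.

```swift
import UIKit
import FirebaseFirestore

// Hypothetical screen that listens only while visible.
final class ChannelViewController: UIViewController {
    var channelId = "" // set by the presenting code (hypothetical)
    private var messageListener: ListenerRegistration?

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)
        let query = Firestore.firestore()
            .collection("dmChannels").document(channelId).collection("messages")
            .order(by: "timeStamp", descending: true)
            .limit(to: 300)
        // A new listener on every appearance: the query is evaluated again,
        // including against the locally cached collection.
        messageListener = query.addSnapshotListener { [weak self] snapshot, error in
            guard let self = self, let snapshot = snapshot, error == nil else { return }
            self.render(snapshot.documents)
        }
    }

    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)
        messageListener?.remove() // detach; the next appearance re-runs the query
        messageListener = nil
    }

    private func render(_ documents: [QueryDocumentSnapshot]) {
        // Hypothetical: update the UI with the documents.
    }
}
```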