Firestore: Optimize local cache sync when resuming a query that had docs deleted #4982
Conversation
…e bloom filter support has now been deployed to production. (#4871)
📝 PRs merging into main branch
Our main branch should always be in a releasable state. If you are working on a larger change, or if you don't want this change to see the light of day just yet, consider using a feature branch first, and only merge into the main branch when the code is complete and ready to be released.
Unit Test Results: 162 files (+88), 162 suites (+88), 1m 57s ⏱️ (−3m 10s). Results for commit d9a59e2; comparison against base commit 41890a0. This pull request removes 240 tests and adds 1158 tests. Note that renamed tests count towards both.
firebase-firestore/src/main/java/com/google/firebase/firestore/remote/BloomFilter.java
…feature has been merged into the android sdk in firebase/firebase-android-sdk#4982
…feature has been merged into the android sdk in firebase/firebase-android-sdk#4982 (#7285)
For a discussion about the implementation details of this PR, see firebase/firebase-ios-sdk#12270.
Ported from firebase/firebase-js-sdk#7229
Implement an optimization in Firestore when resuming a query where documents have either been deleted or no longer match the query on the server (a.k.a. "removed"). In most cases, the optimization avoids re-running the entire query just to figure out which documents were deleted or removed.
Background Information
When a Firestore query is sent to the server, the server replies with the documents in the result set and a "resume token". The result set and the resume token are stored locally on the client. If the same query is resumed at a later time, such as by a later call to `Query.get()` or when a listener registered via `Query.addSnapshotListener()` reconnects, then the client sends the same query to the server, but this time includes the resume token. To save on network bandwidth, the server only replies with the documents that have changed since the timestamp encoded in the resume token. Additionally, if the query is resumed within 30 minutes and persistence is enabled, then the customer is only billed for the delta, and not the entire result set (see https://firebase.google.com/docs/firestore/pricing#listens for the official and most up-to-date details on pricing).

The problem is that if some documents in the result set were deleted or removed (i.e. changed to no longer match the query), then the server simply does not observe their presence in the result set and does not send updates for them. This leaves the client's cache in an inconsistent state because it still contains the deleted/removed documents. To work around this cache inconsistency, the server also replies with an "existence filter", a count of the documents that matched the query on the server. The client then compares this count with the number of documents that match the query in its local cache. If the counts are the same, then all is well and the result set is raised via a snapshot. If the counts do not match, however, this is called an "existence filter mismatch", and the client re-runs the entire query from scratch, without a resume token, to figure out which documents in its local cache were deleted or removed. The deleted or removed documents then go into "limbo", and individual document reads are issued for each limbo document to bring it into sync with the server.
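The count comparison at the heart of the existence filter check can be sketched as follows. This is an illustrative sketch only, with hypothetical class and method names; the real client logic in the SDK is considerably more involved:

```java
import java.util.Set;

// Hypothetical sketch of the existence filter check described above.
// These names are illustrative and are not part of the Firestore SDK's API.
public final class ExistenceFilterSketch {

    /**
     * Returns true when the server's reported document count matches the
     * number of documents the local cache believes are in the result set.
     * A mismatch means some cached documents were deleted or removed on the
     * server, and the client must resolve the discrepancy.
     */
    public static boolean countsMatch(int serverCount, Set<String> cachedDocNames) {
        return serverCount == cachedDocNames.size();
    }

    public static void main(String[] args) {
        Set<String> cached = Set.of("users/alice", "users/bob", "users/carol");
        // The server reports only 2 matching documents: one cached document
        // was deleted or removed, so this is an existence filter mismatch.
        System.out.println(countsMatch(2, cached)); // false -> mismatch
        System.out.println(countsMatch(3, cached)); // true  -> raise snapshot
    }
}
```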
The inefficiency is realized when the client "re-runs the entire query from scratch". This is inefficient for two reasons: (1) it re-transmits documents that were just sent when the query was resumed, wasting network bandwidth, and (2) it incurs billing for document reads of the entire result set.
The Optimization
To avoid this expensive re-running of the query from scratch the server has been modified to also reply with the names of the documents that had not changed since the timestamp encoded in the resume token. With this additional information, the client can determine which documents in its local cache were deleted or removed, and directly put them into "limbo" without having to re-run the entire query from scratch.
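The client-side use of that extra information can be sketched like this. The names here are hypothetical (not the SDK's real API), and the bloom filter membership test is abstracted behind a predicate:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Illustrative sketch, not the Firestore SDK's actual API: given the names of
// all documents in the local cache for a query, and a membership test backed
// by the server's bloom filter of unchanged documents, keep only the names
// the filter does NOT contain. Those documents were deleted or removed on the
// server and should transition into "limbo".
public final class LimboResolver {

    public static Set<String> documentsToPutInLimbo(
            Set<String> cachedDocNames, Predicate<String> bloomFilterContains) {
        Set<String> limbo = new HashSet<>();
        for (String name : cachedDocNames) {
            // A bloom filter never yields false negatives, so a document the
            // filter does not contain is definitely gone from the result set.
            // (A false positive leaves a stale document behind, which is why
            // the client falls back to a full requery in that case.)
            if (!bloomFilterContains.test(name)) {
                limbo.add(name);
            }
        }
        return limbo;
    }

    public static void main(String[] args) {
        Set<String> cached = Set.of("users/alice", "users/bob", "users/carol");
        // Pretend the server's filter contains every name except users/bob,
        // i.e. users/bob was deleted on the server.
        Set<String> limbo =
                documentsToPutInLimbo(cached, n -> !n.equals("users/bob"));
        System.out.println(limbo); // [users/bob]
    }
}
```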
The document names sent from the server are encoded in a data structure called a "bloom filter". A bloom filter is a size-efficient way to encode a set of strings. The size efficiency comes at the cost of correctness: when testing for membership, a bloom filter may incorrectly report that a value is contained in it when in fact it is not (a "false positive"). The probability of this happening is made exceptionally low by tuning the parameters of the bloom filter. However, when a false positive does happen, the client is forced to fall back to a full requery. Eliminating the vast majority of full requeries is still an overall win.
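To make the mechanics concrete, here is a minimal bloom filter sketch. It is not Firestore's wire format or hashing scheme (the real protocol derives its bit positions from a cryptographic digest of the full document name); this toy version only shows the core idea of k hash positions per element, set on insert and all-checked on query:

```java
import java.util.BitSet;

// Minimal illustrative bloom filter. NOT Firestore's actual implementation;
// the hash mixing below is a simplistic stand-in chosen for brevity.
public final class SimpleBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    public SimpleBloomFilter(int numBits, int numHashes) {
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.bits = new BitSet(numBits);
    }

    // Double hashing: derive k positions from two base hashes of the value.
    private int position(String value, int i) {
        int h1 = value.hashCode();
        int h2 = Integer.rotateLeft(h1, 16) ^ 0x9E3779B9;
        return Math.floorMod(h1 + i * h2, numBits);
    }

    public void insert(String value) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(position(value, i));
        }
    }

    /** May return a false positive, but never a false negative. */
    public boolean mightContain(String value) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(value, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1024, 7);
        filter.insert("users/alice");
        // An inserted name is always reported as present (no false negatives).
        System.out.println(filter.mightContain("users/alice")); // true
        // A name that was never inserted is almost always reported as absent,
        // but could be a false positive with low probability.
        System.out.println(filter.mightContain("users/zeb"));
    }
}
```

The server tunes the number of bits and the number of hash functions against the result set size to hit a target false positive rate, which is how the probability of an unnecessary full requery is kept exceptionally low.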
Googlers see go/firestore-ttl-deletion-protocol-changes for full details.