Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose API to clear offline persistence (in IndexedDB). #449

Closed
merlinnot opened this issue Jan 18, 2018 · 56 comments
Closed

Expose API to clear offline persistence (in IndexedDB). #449

merlinnot opened this issue Jan 18, 2018 · 56 comments

Comments

@merlinnot
Copy link
Contributor

merlinnot commented Jan 18, 2018

Please expose a way to clear offline persistence (in IndexedDB). It would be useful to clean up when the user signs out.

@mikelehen
Copy link
Contributor

There is not currently (other than poking into IndexedDB directly to delete the data).

@merlinnot
Copy link
Contributor Author

Can we make it a feature request then?

@firebase firebase deleted a comment from google-oss-bot Jan 25, 2018
@firebase firebase deleted a comment from google-oss-bot Jan 25, 2018
@mikelehen mikelehen changed the title Firestore data persistence: cleaning up Jan 25, 2018
@mikelehen
Copy link
Contributor

I edited it into a feature request. Feel free to add more detail if you have more specific use cases in mind, etc.

@scottlu
Copy link

scottlu commented Mar 2, 2018

I also need this. When a user logs out we need to clear local caches of private data in our application.

@wilhuff
Copy link
Contributor

wilhuff commented Mar 2, 2018

For any given instance of Firestore we have two kinds of data:

  • pending writes, which are specific to a user
  • cached documents, which are not

We could potentially clear either or both of these.

If user makes a change but signs out before the client has finished sending it currently they'd end up with a persistent pending write, which we'd resume when that specific user signed in again.

Are you intending that this kind of data would be cleared as well?

@scottlu
Copy link

scottlu commented Mar 2, 2018

At logoff time we'd want to know that there are pending writes.

If online we'd wait for them to complete before logging off, then we'd delete cached documents.

If offline we'd warn the user and if the user chooses to continue with logging off, we'd delete the pending writes and the cached documents.

@merlinnot
Copy link
Contributor Author

Same thing in my app, I'd just as a user if he wants to discard changes or not. Probably the best solution would be to expose all of these things separately:

  • check if there are pending writes (+events / promise interface would be nice, like flushWrites: () => Promise<void>)
  • clear persisted docs
  • clear persisted writes

Last two can be of course merged into one method with some options.

@wilhuff
Copy link
Contributor

wilhuff commented Mar 2, 2018

Makes sense. We've discussed adding some way to detect if there are pending writes or wait to flush pending writes in order to make disableNetwork safer for apps that want to explicitly control sync.

Separately we're working on multi-tab support for offline persistence such that active tabs can see each other's pending writes, even while disconnected. We'll need to investigate how this intersects with that work. If signing out in one tab doesn't sign out the others then nuking and paving IndexedDb out from under the other tabs isn't going to end well.

@alejojimenezmp
Copy link

I think that this could also lead to other issues.

Let's say that we have an app with offline persistence. User A logs in, and fetches her/his profile from the database. We have security rules: match /profiles/{uid} { allow read: if request.auth.uid == uid }.
The profile will be fetched and stored locally for offline access.

The problem is that if User A logs out, User B will have access to all local data of that user.
For example, if User B logs in and tries to get all profiles, she/he will get a success response with the local data, including the profile of User A, even if she/he doesn't have permissions to do so.

Whenever there's local data from another user, security rules wont work because the first response always comes from all cached data. This could lead to data breaches or wrong results while being offline.

@wilhuff
Copy link
Contributor

wilhuff commented Mar 13, 2018

I repeat this advice in every discussion related to this but I want to be clear: if you’re at all sensitive to the disclosure of cached information within the same operating system user account: do not enable persistence.

The Firestore client that exists today assumes that operating system user == actual user. Protecting within that boundary is a hard problem that we have not attempted to solve.

We don’t believe clearing IndexedDB on sign-out is sufficient to implement this: it’s trivially defeated by a user forgetting to do it, the browser crashing, or an attacker installing a browser extension to read the contents of the database while another user is signed in.

Clearing IndexedDB is still useful, especially for testing, and we are hoping to add this feature (PRs welcome, of course).

However for sensitive applications like you describe we’re implementing an LRU aspect to our in-memory cache. This will improve perceived performance to be on par with having persistence enabled without requiring the user to remember to sign out or any of the other risks of disclosure.

@alejojimenezmp
Copy link

@wilhuff I agree with you in terms of sensitive data. However, I see this issue has other implications.
For example, security rules are not being taken into account when Firestore returns the first callback from cached data, so if two users use the same device with different accounts, the cached response will return information from the other user, so you could end up showing wrong information in offline mode.

In my opinion, it's a good idea to make cache data specific to each user. Clearing IndexedDB seems to be a simple solution to fix all the issues mentioned here as well.

@wilhuff
Copy link
Contributor

wilhuff commented Mar 13, 2018

If two users use the same device with different operating system accounts, everything works as intended: separate browser caches and operating system permissions on them isolate the users. In this scenario clearing LevelDB doesn't help (or hurt).

If two users use the same device using the same operating system account/browser then separating the cache data within LevelDB doesn't actually protect you: it's trivial to get at the other user's data within the same IndexedDB. If you're willing to work around this by clearing the IndexedDB data in between sessions, just don't write it: disable persistence.

@mikelehen
Copy link
Contributor

Just FYI-

so if two users use the same device with different accounts, the cached response will return information from the other user, so you could end up showing wrong information in offline mode.

This won't actually be the case for most apps. Since security rules do not act as filters (they either allow or deny a query in its entirety), as long as your app doesn't perform broken queries (that will be rejected by the backend due to security rules), users won't see any locally-cached data that they don't have access to... I repeat Gil's warning that you can't rely on this to protect sensitive data (you should disable persistence instead), but you can expect your app to behave offline just fine even if multiple users have used the app, resulting in various locally-cached data... in fact, sharing the cache improves the offline experience and general performance of the app since there's a greater chance that data requested by the app will already be available locally.

@alejojimenezmp
Copy link

@mikelehen @wilhuff Thanks for your feedback.

@mohshraim
Copy link

+1
we need this feature to clear cached data when user reach the maximum storage data quota
thanks

@Endran
Copy link

Endran commented Jun 23, 2018

We need this feature since we need to be able to explicitly remove any data stored on the device by our PWA. Our data is allowed to survive several logins by several users, but at some point we need to clear everything (for all users) if any user explicitly requests this.

Our devices (usually laptops) are used at trusted locations by trusted users, but day by day the trusted locations and trusted users might change.

@nuvoPoint
Copy link

We have been running on Firestore for about one year now and are seeing massive performance issues related to the IndexedDB storing large cached data sets and therefor urgently need this feature.

@mohshraim
Copy link

@wilhuff

Just keep reading this many times..

Garbage collection of older documents, automatically limiting the number of documents in the cache

This look dangerous to us!
Does that mean you are implementing a GC will automatically clear old documents in cache??
If your answer is YES, what is the conditions to clear that docs?

According to our usage for SDK, we heavily use cached data and most of time load only updated docs according to FirestoreServerStampTime.. so clearing old docs without notification will make problem to us...

thanks

@larssn
Copy link

larssn commented Sep 5, 2018

I'm 99.9% sure they wouldn't clear documents where hasPendingWrites === true.

@mikelehen
Copy link
Contributor

Correct. We wouldn't clear documents with unsynced changes. The idea is that we'd clear documents that you previously listened to but which you haven't listened to recently. And we wouldn't clear documents unless your total cache size had exceeded some semi-large threshold (e.g. 10MB of document data, though this is TBD).

@mohshraim
Copy link

@mikelehen

(((Please))) take in your consideration to make this (auto clear method) optional; when we define the firestore instance.

Our case;
-we load around 1K docs of Items from cache
-then get the maxStampTime of cached docs
-then create a listener on stampTime and combine it with local cached data..
-> our method work perfects and we have solve all related problem of this method...
-> our clients using it on -full offline- environment, and heave a clear interface show them un-synced changes using hasPendingWrites === true

-making this (auto clear method) without option will threat us to drop our project on Firestore.

@mikelehen
(((Please))) confirm that (auto clear method) option will be optional

What really we need is to have a new feature to clear a specific collection on cache..
to enable developer clear big data on specific collection like log collection.
something like following;

  • firebase.firestore().collection('collectionName').clearCache(); //to clear collection db

THANKS

@mikelehen
Copy link
Contributor

@mohshraim When you say "get the maxStampTime of cached docs" and "combine it with local cached data", how are you getting the local cached data? If you're doing a get() or onSnapshot() or whatever, then that will bump the "last used" time of the data, and so the SDK will try not to clear it.

How big is your cache? 1k items is probably plenty small so you wouldn't be affected by garbage collection at all.

As for whether it's optional, we're still looking into what controls to expose. cc/ @gsoltis

@mohshraim
Copy link

@mikelehen

Sorry for long reply!

I use get({source:'cache'})
And i have many other collections with much more documents like patientVisit and stockHistory that could have much more documents that i can't expect.
According to what you said

then that will bump the "last used" time of the data, and so the SDK will try not to clear it.

Thanks for clearing this. This will be more safe on main definitions like items and patients, but not on the other collections like (patientVisit) that we call it when user want to view patient history.

We have developed huge library Above the Amazing Firestore SDK. we test it with full offline environment for several hours with heavy operations and its working smoothly...
And according to this we Assure to our clients a Full Offline support environment.
We know its a impossible case to be offline for several days but we support it now.
We didn't reach 10MB of data, until now! but will test that on performance, its 4MB now and working in a good performance. And i am sure that will reach 20-50MB on real environment

Firestore is the best Cloud Service with FullOffline 🥇 🥇 🥇 (Currently); so adding this feature without optional control will remove Firestore from what many developers looking for, like us. 😢

Note we was working on PouchDB before decide moving to Firestore. and we reach 40MB of cached data with good performance also, but not that good features as Firebase.

Please make it optional! 🙏

and maybe export the clearCache function on client SDK so we call when we need to.

Thanks

@larssn
Copy link

larssn commented Sep 7, 2018

So we have clients who are heavily affected by this issue. We cleared indexeddb yesterday at a client, where the size was 25MB. The app was starting to be extremely sluggish (we're talking 10-15 seconds between clicks), but became responsive again after.

We need information on the structure inside indexeddb, because we HAVE to start clearing out data manually. We basically only need to clear one collection (order collection) where pending writes is false.

@larssn
Copy link

larssn commented Sep 7, 2018

Here's an idea in the mean time: How about supporting a persistence mode where the entity being saved is only persisted while hasPendingWrites === true. Once the entity has been synchronized, it would be removed from IndexedDB. This would be all we need, because we're caching all needed entities in memory already because IndexedDB performance is unreliable, and we don't want to wait for the server (even though it can be much faster than indexeddb).

@larssn
Copy link

larssn commented Sep 7, 2018

This will likely cause size problems:

/**
 * Represents the known absence of a document at a particular version.
 * Stored in IndexedDb as part of a DbRemoteDocument object.
 */
export class DbNoDocument {
  constructor(public path: string[], public readTime: DbTimestamp) {}
}

It seems that documents that have been deleted to stay in indexeddb, but flagged as noDocument. This takes up space, and will probably help in slowing everything down.

Can anyone shed light on why this logic is required? Why reference documents that doesn't exist anymore?

@larssn
Copy link

larssn commented Sep 7, 2018

Alright, so I ended up with this.

clearCache(subcollection: string) {
    console.log(`Clearing ${subcollection} and deleted entities`);
    const o = indexedDB.open('firestore/[DEFAULT]/tillty-1/main');
    o.onsuccess = () => {
      const readTrx: IDBTransaction = o.result.transaction('targets', 'readonly');
      const targets = readTrx.objectStore('targets');
      const targetIds: number[] = [];

      targets.openCursor().onsuccess = (e1: any) => {
        const cursor1: IDBCursorWithValue = e1.target.result;
        // Find all target IDs of the entity name we wish to purge
        if (cursor1) {
          if (cursor1.value.canonicalId.indexOf(`/${subcollection}`) !== -1) {
            targetIds.push(cursor1.value.targetId);
          }

          cursor1.continue();
        } else { // No more rows in the 'targets' datastore
          const writeTrx: IDBTransaction = o.result.transaction(['remoteDocuments', 'targetDocuments'], 'readwrite');
          const targetDocuments = writeTrx.objectStore('targetDocuments');
          const remoteDocuments = writeTrx.objectStore('remoteDocuments');

          remoteDocuments.openCursor().onsuccess = (e2: any) => {
            const cursor: IDBCursorWithValue = e2.target.result;
            if (cursor) {
              if ((Array.isArray(cursor.key) && cursor.key.length >= 4 && cursor.key[2] === subcollection) || cursor.value.noDocument) {
                cursor.delete();
              }
              cursor.continue();
            }
          };

          targetDocuments.openCursor().onsuccess = (e2: any) => {
            const cursor2: IDBCursorWithValue = e2.target.result;
            if (cursor2) {
              if (Array.isArray(cursor2.key) && targetIds.find(id => id === <number>cursor2.key[0])) {
                cursor2.delete();
              }
              cursor2.continue();
            }
          };
        }
      };
    };
  }

It should be said that this is built to clear subcollections by name, and not root collections.
If this is not desired change if ((Array.isArray(cursor.key) && cursor.key.length >= 4 && cursor.key[2] === subcollection) to the desired level.

Example:
clearCache('orders'); would clear:

/businesses/bus1/orders
/businesses/bus2/orders
...
/businesses/busN/orders

Note that the function does NOT clear queued writes (they are stored in a different data-store called 'mutations'). So even if the above cleared all orders locally, any changes made to orders would not get lost.

It's been tested in various offline/online scenarios, but I welcome more tests, and general feedback in case I messed something up.

WARNING
@mikelehen has warned against using this method of deleting from indexedDB below (#449 (comment)).

@mohshraim
Copy link

Hey @larssn
Thanks for sharing your clearCache method; test it twice and seems working.. - first test-

I hope that @mikelehen have time to audit your method code to feed us if it will make any hidden bug.

And if firestore team can enhance it and adopt it in SDK -that's will be great-

Thanks

@mikelehen
Copy link
Contributor

I'm afraid I really can't endorse this code or code like it.

The original version that cleared entries from targets was pretty close to correct I think. But the updated version that doesn't clear the targets entries is going to cause trouble. In particular, the targets store contains a resumeToken for each target which the SDK uses to tell the backend that its cache is up-to-date as of that resumeToken. If you've deleted the actual remoteDocuments and targetDocuments entries though, then the SDK is going to incorrectly tell the backend its cache is up-to-date, but it actually will not be, so you will get incorrect events and a further corrupted cache.

Additionally, making changes to the persisted cache while the SDK is running is dangerous and may break assumptions in the SDK, leading to undefined behavior.

Finally, it's really not safe to write code that makes any assumptions about the schema used by the SDK. The schema is an internal implementation detail and subject to change from one release to the next, breaking any code you've written. So you would need to re-audit your code every time you upgrade. The chance for bugs leading to cache corruption is high.

The closest thing to an approach I can endorse would be to delete the entire cache, before initializing the SDK. This is guaranteed to be safe, since it's equivalent to just running your app on a fresh device.

I'm sorry we don't have a better supported option at this time, but we're actively working on garbage collection for the cache which will prevent it from growing beyond a specified threshold. And further cache controls are on the roadmap. For now, any direct modification to the SDK's persisted data is done at your own risk and may break at any time. Sorry!

@larssn
Copy link

larssn commented Sep 11, 2018

@mikelehen Thanks for taking the time to write all that. What you're writing makes sense, and you're right: it's a high risk solution; but seeing the performance of IndexedDB, we couldn't sit on our hands any longer.

Due to the fact that we're writing to a live cache (albeit in a transaction), we had to take out the deletion of target elements, as it caused the crash I mentioned.

We actually haven't observed any cache corruption yet. After a cache cleaning, and then reloading the app, your SDK seems to correctly get the missing entities; but thanks for your feedback, I'll keep an eye on it.

Is there any milestone for the garbage collection? It's been talked about for a long time now.

@larssn
Copy link

larssn commented Sep 12, 2018

Here's the version that deletes from targets, that mike talked about:

  clearCache(subcollection: string) {
    console.log(`Clearing ${subcollection} and deleted entities`);
    const o = indexedDB.open('firestore/[DEFAULT]/tillty-1/main');
    o.onsuccess = () => {
      const writeTrx: IDBTransaction = o.result.transaction(['remoteDocuments', 'targetDocuments', 'targets'], 'readwrite');
      const targetIds: number[] = [];
      const targets = writeTrx.objectStore('targets');
      const targetDocuments = writeTrx.objectStore('targetDocuments');
      const remoteDocuments = writeTrx.objectStore('remoteDocuments');

      targets.openCursor().onsuccess = (e1: any) => {
        const cursor1: IDBCursorWithValue = e1.target.result;
        // Find all target IDs of the entity name we wish to purge
        if (cursor1) {
          if (cursor1.value.canonicalId.indexOf(`/${subcollection}`) !== -1) {
            targetIds.push(cursor1.value.targetId);
            cursor1.delete();
          }

          cursor1.continue();
        } else { // No more rows in the 'targets' datastore
          remoteDocuments.openCursor().onsuccess = (e2: any) => {
            const cursor: IDBCursorWithValue = e2.target.result;
            if (cursor) {
              if ((Array.isArray(cursor.key) && cursor.key.length >= 4 && cursor.key[2] === subcollection) || cursor.value.noDocument) {
                cursor.delete();
              }
              cursor.continue();
            }
          };

          targetDocuments.openCursor().onsuccess = (e2: any) => {
            const cursor2: IDBCursorWithValue = e2.target.result;
            if (cursor2) {
              if (Array.isArray(cursor2.key) && targetIds.find(id => id === <number>cursor2.key[0])) {
                cursor2.delete();
              }
              cursor2.continue();
            }
          };
        }
      };
    };
  }

This function should not be run while firestore is initialized since it can cause internal assertion errors that breaks the DB layer completely. So you should either:
Run clearCache('something') and THEN initialize firestore.
OR run clearCache('something') and then do a location.reload() to make sure firestore is reinitialized.

Again, use at own risk.

@mikelehen
Copy link
Contributor

Thanks @larssn!

As for the timeline for garbage collection, I can't communicate any specific timeline, but it's making good progress. The bulk of it has been implemented on iOS (firebase/firebase-ios-sdk#1600) and is currently being ported to web. That said, we still need to define and expose the API that actually enables it which will take some additional time (all new APIs have to go through some extra process before being exposed). So no concrete timeline, but I promise it's coming. :-)

@larssn
Copy link

larssn commented Sep 12, 2018

@mikelehen Thank you. We'll hold our breath for now :)

@asifbd
Copy link

asifbd commented Oct 22, 2018

Just FYI-

so if two users use the same device with different accounts, the cached response will return information from the other user, so you could end up showing wrong information in offline mode.

This won't actually be the case for most apps. Since security rules do not act as filters (they either allow or deny a query in its entirety), as long as your app doesn't perform broken queries (that will be rejected by the backend due to security rules), users won't see any locally-cached data that they don't have access to... I repeat Gil's warning that you can't rely on this to protect sensitive data (you should disable persistence instead), but you can expect your app to behave offline just fine even if multiple users have used the app, resulting in various locally-cached data... in fact, sharing the cache improves the offline experience and general performance of the app since there's a greater chance that data requested by the app will already be available locally.

@mikelehen apologies for my confusion as a noob, but by saying users won't see any locally-cached data that they don't have access to- do you mean access restricted by security rules? Because I am seeing cached data returned despite data being restricted by security rule (and error callback called as well). I tried two scenarios-

Scenario 1:

  • User A logs in
  • listener Set and initial data fetched/cached
  • User A's role changed (by modifying data in a firestore path from backend/separate client) and consequently data is now restricted by security rule
  • Page refresh, causing listeners to be set again, and this time querysnapshot is still returned from cache with restricted data

Scenario 2:

  • User A logs in
  • listener Set and initial data fetched/cached
  • User A logs out, and listeners unsubscribed
  • User B logs in, causing new listeners to be set, and this time querysnapshot is still returned from cache with restricted data

So I am only confused with the quoted part.

    var unsubscribe = firebase.firestore()
        .collection("schools")
        .doc(this.props.userContext.state.schoolID)
        .collection("chatroom/common/chats").orderBy("timestamp")
            .onSnapshot(
                (querySnapshot)=> {
                    console.log("QuerySnapshot", querySnapshot);

                    var newChats = {};                
                    querySnapshot.forEach((doc)=>{
                        newChats[doc.id] = doc.data()
                    });

                    this.setState({chats: newChats});
                },
                error=>{
                    console.log("Failed to set listener on Chats Node. Possible reason: ", error.message);
                }
            );
@mikelehen
Copy link
Contributor

@asifbd Can you share your security rules that should be restricting access? I'm confused why your listener error handler isn't getting called if the user doesn't have access to all of the documents under chatroom/common/chats.

@asifbd
Copy link

asifbd commented Oct 22, 2018

@mikelehen listener error handlers do get called, in both scenarios. But at the same time, cached data is also returned in snapshot.

... I am seeing cached data returned despite data being restricted by security rule (and error callback called as well)....

My security rules involve several custom functions, just pasting the relevant portion-

function allowAnyUser()
  {
  	return request.auth.uid != null;
  }

function adminAndAbove(forID)
  {
  	return getUserAccess(forID, ['admin'])
  }

function memberAndAbove(forID)
  {
  	return getUserAccess(forID, ['member','admin'])
  }
  
  
function getUserAccess(forID, allowedRoles)
{
    return validateAccessWithUserData(get(/databases/$(database)/documents/test_auth_users/$(request.auth.uid)).data, forID, allowedRoles);
}
  
  
function validateAccessWithUserData(userData, forID, allowedRoles)
{
    return userData != null && userData['schoolId'] != null && userData['schoolId'] == forID && userData['user_role'] != null && userData['user_role'] in allowedRoles;
}

// --------------------------- end of section: custom functions ----------------------------------

match /schools/{schoolId} {
       
    match /chatroom/common/{anything_under_it=**} {
        allow read: if memberAndAbove(schoolId);
        allow write: if memberAndAbove(schoolId);
    }
}
@mikelehen
Copy link
Contributor

@asifbd Ahh, okay. Then that is behaving as expected. Security rules don't act as filters, so in order to prevent your listener from being canceled (and the error callback called), you'll need to adjust the query you're listening to so that it passes your security rules. Once you do that, the results of the query (from cache) will presumably no longer include data that the user doesn't have access to.

Note that I'm not saying that this means your app is more secure. A malicious user could drop into the JS console and manually do queries to access whatever data is in the cache. But in terms of app correctness (showing the user only their own documents), the queries you issue have to be allowed by your security rules anyway (i.e. the user must have access to perform the query), and so your app should behave correctly and only show users data that they have access to.

@sambegin
Copy link

@mikelehen Any news on this ? We're also starting to see our app slowing down gradually as time goes by... Just deleting the indexedDB shows real improvement on queries performance but this is not a viable solution.

Thanks

@mikelehen
Copy link
Contributor

@sambegin No news on an API to explicitly clear persistence, but we are working on garbage collection, to delete data from IndexedDb that hasn't been used recently. @gsoltis has implemented a lot of the underlying infrastructure for that but there's still a bit more work to do.

@sambegin
Copy link

Thanks @mikelehen . I'm specifically not looking for an API to clear persistence... Anything that can fix the performance problem should be good ! I know you can't say anything about timeline but... Are we days, months, years from getting this garbage collection feature ?

@gsoltis
Copy link

gsoltis commented Nov 20, 2018

@sambegin LRU Garbage collection is coming very soon. Pretty much all of the groundwork is in place, it just needs to be wired together. I would expect maybe one or two more PRs then it will be ready!

@mohshraim
Copy link

GC is now available
https://firebase.google.com/support/release-notes/js#5.7.0
with control to data cached size (the best feature)
start test it

@thebrianchen
Copy link

clearPersistence() was launched in v6.2.0.

@firebase firebase locked and limited conversation to collaborators Oct 23, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.