Firestore crash / assertion failed / failed to commit transaction / collect garbage #2846
I couldn't figure out how to label this issue, so I've labeled it for a human to triage. Hang tight. |
Possibly worth noting in case it is relevant -
|
Some background: The Firestore client uses LevelDB under the covers to implement local storage. The Firestore client implements a least-recently-used garbage collection scheme for documents in local storage to bound the amount of data kept locally. The default threshold for collection is 100 MB of data on disk.

The stack indicates that Firestore was trying to commit to LevelDB after completing a GC pass. It's very likely that the transaction had no changes though, because one of the first things the collector does is determine whether or not there's enough data to warrant collection. Most apps will never generate 100 MB of data, so these commits will never be anything but empty.

My initial instinct was that LevelDB might somehow be rejecting an empty commit, but of course this can't be true or we'd have 100k me-toos on this bug and it would have been filed immediately after we added GC in Firebase 5.14.0 back in December. A quick test verified that empty commits succeed, so that's definitely not it.

The error message itself indicates that LevelDB encountered an I/O error attempting to write one of its files. The specific error is Operation not permitted ( I'm not entirely sure about LevelDB's log numbering scheme, but I believe it's sequential. LevelDB failed to write to

As far as reproduction goes, this scenario of writing an empty commit should be happening ~all the time already. GC is scheduled initially 1 minute after startup and then every 5 minutes thereafter. You could reduce the intervals by modifying their definition, but at this point I'd suspect that the empty commit is more of a bystander than the cause of the issue. We could change the code to avoid actually committing when there are no changes, though this would likely just move the issue elsewhere. Some questions for you:
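(Editorial note: the "avoid committing when there are no changes" guard described above might look roughly like the following. A minimal Swift sketch for illustration only; the real collector lives in Firestore's C++ internals and every name here is hypothetical.)

// Hypothetical sketch: skip the LevelDB commit entirely when the cache is
// under the collection threshold, so a GC pass that collects nothing never
// issues an empty transaction.
struct LruGarbageCollector {
    let thresholdBytes: Int64 = 100 * 1024 * 1024  // default threshold: 100 MB

    func collectIfNeeded(cacheSizeBytes: Int64, commit: () throws -> Void) rethrows {
        // Under threshold: nothing to collect, so don't touch LevelDB at all.
        guard cacheSizeBytes >= thresholdBytes else { return }
        // ... evict least-recently-used documents, then persist the changes ...
        try commit()
    }
}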
Unfortunately, I don't have much more to go on. AFAICT this shouldn't be happening :-(. Regarding the points you raised:
This shouldn't matter. GC is definitely unrelated. There aren't any customization mechanisms that could implicate the other library.
That the profile of your interaction is write-mostly is interesting, but doesn't seem like it would matter for the purpose of this issue. Here GC failed even though it didn't actually do anything.
Size limits are enforced by the server at the point where we try to commit writes there. GC does not enforce or otherwise examine size limits. If LevelDB failed to accept a write for size reasons, the stack would be much different. |
Well, glad I was able to provide something intellectually challenging. Always the goal ;-) You asked for issue prevalence and patterns. The quick take: this is very prevalent for us, not some idle corner case I'd like to fix. I'm unable to determine a pattern; we have a little skew on device and iOS version, but while the percentage incidence is high, the total count is low. The extended take: we have an MVP that we just launched Saturday. According to analytics we had
It is puzzling. In the analytics settings for the project I have enabled data sharing for support, so I believe you may look for patterns yourself in case it is useful. I don't believe putting this id here compromises anything, so: the iOS app id is 1:629722894643:ios:5fd8ca310ade9035. Last thought: this is the first time I have attempted to use Firestore for anything, and the first time users have used it. So this is very fresh, possibly naive API usage, and I may simply be doing something so bizarre no other developer would do it this way; I offer that as a possible explanation in case it's a fundamental problem that nevertheless didn't get 500k me-toos. If there is no other path to move forward, I could likely extract my API usage pattern and try to produce a tiny app that does a similar thing, but I would naturally like to avoid the effort if possible. Thanks for looking into this! |
Oh - one more thing I can offer: the MVP launch is with a very controlled, motivated user population, aware they are working with a "pilot"-quality app. They would be okay with installing test versions, etc. We could add substantially more logging and attempt to attach it in some way to the Crashlytics reports, or any other form of instrumentation, if it were useful. |
I don't actually have permission to view the analytics data directly. If you were to file a support case they might be able to, though I don't see any utility going that path at this point.

However, I think errors in the background are a big clue: it appears that when a device is encrypted (the default since iOS 8) and the device is locked, the file system becomes inaccessible and any attempts to write will return ... If so, the way to reproduce this will be to start your app and then ~45-50 seconds later lock your screen. For the next 20 seconds or so your app will continue to run in the background, the 1 minute initial GC pass will start, and you'll crash. It's likely you'll need to do this on device.

A workaround, if your data is not sensitive, would be to mark the Firestore data directory non-protected, with code like this (in Swift):

func unlockFirestoreData() throws {
let attributes = [FileAttributeKey.protectionKey: FileProtectionType.none]
let dirs = NSSearchPathForDirectoriesInDomains(.applicationSupportDirectory, .userDomainMask, true)
let firestoreDir = (dirs[0] as NSString).appendingPathComponent("firestore")
// If Firestore hasn't run yet on this client, create the directory for Firestore
// data and mark it unprotected.
if !FileManager.default.fileExists(atPath: firestoreDir, isDirectory: nil) {
try FileManager.default.createDirectory(atPath: firestoreDir, withIntermediateDirectories: true, attributes: attributes)
return
}
// If the directory does exist, then Firestore has run on this device previously
// so fix up any existing filesystem entries.
let files = FileManager.default.enumerator(atPath: firestoreDir)
while let file = files?.nextObject() as? String {
let fullPath = (firestoreDir as NSString).appendingPathComponent(file)
try FileManager.default.setAttributes(attributes, ofItemAtPath: fullPath)
}
}

This will fix an existing instance or prepare a new instance. If your data is sensitive, then the alternative would be to shut Firestore down as you're going into the background, though this currently isn't well supported. Actually fixing this to tolerate background mode will require that Firestore listens for the

For completeness, you can enable Firestore logging with one of these methods. A log of activity prior to the crash could be helpful, though it's likely voluminous. We don't integrate with Crashlytics out of the box for a variety of uninteresting reasons, but it should be very straightforward to plug us in. https://docs.fabric.io/apple/crashlytics/enhanced-reports.html has details on what to do from that side. You'd change Firestore here: https://github.com/firebase/firebase-ios-sdk/blob/master/Firestore/core/src/firebase/firestore/util/log_apple.mm#L75, but substituting |
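(Editorial note: a minimal sketch of turning on Firestore client logging, assuming the Firestore.enableLogging API available in the SDK of this era; the elided link in the comment above may have pointed at this or at an equivalent mechanism.)

import FirebaseFirestore

// Enable verbose Firestore logging so the activity leading up to a crash
// is captured. Call this early, e.g. right after FirebaseApp.configure().
Firestore.enableLogging(true)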
Interesting - I suppose I would expect more than 30% of the crashes to be in background if that were it, but I like the hunch and can certainly do some testing there. A quirk worth mentioning now (though maybe just confirmation bias): we did not get any actual user reports of this, which does smell like a quiet death in the background. App "Capabilities" (from Xcode) are Location Updates (which we use), Background Fetch, and Remote Notifications, so the app should be alive periodically in the background. We're not using Firebase APIs at the time, so I hadn't thought background would matter, but of course it could if GC just runs on a timer. Our information is sensitive, so turning off encryption is a dead end. I will see what I can do about instrumentation/logging as well as specific tests to trigger it, so we can have some certainty we've identified the thing. It seems like I shouldn't be the only one triggering this if it's a filesystem lockout while in background. Does no other person handle silent push notifications from Firebase and then perform Firestore work that could trigger this? Speculatively thinking through available options assuming this is it -
I'll let you know the results of the targeted test you propose, and/or any logging as I get it. Might be a day or three of delay; I'm in the middle of integrating fastlane to split the firebase environments used by each build stream, so build/packaging is heavily impaired at the moment... |
This is pretty surprising to me as well. We have had other users try to use Firestore in the background, but iOS kills sockets pretty early, so we advise disabling Firestore's network there. This is the first report of an I/O error due to filesystem protection (if this is the issue).
The supported way to shut Firestore down is to delete the Firebase App instance (js, swift). Firestore responds to the internal event by doing nothing. We do have an unpublished API to do what's actually required. If you forward declare our shutdown method you'll be able to invoke it. Note that both are required. So yeah, not well supported.
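(Editorial note: a minimal sketch of the supported shutdown path mentioned above, using the public FirebaseApp.delete API. The unpublished Firestore shutdown method isn't shown because its signature isn't given in the thread.)

import FirebaseCore

// Delete the default Firebase App instance. Per the comment above, Firestore
// currently responds to this internal event by doing nothing, so this alone
// does not stop Firestore's background activity.
FirebaseApp.app()?.delete { success in
    print("FirebaseApp deleted: \(success)")
}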
#2862 makes it such that you can opt out of GC and have that actually disable the background process. You'll have unbounded cache growth while we work on making this better, but it will give you an effective remedy in the short term. |
I'm nearly done implementing Fastlane to support multiple release streams so I don't blow up my production firebase data with testing, which will allow me to work this actively, but I have no test results as of yet. I asked my primary iOS tester (we have a crash from him) but he has never noticed it, so +1 for the "quiet death in background" confirmation-bias count. Shutdown of the Firebase App instance sounds like a last resort to me. Unfortunately, delete is not supported by the react-native-firebase module. Does it have an equivalent Android API? If so, perhaps I could give them a PR for that one to correctly cover all wrapped APIs regardless of this issue. I reviewed the GC PR #2862 and it seems like I could start using unlimited cache now, then when the next firestore iOS version bump happened, I would get the upgraded behavior? With the caveat that it appears react-native-firebase (a completely unaffiliated project, I understand, though friendly with you guys if I understand further) does not allow this setting through? I won't contemplate more until I've done real testing to make sure we're on the right path. Thanks again for the attention on the issue |
This is difficult to reproduce and verify. I believe I reproduced it once with the screen lock / timing trick you suggested: I received a firebase crash email (as expected given my user role config) with the expected code file (the assert one), but the crashlytics console did not display the error even though my crash-free percentage dropped, and it shows that there is one crash logged for a UUID that I manually uploaded the dSYM for, so it should have been fine 🤷♂️ I'm going to count that as a confirm though; seeing the assert pop up in the expected test scenario, when I've never seen that assert otherwise in my app, seems about right. All the rest of this is based on the assumption that my reproduction was valid and this is a data-protection-related issue.

My data protection is definitely set to complete now (aggressive perhaps, but I submit it should be foreseen by a library that does not have control of library consumers' habits). I can certainly understand why that is problematic for an SDK that can be used to do synchronization in the background, even if it weren't a no-op GC. Seems like this might be more of a docs issue than anything, at least as a workaround for now: document that complete data protection at minimum will lead to a background crash, and that the other protection levels are as yet untested. Out of the various levels of data protection available, it seems like this might describe them best?
In all cases leveldb should at minimum check to see if it took an EPERM because the files went away, using a UIApplication.protectedDataExists check, and produce some messaging that describes the issue. After thought and this reproduction, I simply can't imagine how periodic GC wouldn't always fail on a "protection complete" app, but I had to make that choice specifically; perhaps I really am just blazing a trail here and the type of person that toggles encryption just because it's available is rare. I'm going to back down to complete-until-first-authentication and hopefully that does the trick |
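(Editorial note: the EPERM check proposed above might look like the following minimal sketch. It uses UIApplication.shared.isProtectedDataAvailable, the actual UIKit API corresponding to the protectedDataExists idea; the helper itself is hypothetical.)

import UIKit

// Hypothetical helper: when a write fails with EPERM, check whether protected
// data is unavailable (device locked) and produce messaging that points at
// file protection rather than a bare I/O error.
func describeWriteFailure(_ error: Error) -> String {
    let nsError = error as NSError
    let isEPERM = nsError.domain == NSPOSIXErrorDomain && nsError.code == Int(EPERM)
    if isEPERM && !UIApplication.shared.isProtectedDataAvailable {
        return "Write failed: the device is locked and file protection has made the data store inaccessible."
    }
    return "Write failed: \(error.localizedDescription)"
}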
Tiny status report: my builds with provisioning profile entitlement / app config data protection diluted to "complete until first authentication" / "NSFileProtectionCompleteUntilFirstUserAuthentication" are starting to work their way through test on their way to deployment, and our MVP is about to go 4x on userbase so I should have more information soon. This setting is the default, so if it works that could explain rarity of crash vs obviousness of crash. Probably a week or two more |
Hi, we have also released an app (with Flutter) and we can see the same crashes for some users. As far as I can tell, it happens only when the user has opened the app and then locks the phone. If he unlocks the phone after a few minutes and the app was previously open, it crashes with the exception. It does not always happen, but when it does, those were the steps to reproduce.
|
@tecbot can you share what your data protection level is set to in the entitlements for your provisioning profile? Is it NSFileProtectionComplete, one of the other choices, or none? |
@mikehardy it's NSFileProtectionComplete |
The assertion is caused by background activity in Firestore. It assumes that when the user is idle it can perform some cleanup, but it's not prepared for filesystem protection to kick in. There are two possible workarounds: disable persistence entirely, or disable garbage collection by making the cache size unlimited.
The former is easier to do, but reduces functionality. The latter requires an as-yet-unreleased fix. To disable persistence, add the following to your code when first configuring Firestore:

let settings = FirestoreSettings()
settings.isPersistenceEnabled = false
let firestore = Firestore.firestore()
firestore.settings = settings

To disable garbage collection, add the following to your
Note that you may have other entries in there for

Add the following setting to your code when first configuring Firestore:

let settings = FirestoreSettings()
settings.cacheSizeBytes = FirestoreCacheSizeUnlimited
let firestore = Firestore.firestore()
firestore.settings = settings |
I believe there is a 3rd workaround: disable Data Protection entirely. And a possible 4th, which is to simply loosen Data Protection so file access is available after first unlock. The 4th approach is what I'm pursuing now, but I don't have significant test results yet. Am I wrong in thinking these other two paths are viable, if the app's security profile can tolerate them? |
Sorry, as you mentioned, notching back to NSFileProtectionCompleteUntilFirstUserAuthentication is indeed another option. Other options are possible too. For example, if other data in your application is sensitive but your Firestore data is not, you could disable protection for just the Firestore files and retain full protection for the rest. The two workarounds cited above are for the case where you must retain NSFileProtectionComplete. |
Also, apologies for not responding to your earlier note. Things have been a bit hectic and I didn't have too much to add to your analysis. LevelDB, unfortunately, is agnostic to this kind of thing. File protection is incompatible with the kinds of guarantees it tries to make: namely, that when a commit returns, data has been appended to the log and is available for querying. If file protection kicks in, a commit must fail.
The right fix is for the SDK to subscribe to file protection events and either close its leveldb files or inhibit tasks that would access them. All of our internal processing is funneled through a dispatch queue, so an appealing course would be to insert a task into the queue that blocked until the filesystem was unlocked. This would have the benefit that you wouldn't even have to stop listening or stop writing: everything would queue in RAM. On the flip side, I'm not sure this is a benefit. Anyway, at the very least, it does seem that we need to document that we're currently incompatible with NSFileProtectionComplete. |
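(Editorial note: "subscribe to file protection events" maps to UIKit's protected-data notifications. A minimal sketch under that assumption; pauseStorageWork/resumeStorageWork are hypothetical stand-ins for blocking and unblocking the SDK's internal dispatch queue.)

import UIKit

// Sketch: pause filesystem-touching work while protected data is unavailable
// (device locked), and resume once it becomes available again.
final class ProtectedDataObserver {
    private var tokens: [NSObjectProtocol] = []

    init(pauseStorageWork: @escaping () -> Void,
         resumeStorageWork: @escaping () -> Void) {
        let center = NotificationCenter.default
        tokens.append(center.addObserver(
            forName: UIApplication.protectedDataWillBecomeUnavailableNotification,
            object: nil, queue: .main) { _ in pauseStorageWork() })
        tokens.append(center.addObserver(
            forName: UIApplication.protectedDataDidBecomeAvailableNotification,
            object: nil, queue: .main) { _ in resumeStorageWork() })
    }

    deinit { tokens.forEach { NotificationCenter.default.removeObserver($0) } }
}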
@wilhuff @mikehardy this thread is super relevant to me on two different iOS apps using Firebase. tl;dr: it is now obvious that the crashes I've been facing in two apps are the incompatibility of Firebase with Data Protection set to NSFileProtectionComplete. I had checked back in late March(?) just after releasing both apps, enabling Data Protection with NSFileProtectionComplete. That's when the crashing began, all in the GAI* code (see one such stack below). I did not see any posts about problems. As @mikehardy stated, this is very difficult to reproduce, or to verify any fixes, outside of "production". I believe this is because iOS Data Protection doesn't get enabled in a Dev/Debug environment. Several weeks back, I tried to turn off the dispatcher with code like this in AppDelegate.swift:
This reduced the number of crashes in both apps significantly, but the GAI* crashes continued. I was debating trying NSFileProtectionCompleteUnlessOpen as a second fix attempt, but I believe the more reliable solution may be to disable protection for just the Firestore files, since our analytics data does not contain sensitive information at this time. FWIW, there were several variations of the GAIThread crash, but here is one: Crashed: GAIThread |
@ferrerod The crashes you're seeing have nothing to do with Firestore. There is commonality in that file protection is involved, but the details are different, and aside from the discussion of different file protection levels, none of the workarounds discussed here apply. I suggest opening a separate issue for the GAI-related issues you're seeing. |
@wilhuff do you know where I might open an issue? There doesn't seem to be an equivalent github.com repo for GoogleAnalytics |
The best resource I could find about this is their Getting Help page. I'm not aware of any other path. |
I now have about 40 users on my new release and 40 that were active on the older one over the last 7 days. Crash count is 12 crashes on the old release with this stack trace, 0 crashes on the new one. The change was moving file protection to NSFileProtectionCompleteUntilFirstUserAuthentication. Cheers! |
That's great news. In that case, I'm going to resolve this issue by noting that Firestore is not currently compatible with NSFileProtectionComplete. I've opened #2980 to track the feature request of making it so we don't crash by default in this mode, though there's still not much utility to Firestore in this mode, since we can't persist any writes submitted while file protection is active. |
I've just read that changing the entitlement isn't the whole story. The issue is that the entitlement sets the default file protection when your application container is created on the user's device. Changing it subsequently won't change the file protection type of any existing files. That means that Firestore files created with the version of your app that was using NSFileProtectionComplete will keep that protection level. If you're still seeing crashes, those users either need to delete and reinstall your app, or you may need to apply a fix like the one I described above to force existing files to FileProtectionType.none. |
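(Editorial note: a minimal sketch of applying the earlier unlockFirestoreData() helper at startup so existing installs get their files' protection fixed; it assumes that helper is in scope and runs before Firestore first opens its files.)

import UIKit
import FirebaseCore

@UIApplicationMain
class AppDelegate: UIResponder, UIApplicationDelegate {
    func application(_ application: UIApplication,
                     didFinishLaunchingWithOptions launchOptions: [UIApplication.LaunchOptionsKey: Any]?) -> Bool {
        // Strip file protection from Firestore's data directory (helper defined
        // earlier in this thread) before the SDK opens its LevelDB files.
        do {
            try unlockFirestoreData()
        } catch {
            print("Failed to unprotect Firestore data: \(error)")
        }
        FirebaseApp.configure()
        return true
    }
}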
Very interesting... given that the crash mode is a silent background one, no users reported it to me personally (it was all Crashlytics reports), and also given that I personally (this may not apply to other affected parties) was in the equivalent of an alpha release when we saw it, and fixed it prior to the equivalent of alpha2, I'm personally okay. But that's a good heads-up and pointer for any others, and knowledge I will file away also. |
Do we have any updates on this? |
I have had no further crashes since my previous comment, when using NSFileProtectionCompleteUntilFirstUserAuthentication. |
@mikehardy I have run into this issue as well, but upon trying to change the protection level I get an error from Xcode claiming that it cannot update the Entitlements file. So I changed it myself to the new protection level, and then it comes back with another error saying that it is incorrectly set. Have you or anyone else experienced this? If so, is there a way around it? I have tried multiple times with no luck. I am using Xcode 10.3. |
Entitlements are (from what I know) intertwined closely with the app's metadata stored in your developer account. I believe you need to change them in the Apple developer portal, then download the new entitlements file and use it. Manually changing that file is not something I think is possible. |
@mikehardy correct, but when doing that, Xcode does not seem to pick up the entitlement changes made in the portal. |
Hmm - I'm clearly out of my depth here. On re-reading the docs, it seems like you need to change them in your developer account and in Xcode (separately?), but it's been so long since I did this that I don't remember completely. Sorry. My related commit message, from when I conformed to the background-encryption setting needed here, was even more ominous, as it mentions you need to update your provisioning profile as well. For me that spiders out into fastlane provisioning-profile twiddling, which is quite the yak shave. All vague, but maybe it gets you pointed in the right direction.
|
I am not sure if Firestore generates any files, but if it does, you could always change the attributes of those files to no protection, and that should fix the crash too. I ran into a very similar issue; I opened a radar with Apple today since it's clearly an iOS 13 bug in my case:
We reproduced this crash with our enterprise application and also with a demo app with no code in it. As part of our troubleshooting, we realized that dyld3 is storing .closure files in the /tmp folder of the application. If we change the attributes of those files to no protection, the crash goes away. We believe there is some sort of issue with dyld3 trying to access those files in the /tmp directory when the device is locked, and it causes the crash. Just to be clear, to change the attributes of those files you most likely will also need to change the posix permissions to 0o777, since most likely they are set to immutable and changing protection permissions won't have any effect. |
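(Editorial note: a minimal sketch of the attribute change described above. The ordering, permissions first and protection second, follows the comment's observation that the protection change has no effect while the files are immutable; whether loosening dyld3's files is advisable is a separate question.)

import Foundation

// Loosen both POSIX permissions and file protection on a single file, per the
// troubleshooting description above.
func unprotectFile(atPath path: String) throws {
    let fm = FileManager.default
    try fm.setAttributes([.posixPermissions: 0o777], ofItemAtPath: path)
    try fm.setAttributes([.protectionKey: FileProtectionType.none],
                         ofItemAtPath: path)
}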
@pkrmf This is essentially the advice I gave in #2846 (comment). Unfortunately, as a library we can't really make this choice for developers using Firestore. If the data really is sensitive, Firestore auto-unlocking its database files would work counter to developer intent. Since this issue occurred we have done a few things to make it easier to use Firestore in an app that must use NSFileProtectionComplete. |
[REQUIRED] Step 2: Describe your environment
[REQUIRED] Step 3: Describe the problem
App crashes (as detected by Crashlytics), apparently while in garbage collection. I have a stack trace, but it doesn't make much sense, as it appears to be collecting 0 garbage and fails with "operation not permitted" on a file. The error and stack trace are likely the most useful items:
error:
stack:
Steps to reproduce:
I would love to have steps to reproduce, but I haven't seen this locally, and I don't have logs or discussion of steps from the relevant users / devices.
Any instinct on how this could happen that could help me pinpoint it would be welcome.
Relevant Code:
Unsure, as this appears to be an internal routine. The relevant file appears to have last been edited by Grand Moff Harkin long ago, and I'm not familiar enough with Firestore internals to make any guess as to possible modes of failure in a GC.
I may have missed them, but I was similarly unable to find any open or closed issues in this repo that seemed relevant, though I did try.