FNextPushId 'successor' crash #8790

mortenbekditlevsen · 2021-10-12T11:17:22Z

Step 0: Are you in the right place?

For issues or feature requests related to the code in this repository
file a Github issue.
- If this is a feature request please use the Feature Request template.
For general technical questions, post a question on StackOverflow
with the firebase tag.
For general (non-iOS) Firebase discussion, use the firebase-talk
google group.
For backend issues, console issues, and other non-SDK help that does not fall under one
of the above categories, reach out to
Firebase Support.
Once you've read this section and determined that your issue is appropriate for
this repository, please delete this section.

[REQUIRED] Step 1: Describe your environment

Xcode version: 13
Firebase SDK version: master
Installation method: Swift Package Manager
Firebase Component: Database

[REQUIRED] Step 2: Describe the problem

The methods successor and predecessor erronously deal with PUSH_CHARS even though it's entirely valid to use unicode keys that are not within the PUSH_CHARS set.

Around line 98 in FNextPushId.m we have:

    NSInteger sourceIndex = [PUSH_CHARS rangeOfString:source].location;
    NSString *sourcePlusOne = [NSString
        stringWithFormat:@"%C", [PUSH_CHARS characterAtIndex:sourceIndex + 1]];

The issue is that if source is for instance a space, then sourceIndex will be max int and the sourcePlusOne calculation causes a crash (since max int + 1 gives min int in objective-c, and min int is out of bounds).

Thread 1: "-[__NSCFConstantString characterAtIndex:]: Range or index out of bounds"

Steps to reproduce:

Easily reproducible with the one-liner below:

Relevant Code:

        let ref = Database.database().reference().queryOrderedByKey().queryStarting(afterValue: " ".padding(toLength: 786, withPad: "z", startingAt: 0))

Suggestion for a solution

I guess that 'successor' and 'predecessor' ought to deal with unicode character points instead of the PUSH_CHARs used to auto-generate keys.
In the same line, I believe that appending MIN_PUSH_CHAR to keys that are shorter than MAX_KEY_LENGTH is not entirely correct, since it wont give the string value that immediately succeeds the input key when using the key index comparison - - which for non-numeric keys ends up using the following comparison:

            return [a compare:b options:NSLiteralSearch];

So 'successor' should return the key that immediately follows the input as defined by the +compareKey:toKey method in FUtilities.

The crash itself is a bit contrived, but here is an actual issue:
If you have the following data in RTDB:
{ "a": "hello",
"a!": "world"
}
and do a queryStarting(afterValue: "a"), then I suspect that the query will start at "a-" meaning that it will miss the "a!" entry...

The text was updated successfully, but these errors were encountered:

mortenbekditlevsen · 2021-10-13T06:50:14Z

Note: The Objective-C implementation is similar to the java implementation for android. I assume the same error may exist in other implementations of the sdk as well:

https://github.com/firebase/firebase-android-sdk/blob/52e1cf89efd795d798ae5485b8503e04e4a936b8/firebase-database/src/main/java/com/google/firebase/database/core/utilities/PushIdGenerator.java#L98

paulb777 · 2021-10-13T19:12:34Z

@mortenbekditlevsen Thanks for the report. I can reproduce.

@schmidt-sebastian Do you know why FNextPushId only considers alphanumeric ASCII characters when all unicode characters are valid for a key?

mortenbekditlevsen · 2021-10-14T06:29:16Z

Just to comment on the question above: For the auto-generated keys it's of course perfectly fine with the restriction of the 64 ASCII characters - it's only the successor and predecessor methods that get into trouble because they don't consider that the keys they get as input - or in fact the keys at the reference where the query is made may contain other characters too.

I would like to help with a solution, but it ties closely into the concept of sorting. I can see that the key sorting implemented on iOS uses NSString literal sort, which I guess compares utf16 representations of the unicode code points. Is this in fact how the server sorts keys as well?
The algorithm does become quite a bit more complex if the full unicode table is valid - and padding has to take number of utf8 characters into account too.
By experimentation it seems like '0x20' (space) is the minimum allowed character, and it appears that 0x10FFFF is valid too. Could that be the highest code allowed code point?

It would be really excellent if the server side key sorting algorithm could be shared, so that a 100% robust version of successor and predecessor could be implemented - and perhaps even the key sorting implementation on iOS be tweaked as well.

mortenbekditlevsen · 2021-10-14T07:07:13Z

I'm currently performing an experiment, trying to write values for keys of every unicode code point from 0x00 to 0x10FFFF. Using this to get a better feeling for allowed characters and later I'll start experiments to test how the server performs sorting.

schmidt-sebastian · 2021-10-14T15:16:49Z

Using the limited ASCII space keeps the PushIDs readable, which simplifies their usage in the Console. I don't think this is a fixed requirement.

@IanWyszynski Is this something you can look at?

mortenbekditlevsen · 2021-10-14T16:30:05Z

Please note that this is not an issue with push ids. It's an issue with assuming that all keys are push ids.

schmidt-sebastian · 2021-10-14T18:09:30Z

Yeah, I assume this is due to the successor logic in startAfter.

jmwski · 2021-10-14T19:46:58Z

It looks like the issue is that we're not considering all valid utf8 characters that can be included in keys.

If you create your own keys, they must be UTF-8 encoded, can be a maximum of 768 bytes, and cannot contain ., $, #, [, ], /, or ASCII control characters 0-31 or 127. You cannot use ASCII control characters in the values themselves, either.

https://firebase.google.com/docs/database/ios/structure-data#how_data_is_structured_its_a_json_tree

FWIW, the server sort function is identical to compareKey. The key index is used as a fallback for other ordering indexes to break ties, but when using orderByKey we need a way to compute the next possible key. I'm not sure exactly how this works for utf8 strings, but I'll investigate.

mortenbekditlevsen · 2021-10-15T10:13:29Z

Thanks for the clarification, Ian. I thought that it might be a little peculiar that the server also used utf16 'character' comparison like in Objective-C - which in my testing does not give the same ordering as unicode codepoint order.
But I tested with a few values, and from those tests it appears that the server is indeed also doing a utf16 character ordering.

My tests were:
Find a sample where utf16 ordering is different from unicode code point order. One such is unicode E000 and 10FFFF in utf16 character ordering 10FFFF comes first of those two, while it's the other way around in utf8 and unicode code point order.

I then tested with adding those strings as keys in the rtdb and performed queries limiting to first - and this indeed also gave unicode 10FFFF as the the 'first' element.

Ok, so I guess that gives us some of the points we need for modifying the successor and predecessor algorithms.

In rtdb, the following unicode code points are invalid to use (found by testing):
0x00 - 0x1F are invalid key values
0x23, 0x24, 0x2E, 0x5B, 0x5D are explicitly disallowed (.#$[])
0x2F (/)is not invalid as a path character, but can't be used as a key
0x7F is an invalid key value
0xD800 - 0xDFFF are invalid unicode scalars as they are reserved for the surrogate pairs used in utf16 encoding.

So the lowest allowed value is 0x20 - space
The highest (uft16) value is 0xFFFF - any unicode code point larger or equal to 0x10000 will sort lower than 0xFFFF as they encode as a surrogate pair in the range of 0xd800 and 0xdfff.

Getting the 'plus one' value needs to skip all disallowed ranges mentioned above. And plus one will be different based on whether the code point encodes as one or two utf16 characters. If it encodes as two, then I guess 0x10FFFF + 1 will be 0xE000??

Getting the minus one is similarly complicated.

Even more complicated is the fact that the transport (and thus max key length) is counted in utf8 bytes, but the ordering is in utf16. For instance, I can't write a key of unicode codepoint FFFF repeated 768/2 times as FFFF takes three utf8 bytes, but only one utf16 character (corresponding to two bytes). This means that it's quite hard to calculate any string padding when calculating the predecessor.

I don't know if you can use these considerations for anything, but I guess that my point is: This will be really, really hard to get correct. I'd perhaps even go so far as to say: In order to get this correct you might actually need a dedicated query type supported by the server, so that the server can do the successor/predecessor logic - or even just query from the supplied key but skip values with that key in the response.

Final comment is: Help, during my testing I added keys of length 768 to the database and I can't get rid of them trough the console.
Also: is 768 the limit for the entire path? it appears so. Should that also be taken into account in predecessor/successor?

mortenbekditlevsen · 2021-10-15T10:21:36Z

Another small comment - apologies for the bandwidth here - I'm just really curious about how things work:

In the javascript sdk we have:
https://github.com/firebase/firebase-js-sdk/blob/master/packages/database/src/core/util/util.ts

Where the key comparison falls back to the 'less than' operator, which in the documentation states that it performs unicode codepoint comparison - meaning that this may differ from server implementation and also iOS.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Less_than

~~If both values are strings, they are compared as strings, based on the values of the Unicode code points they contain.~~

Update: in nodejs at least, the ordering also appears to be utf16 character ordering, so I assume that the documentation referred above is not entirely correct.

> "\u10FFFF" < "\uE000"
true

mortenbekditlevsen · 2021-10-15T10:37:54Z

Ah, another finding:
The queryStartingAt API doesn't seem to mind start values that are longer than 768 bytes even though it's a 'queryorderedbykey' query.
So this means that the whole utf8 vs. utf16 issue goes away.
Uuh, I actually feel like giving the implementation a shot now! :-)

mortenbekditlevsen · 2021-10-15T14:58:28Z

Update: I have a shot at an implementation. Will create a pr later this weekend or on Monday.

mortenbekditlevsen · 2021-10-27T09:11:14Z

I referenced this issue from firebase-js-sdk and firebase-android-sdk. Couldn't find any ditto in unity or cpp sdks.
Please let me know if there are other implementations to be aware of.

@IanWyszynski

paulb777 · 2022-01-11T00:18:13Z

Fixed by #8817

google-oss-bot added the api: database label Oct 12, 2021

schmidt-sebastian assigned jmwski Oct 14, 2021

mortenbekditlevsen mentioned this issue Oct 16, 2021

Attempt at fixing issue with getting next and previous push id #8817

Merged

This was referenced Oct 27, 2021

Incorrect assumption in successor and predecessor functions firebase/firebase-js-sdk#5670

Closed

Incorrect assumption in successor and predecessor methods. firebase/firebase-android-sdk#3079

Open

paulb777 added this to the 8.11.0 - M110 milestone Jan 11, 2022

paulb777 closed this as completed Jan 11, 2022

firebase locked and limited conversation to collaborators Feb 10, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FNextPushId 'successor' crash #8790

FNextPushId 'successor' crash #8790

mortenbekditlevsen commented Oct 12, 2021 •

edited

Loading

mortenbekditlevsen commented Oct 13, 2021

paulb777 commented Oct 13, 2021

mortenbekditlevsen commented Oct 14, 2021

mortenbekditlevsen commented Oct 14, 2021

schmidt-sebastian commented Oct 14, 2021

mortenbekditlevsen commented Oct 14, 2021

schmidt-sebastian commented Oct 14, 2021

jmwski commented Oct 14, 2021

mortenbekditlevsen commented Oct 15, 2021

mortenbekditlevsen commented Oct 15, 2021 •

edited

Loading

mortenbekditlevsen commented Oct 15, 2021

mortenbekditlevsen commented Oct 15, 2021

mortenbekditlevsen commented Oct 27, 2021

paulb777 commented Jan 11, 2022

FNextPushId 'successor' crash #8790

FNextPushId 'successor' crash #8790

Comments

mortenbekditlevsen commented Oct 12, 2021 • edited Loading

Step 0: Are you in the right place?

[REQUIRED] Step 1: Describe your environment

[REQUIRED] Step 2: Describe the problem

Steps to reproduce:

Relevant Code:

Suggestion for a solution

mortenbekditlevsen commented Oct 13, 2021

paulb777 commented Oct 13, 2021

mortenbekditlevsen commented Oct 14, 2021

mortenbekditlevsen commented Oct 14, 2021

schmidt-sebastian commented Oct 14, 2021

mortenbekditlevsen commented Oct 14, 2021

schmidt-sebastian commented Oct 14, 2021

jmwski commented Oct 14, 2021

mortenbekditlevsen commented Oct 15, 2021

mortenbekditlevsen commented Oct 15, 2021 • edited Loading

mortenbekditlevsen commented Oct 15, 2021

mortenbekditlevsen commented Oct 15, 2021

mortenbekditlevsen commented Oct 27, 2021

paulb777 commented Jan 11, 2022

mortenbekditlevsen commented Oct 12, 2021 •

edited

Loading

mortenbekditlevsen commented Oct 15, 2021 •

edited

Loading