Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FNextPushId 'successor' crash #8790

Closed
mortenbekditlevsen opened this issue Oct 12, 2021 · 14 comments
Closed

FNextPushId 'successor' crash #8790

mortenbekditlevsen opened this issue Oct 12, 2021 · 14 comments
Assignees
Milestone

Comments

@mortenbekditlevsen
Copy link
Contributor

mortenbekditlevsen commented Oct 12, 2021

Step 0: Are you in the right place?

  • For issues or feature requests related to the code in this repository
    file a Github issue.
    • If this is a feature request please use the Feature Request template.
  • For general technical questions, post a question on StackOverflow
    with the firebase tag.
  • For general (non-iOS) Firebase discussion, use the firebase-talk
    google group.
  • For backend issues, console issues, and other non-SDK help that does not fall under one
    of the above categories, reach out to
    Firebase Support.
  • Once you've read this section and determined that your issue is appropriate for
    this repository, please delete this section.

[REQUIRED] Step 1: Describe your environment

  • Xcode version: 13
  • Firebase SDK version: master
  • Installation method: Swift Package Manager
  • Firebase Component: Database

[REQUIRED] Step 2: Describe the problem

The methods successor and predecessor erronously deal with PUSH_CHARS even though it's entirely valid to use unicode keys that are not within the PUSH_CHARS set.

Around line 98 in FNextPushId.m we have:

    NSInteger sourceIndex = [PUSH_CHARS rangeOfString:source].location;
    NSString *sourcePlusOne = [NSString
        stringWithFormat:@"%C", [PUSH_CHARS characterAtIndex:sourceIndex + 1]];

The issue is that if source is for instance a space, then sourceIndex will be max int and the sourcePlusOne calculation causes a crash (since max int + 1 gives min int in objective-c, and min int is out of bounds).

Thread 1: "-[__NSCFConstantString characterAtIndex:]: Range or index out of bounds"

Steps to reproduce:

Easily reproducible with the one-liner below:

Relevant Code:

        let ref = Database.database().reference().queryOrderedByKey().queryStarting(afterValue: " ".padding(toLength: 786, withPad: "z", startingAt: 0))

Suggestion for a solution

I guess that 'successor' and 'predecessor' ought to deal with unicode character points instead of the PUSH_CHARs used to auto-generate keys.
In the same line, I believe that appending MIN_PUSH_CHAR to keys that are shorter than MAX_KEY_LENGTH is not entirely correct, since it wont give the string value that immediately succeeds the input key when using the key index comparison - - which for non-numeric keys ends up using the following comparison:

            return [a compare:b options:NSLiteralSearch];

So 'successor' should return the key that immediately follows the input as defined by the +compareKey:toKey method in FUtilities.

The crash itself is a bit contrived, but here is an actual issue:
If you have the following data in RTDB:
{ "a": "hello",
"a!": "world"
}
and do a queryStarting(afterValue: "a"), then I suspect that the query will start at "a-" meaning that it will miss the "a!" entry...

@mortenbekditlevsen
Copy link
Contributor Author

Note: The Objective-C implementation is similar to the java implementation for android. I assume the same error may exist in other implementations of the sdk as well:

https://github.com/firebase/firebase-android-sdk/blob/52e1cf89efd795d798ae5485b8503e04e4a936b8/firebase-database/src/main/java/com/google/firebase/database/core/utilities/PushIdGenerator.java#L98

@paulb777
Copy link
Member

@mortenbekditlevsen Thanks for the report. I can reproduce.

@schmidt-sebastian Do you know why FNextPushId only considers alphanumeric ASCII characters when all unicode characters are valid for a key?

@mortenbekditlevsen
Copy link
Contributor Author

Just to comment on the question above: For the auto-generated keys it's of course perfectly fine with the restriction of the 64 ASCII characters - it's only the successor and predecessor methods that get into trouble because they don't consider that the keys they get as input - or in fact the keys at the reference where the query is made may contain other characters too.

I would like to help with a solution, but it ties closely into the concept of sorting. I can see that the key sorting implemented on iOS uses NSString literal sort, which I guess compares utf16 representations of the unicode code points. Is this in fact how the server sorts keys as well?
The algorithm does become quite a bit more complex if the full unicode table is valid - and padding has to take number of utf8 characters into account too.
By experimentation it seems like '0x20' (space) is the minimum allowed character, and it appears that 0x10FFFF is valid too. Could that be the highest code allowed code point?

It would be really excellent if the server side key sorting algorithm could be shared, so that a 100% robust version of successor and predecessor could be implemented - and perhaps even the key sorting implementation on iOS be tweaked as well.

@mortenbekditlevsen
Copy link
Contributor Author

I'm currently performing an experiment, trying to write values for keys of every unicode code point from 0x00 to 0x10FFFF. Using this to get a better feeling for allowed characters and later I'll start experiments to test how the server performs sorting.

@schmidt-sebastian
Copy link
Contributor

Using the limited ASCII space keeps the PushIDs readable, which simplifies their usage in the Console. I don't think this is a fixed requirement.

@IanWyszynski Is this something you can look at?

@mortenbekditlevsen
Copy link
Contributor Author

Please note that this is not an issue with push ids. It's an issue with assuming that all keys are push ids.

@schmidt-sebastian
Copy link
Contributor

Yeah, I assume this is due to the successor logic in startAfter.

@jmwski
Copy link
Contributor

jmwski commented Oct 14, 2021

It looks like the issue is that we're not considering all valid utf8 characters that can be included in keys.

If you create your own keys, they must be UTF-8 encoded, can be a maximum of 768 bytes, and cannot contain ., $, #, [, ], /, or ASCII control characters 0-31 or 127. You cannot use ASCII control characters in the values themselves, either.

https://firebase.google.com/docs/database/ios/structure-data#how_data_is_structured_its_a_json_tree

FWIW, the server sort function is identical to compareKey. The key index is used as a fallback for other ordering indexes to break ties, but when using orderByKey we need a way to compute the next possible key. I'm not sure exactly how this works for utf8 strings, but I'll investigate.

@mortenbekditlevsen
Copy link
Contributor Author

Thanks for the clarification, Ian. I thought that it might be a little peculiar that the server also used utf16 'character' comparison like in Objective-C - which in my testing does not give the same ordering as unicode codepoint order.
But I tested with a few values, and from those tests it appears that the server is indeed also doing a utf16 character ordering.

My tests were:
Find a sample where utf16 ordering is different from unicode code point order. One such is unicode E000 and 10FFFF in utf16 character ordering 10FFFF comes first of those two, while it's the other way around in utf8 and unicode code point order.

I then tested with adding those strings as keys in the rtdb and performed queries limiting to first - and this indeed also gave unicode 10FFFF as the the 'first' element.

Ok, so I guess that gives us some of the points we need for modifying the successor and predecessor algorithms.

In rtdb, the following unicode code points are invalid to use (found by testing):
0x00 - 0x1F are invalid key values
0x23, 0x24, 0x2E, 0x5B, 0x5D are explicitly disallowed (.#$[])
0x2F (/)is not invalid as a path character, but can't be used as a key
0x7F is an invalid key value
0xD800 - 0xDFFF are invalid unicode scalars as they are reserved for the surrogate pairs used in utf16 encoding.

So the lowest allowed value is 0x20 - space
The highest (uft16) value is 0xFFFF - any unicode code point larger or equal to 0x10000 will sort lower than 0xFFFF as they encode as a surrogate pair in the range of 0xd800 and 0xdfff.

Getting the 'plus one' value needs to skip all disallowed ranges mentioned above. And plus one will be different based on whether the code point encodes as one or two utf16 characters. If it encodes as two, then I guess 0x10FFFF + 1 will be 0xE000??

Getting the minus one is similarly complicated.

Even more complicated is the fact that the transport (and thus max key length) is counted in utf8 bytes, but the ordering is in utf16. For instance, I can't write a key of unicode codepoint FFFF repeated 768/2 times as FFFF takes three utf8 bytes, but only one utf16 character (corresponding to two bytes). This means that it's quite hard to calculate any string padding when calculating the predecessor.

I don't know if you can use these considerations for anything, but I guess that my point is: This will be really, really hard to get correct. I'd perhaps even go so far as to say: In order to get this correct you might actually need a dedicated query type supported by the server, so that the server can do the successor/predecessor logic - or even just query from the supplied key but skip values with that key in the response.

Final comment is: Help, during my testing I added keys of length 768 to the database and I can't get rid of them trough the console.
Also: is 768 the limit for the entire path? it appears so. Should that also be taken into account in predecessor/successor?

@mortenbekditlevsen
Copy link
Contributor Author

mortenbekditlevsen commented Oct 15, 2021

Another small comment - apologies for the bandwidth here - I'm just really curious about how things work:

In the javascript sdk we have:
https://github.com/firebase/firebase-js-sdk/blob/master/packages/database/src/core/util/util.ts

Where the key comparison falls back to the 'less than' operator, which in the documentation states that it performs unicode codepoint comparison - meaning that this may differ from server implementation and also iOS.

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Operators/Less_than

If both values are strings, they are compared as strings, based on the values of the Unicode code points they contain.

Update: in nodejs at least, the ordering also appears to be utf16 character ordering, so I assume that the documentation referred above is not entirely correct.

> "\u10FFFF" < "\uE000"
true
@mortenbekditlevsen
Copy link
Contributor Author

Ah, another finding:
The queryStartingAt API doesn't seem to mind start values that are longer than 768 bytes even though it's a 'queryorderedbykey' query.
So this means that the whole utf8 vs. utf16 issue goes away.
Uuh, I actually feel like giving the implementation a shot now! :-)

@mortenbekditlevsen
Copy link
Contributor Author

Update: I have a shot at an implementation. Will create a pr later this weekend or on Monday.

@mortenbekditlevsen
Copy link
Contributor Author

I referenced this issue from firebase-js-sdk and firebase-android-sdk. Couldn't find any ditto in unity or cpp sdks.
Please let me know if there are other implementations to be aware of.

@IanWyszynski

@paulb777 paulb777 added this to the 8.11.0 - M110 milestone Jan 11, 2022
@paulb777
Copy link
Member

Fixed by #8817

@firebase firebase locked and limited conversation to collaborators Feb 10, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
5 participants