Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unicode escape write #812

Open
peroket opened this issue Mar 8, 2024 · 5 comments
Open

Unicode escape write #812

peroket opened this issue Mar 8, 2024 · 5 comments
Labels
enhancement New feature or request

Comments

@peroket
Copy link

peroket commented Mar 8, 2024

When having string with unicode values and serialising them those unicode values should be escaped.
For example, if in c++ I write

auto result = glz::write_json("\x1f");

I would expect the result string to actually be "\u001f". glaze does not seem to do any conversions. Is there a way to be conform that I missed?

@stephenberry
Copy link
Owner

Thanks for brining this up. There are a few issues in glaze dealing with improved unicode handling, so it's certainly necessary, but I'm not sure when the full support will be added. Will try to add support sooner than later, or I would be happy to accept a pull request.

@stephenberry stephenberry added the bug Something isn't working label Mar 8, 2024
@stephenberry
Copy link
Owner

stephenberry commented Mar 19, 2024

Forcing escape checks and conversions on everyone I think is a bad idea, because it results in a significant performance loss when writing strings.

I think this should be opt-in, with a global compile time value, and a wrapper for using on specific strings.

I tend to agree with this discussion that escaping these characters is a fault in the specification: https://softwaremaniacs.org/blog/2015/03/22/json-encoding-problem/en/

However, I still think it must be supported, but opt-in, as described above. It is best to by default conform to the I-JSON RFC.

@stephenberry stephenberry added enhancement New feature or request and removed bug Something isn't working labels Mar 19, 2024
@stephenberry
Copy link
Owner

The only characters that need to be escaped, per the specification, are the control values (less than 32) which do not have shorthands.

The original example of "\x1f" would need to be escaped as JSON escaped unicode.

My plan is to add a compile time option and wrappers to support this behavior, but not to support it by default. A major reason is that escaped control values allow embedding null characters into strings, which can create ugly bugs when working with algorithms that use null termination.

There would also be a major performance hit to generally escape these control characters that cannot even be displayed as text.

However, this is a valid corner of JSON that will be supported. It will jut be opt-in.

@pauldreik
Copy link
Contributor

I think the current behaviour is a good choice. But it would be great if it could be mentioned under https://github.com/stephenberry/glaze/tree/main?tab=readme-ov-file#json-conformance ? Perhaps with a link to this ticket.

@stephenberry
Copy link
Owner

Thanks for the suggestion, I mentioned this under json-conformance and linked this ticket.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
3 participants