oj: Inconsistent behavior for illegal/malformed utf-8
Calling Oj.optimize_rails changes how JSON.generate handles a string that contains invalid UTF-8:
irb(main):001:0> str = "\xAE"
=> "\xAE"
irb(main):002:0> JSON.generate(str)
Traceback (most recent call last):
1: from (irb):2
JSON::GeneratorError (source sequence is illegal/malformed utf-8)
irb(main):003:0> Oj.optimize_rails
=> nil
irb(main):004:0> JSON.generate(str)
=> "\"\xAE\""
I’d expect the same JSON::GeneratorError to be raised after the Oj patch is applied.
Oj version: 3.9.1, Ruby version: 2.6.5p114
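Until the two code paths agree, one possible workaround is to validate strings yourself before handing them to JSON.generate. The sketch below assumes a hypothetical `StrictJSON` module (not part of Oj or the json gem). It walks arrays and hashes, checks each string's bytes as UTF-8, and raises the same `JSON::GeneratorError` class the json gem raises, so the behavior should not depend on whether `Oj.optimize_rails` has been called.

```ruby
require 'json'

# Hypothetical helper, not part of Oj or the json gem. It rejects malformed
# UTF-8 up front with the json gem's error class, then delegates to
# JSON.generate (which may or may not be patched by Oj).
module StrictJSON
  module_function

  # Recursively yield every String found in arrays, hash keys and hash values.
  def each_string(obj, &block)
    case obj
    when String
      yield obj
    when Array
      obj.each { |v| each_string(v, &block) }
    when Hash
      obj.each do |k, v|
        each_string(k, &block)
        each_string(v, &block)
      end
    end
  end

  def generate(obj, *args)
    each_string(obj) do |s|
      # Assumption: every string's bytes are meant to be UTF-8, whatever its tag.
      unless s.dup.force_encoding(Encoding::UTF_8).valid_encoding?
        raise JSON::GeneratorError, "source sequence is illegal/malformed utf-8"
      end
    end
    JSON.generate(obj, *args)
  end
end
```

With this helper, `StrictJSON.generate("\xAE")` raises `JSON::GeneratorError` before any generator runs, while valid input such as `{ "ok" => ["caf\u00e9"] }` is passed through unchanged. Treating every string's bytes as UTF-8 is a deliberate simplification: it is stricter than the json gem for strings tagged with other encodings (for example ISO-8859-1), which the gem converts rather than rejects.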
About this issue
- State: closed
- Created 4 years ago
- Comments: 15 (8 by maintainers)
Not on my side. I’m going to close this one. Huge thank you for your help and for building & maintaining this great gem @ohler55! 🙏
Oj always uses UTF-8 encoding. JSON text is defined as a sequence of Unicode code points. ASCII is not a valid encoding for JSON since it does not support Unicode. UTF-8 does support Unicode and is the recommended encoding for JSON. Are you seeing the JSON gem emit ASCII-8BIT strings? That seems broken, but if that is what the JSON gem emits, Oj's compat and rails modes will need to be updated.
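One way to answer the ASCII-8BIT question for a given setup is to compare the encoding tags Ruby attaches to strings. This is a minimal sketch using only core Ruby and the json stdlib, with no Oj calls; the variable names are illustrative. It shows that the `"\xAE"` literal from the report is tagged UTF-8 but invalid, that the same byte becomes "valid" once tagged ASCII-8BIT (Ruby's binary encoding), and how to inspect the tag on generator output.

```ruby
require 'json'

# In a UTF-8 source file this literal is tagged UTF-8, but the lone byte 0xAE
# is not a valid UTF-8 sequence.
str = "\xAE"
p str.encoding == Encoding::UTF_8 # true
p str.valid_encoding?             # false

# String#b copies the bytes and tags them ASCII-8BIT (Ruby's binary encoding).
# Any byte sequence is "valid" in ASCII-8BIT, so it carries no Unicode meaning.
bin = str.b
p bin.encoding == Encoding::ASCII_8BIT # true
p bin.valid_encoding?                  # true

# Inspect the encoding tag on what the json gem actually emits.
out = JSON.generate("caf\u00e9")
p out.encoding
```

Running the last two lines once before and once after `Oj.optimize_rails` would show whether the output tag changes when Oj takes over; that comparison is left out here so the sketch runs without Oj installed.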