orjson: random crashes after upgrade to 3.9.12
This is from system dmesg output:
[Fri Jan 19 10:41:06 2024] python3[3421008]: segfault at 7fe28bd24000 ip 00007fe296824bde sp 00007ffdd5db46f8 error 4 in orjson.cpython-312-x86_64-linux-gnu.so[7fe2967fe000+2f000] [Fri Jan 19 10:41:06 2024] Code: 66 66 66 2e 0f 1f 84 00 00 00 00 00 4c 01 c0 4c 01 c6 49 f7 d0 4c 01 c2 4c 89 10 4c 01 c8 48 ff c6 48 85 d2 0f 84 dd 02 00 00 <c5> fe 6f 1e c5 fe 7f 18 c5 e5 74 e0 c5 e5 74 e9 c5 d5 eb e4 c5 e5
Not sure if other people encounter similar issues.
About this issue
- Original URL
- State: closed
- Created 5 months ago
- Reactions: 21
- Comments: 16 (4 by maintainers)
Commits related to this issue
- requirements: Revert orjson upgrade due to segfault. Version 3.9.11 and 3.9.12 are susceptible to random segfaults: - https://github.com/ijl/orjson/issues/452 — committed to alexmv/zulip by alexmv 5 months ago
- requirements: Revert orjson upgrade due to segfault. Version 3.9.11 and 3.9.12 are susceptible to random segfaults: - https://github.com/ijl/orjson/issues/452 — committed to zulip/zulip by alexmv 5 months ago
- Fix buffer overread in format_escaped_str Fixes #452, probably. Signed-off-by: Anders Kaseorg <andersk@mit.edu> — committed to andersk/orjson by andersk 5 months ago
- Fix buffer overread in format_escaped_str Fixes #452, probably. Signed-off-by: Anders Kaseorg <andersk@mit.edu> — committed to ijl/orjson by andersk 5 months ago
- Fix buffer overread in format_escaped_str Fixes #452, probably. Signed-off-by: Anders Kaseorg <andersk@mit.edu> — committed to ijl/orjson by andersk 5 months ago
- Fix buffer overread in format_escaped_str Fixes #452, probably. Signed-off-by: Anders Kaseorg <andersk@mit.edu> — committed to ijl/orjson by andersk 5 months ago
- Fix buffer overread in format_escaped_str Fixes #452, probably. Signed-off-by: Anders Kaseorg <andersk@mit.edu> — committed to ijl/orjson by andersk 5 months ago
- Fix buffer overread in format_escaped_str Fixes #452, probably. Signed-off-by: Anders Kaseorg <andersk@mit.edu> — committed to andersk/orjson by andersk 5 months ago
- requirements: Revert orjson upgrade due to segfault. Version 3.9.11 and 3.9.12 are susceptible to random segfaults: - https://github.com/ijl/orjson/issues/452 (cherry picked from commit 437361de327b... — committed to andersk/zulip by alexmv 5 months ago
- requirements: Revert orjson upgrade due to segfault. Version 3.9.11 and 3.9.12 are susceptible to random segfaults: - https://github.com/ijl/orjson/issues/452 (cherry picked from commit 437361de327b... — committed to zulip/zulip by alexmv 5 months ago
- requirements: Revert orjson upgrade due to segfault. Version 3.9.11 and 3.9.12 are susceptible to random segfaults: - https://github.com/ijl/orjson/issues/452 — committed to amaranand360/zulip by alexmv 5 months ago
- requirements: Revert orjson upgrade due to segfault. Version 3.9.11 and 3.9.12 are susceptible to random segfaults: - https://github.com/ijl/orjson/issues/452 — committed to mananbordia/zulip by alexmv 5 months ago
I see that in 528220fb0d18bbf0212de7f0ce5c7aec209bc6e7 you’ve added a check for whether the pointer crosses a page boundary and reinstated the buffer overread if it doesn’t. But a buffer overread is undefined behavior whether or not a page boundary is crossed. Valgrind still flags the same error with my above test case in 3.9.14.
Undefined behavior will cause problems eventually, even if the symptom isn’t as obvious as a segfault, and it might seem like it’s working until there’s a clever new compiler optimization relying on an incorrect invariant inferred from the contract that the program has broken. We need to avoid all UB, not just paper over its observed symptoms.
Moreover, it can’t possibly be saving significant time here, given this code is only there for handling the end of the buffer.
Please fully remove the buffer overread.
A test case that doesn’t segfault but makes Valgrind angry:
I suspect 3.9.13 reduced the probability of the issue since 58a8bd3e31aa3b5fd3d962fb5b03479fa0014ee9 decreased the maximum overread from 31 bytes to 15 bytes, but it’s not eliminated. The Valgrind trace I posted above is from 3.9.13.