orjson: random crashes after upgrade to 3.9.12

This is from the system's dmesg output:

[Fri Jan 19 10:41:06 2024] python3[3421008]: segfault at 7fe28bd24000 ip 00007fe296824bde sp 00007ffdd5db46f8 error 4 in orjson.cpython-312-x86_64-linux-gnu.so[7fe2967fe000+2f000]
[Fri Jan 19 10:41:06 2024] Code: 66 66 66 2e 0f 1f 84 00 00 00 00 00 4c 01 c0 4c 01 c6 49 f7 d0 4c 01 c2 4c 89 10 4c 01 c8 48 ff c6 48 85 d2 0f 84 dd 02 00 00 <c5> fe 6f 1e c5 fe 7f 18 c5 e5 74 e0 c5 e5 74 e9 c5 d5 eb e4 c5 e5
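The numbers in that dmesg line can be decoded with a little arithmetic. The sketch below assumes the standard x86 page-fault error-code bits (bit 0: protection violation rather than a not-present page, bit 1: write, bit 2: user mode) and 4 KiB pages; `decode_pf_error` is a hypothetical helper, not part of any tool. It shows that `error 4` is a user-mode read of a not-present page, that the faulting address is the first byte of a page, and where the faulting instruction sits inside the .so. The bytes marked `<c5>` in the Code: line, `c5 fe 6f 1e`, decode to `vmovdqu ymm3, [rsi]`, a 32-byte AVX load, which fits a read running off the end of a buffer into an unmapped page.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages (the x86-64 Linux default)

def decode_pf_error(code: int) -> dict:
    """Decode the low bits of an x86 page-fault error code."""
    return {
        "protection_violation": bool(code & 0x1),  # False: the page was not present
        "write": bool(code & 0x2),                # False: the access was a read
        "user_mode": bool(code & 0x4),            # True: the fault happened in user space
    }

# Values copied from the dmesg line above.
fault_addr = 0x7FE28BD24000
ip = 0x00007FE296824BDE
lib_base, lib_size = 0x7FE2967FE000, 0x2F000

info = decode_pf_error(4)
page_offset = fault_addr % PAGE_SIZE  # 0 means the fault hit the first byte of a page
ip_offset = ip - lib_base             # offset of the faulting instruction within the .so
print(info, page_offset, hex(ip_offset), ip_offset < lib_size)
```

If debug symbols for the same orjson build are available, an offset like `ip_offset` can be passed to `addr2line -e` to locate the source line.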

Not sure whether other people are seeing similar issues.

About this issue

  • State: closed
  • Created 5 months ago
  • Reactions: 21
  • Comments: 16 (4 by maintainers)


Most upvoted comments

I see that in 528220fb0d18bbf0212de7f0ce5c7aec209bc6e7 you’ve added a check for whether the pointer crosses a page boundary and reinstated the buffer overread when it doesn’t. But a buffer overread is undefined behavior whether or not a page boundary is crossed. Valgrind still flags the same error in 3.9.14 with my test case from earlier, reproduced below.
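To make that concrete with the figures from the Valgrind report (a 16-byte read at 0x13e203a1, 4,081 bytes inside a 4,096-byte block), here is a sketch of a page-crossing test. `crosses_page` and `bytes_past_block` are hypothetical helpers written for this illustration, not orjson’s code, and 4 KiB pages are assumed. The read stays within a single page, so a check of this kind would let it through, yet it still ends one byte past the allocation.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

def crosses_page(addr: int, width: int, page_size: int = PAGE_SIZE) -> bool:
    """True if a `width`-byte load starting at `addr` spans two pages."""
    return (addr % page_size) + width > page_size

def bytes_past_block(addr: int, width: int, block_start: int, block_size: int) -> int:
    """How many bytes of the load fall beyond the end of the allocation."""
    block_end = block_start + block_size  # one past the last valid byte
    return max(0, addr + width - block_end)

# Figures from the Valgrind report: a 16-byte read at 0x13e203a1,
# which lies 4,081 bytes inside a 4,096-byte block.
read_addr = 0x13E203A1
block_start = read_addr - 4081
print(crosses_page(read_addr, 16), bytes_past_block(read_addr, 16, block_start, 4096))
```

The page check only rules out the reads that happen to fault; the out-of-bounds read itself is still there.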

Undefined behavior will cause problems eventually, even if the symptom isn’t as obvious as a segfault. The code may seem to work until a clever new compiler optimization relies on an invariant inferred from a contract the program has broken, and that invariant turns out to be false. We need to avoid all UB, not just paper over its observed symptoms.

Moreover, it can’t possibly be saving significant time, given that this code only handles the end of the buffer.

Please fully remove the buffer overread.
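One common way to drop the overread entirely is to copy the final partial block into a fixed-size scratch buffer (or use a masked load) and run the same vector compare on that. The sketch below simulates this in plain Python, with a scalar loop standing in for the SIMD compare; `WIDTH`, `first_escape` and the choice of a space as padding are illustrative, and this is not orjson’s actual fix. The padding byte must be one that can never match (a zero byte is a control character and would look like it needs escaping), and lanes past the real tail are ignored as a second guard.

```python
from typing import Optional

WIDTH = 16  # lane count of a hypothetical 128-bit vector

def needs_escape(b: int) -> bool:
    # JSON strings must escape control characters, '"' and '\\'.
    return b < 0x20 or b == 0x22 or b == 0x5C

def block_first_escape(block: bytes) -> Optional[int]:
    # Scalar stand-in for a vector compare + movemask over exactly WIDTH bytes.
    assert len(block) == WIDTH
    for i, b in enumerate(block):
        if needs_escape(b):
            return i
    return None

def first_escape(data: bytes) -> Optional[int]:
    """Index of the first byte needing escaping, reading only in-bounds bytes."""
    full = len(data) - len(data) % WIDTH
    for start in range(0, full, WIDTH):
        hit = block_first_escape(data[start:start + WIDTH])
        if hit is not None:
            return start + hit
    tail = len(data) - full
    if tail:
        # Copy the short tail into a WIDTH-byte scratch buffer instead of
        # loading WIDTH bytes from data[full:], which in native code would
        # read past the end of the buffer.
        scratch = bytearray(b" " * WIDTH)  # 0x20 never needs escaping
        scratch[:tail] = data[full:]
        hit = block_first_escape(bytes(scratch))
        if hit is not None and hit < tail:  # mask off the padding lanes
            return full + hit
    return None
```

For example, `first_escape(b"x" * 20 + b'"')` finds the quote at index 20 through the scratch-buffer path. The copy touches at most `WIDTH - 1` bytes once per string.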

A test case that doesn’t segfault but makes Valgrind angry:

$ valgrind python -c 'import orjson; orjson.dumps((b"\n" + b"x" * 4046).decode())'
==50092== Memcheck, a memory error detector
==50092== Copyright (C) 2002-2022, and GNU GPL'd, by Julian Seward et al.
==50092== Using Valgrind-3.22.0 and LibVEX; rerun with -h for copyright info
==50092== Command: python -c import\ orjson;\ orjson.dumps((b"\\n"\ +\ b"x"\ *\ 4046).decode())
==50092== 
==50092== Invalid read of size 16
==50092==    at 0x12DAA988: orjson::serialize::writer::simd::format_escaped_str_impl_128 (simd.rs:0)
==50092==    by 0x12DA85C9: format_escaped_str<&mut orjson::serialize::writer::byteswriter::BytesWriter> (json.rs:578)
==50092==    by 0x12DA85C9: serialize_str<&mut orjson::serialize::writer::byteswriter::BytesWriter, orjson::serialize::writer::formatter::CompactFormatter> (json.rs:165)
==50092==    by 0x12DA85C9: <orjson::serialize::per_type::unicode::StrSerializer as serde::ser::Serialize>::serialize (unicode.rs:29)
==50092==    by 0x12DACB7A: to_writer<&mut orjson::serialize::writer::byteswriter::BytesWriter, orjson::serialize::serializer::PyObjectSerializer> (json.rs:605)
==50092==    by 0x12DACB7A: serialize (serializer.rs:25)
==50092==    by 0x12DACB7A: dumps (lib.rs:354)
==50092==    by 0x49BC251: cfunction_vectorcall_FASTCALL_KEYWORDS (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4AB22C2: PyObject_Vectorcall (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x494D8DC: _PyEval_EvalFrameDefault (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4B8256B: _PyEval_Vector.constprop.0 (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4B82709: PyEval_EvalCode (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4BAD42F: run_mod (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4BCE6FC: PyRun_SimpleStringFlags (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4BCE9B4: Py_RunMain (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4FB50CD: (below main) (in /nix/store/7jiqcrg061xi5clniy7z5pvkc4jiaqav-glibc-2.38-27/lib/libc.so.6)
==50092==  Address 0x13e203a1 is 4,081 bytes inside a block of size 4,096 alloc'd
==50092==    at 0x484276B: malloc (in /nix/store/1iai1iry6zw0fn4b2rnb93yx4vgpd9bi-valgrind-3.22.0/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==50092==    by 0x4981DBF: _PyObject_Malloc (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x49EF4DB: PyUnicode_New.part.0 (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x49B0DDF: unicode_decode_utf8 (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4A981F1: method_vectorcall_FASTCALL_KEYWORDS (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4AB22C2: PyObject_Vectorcall (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x494D8DC: _PyEval_EvalFrameDefault (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4B8256B: _PyEval_Vector.constprop.0 (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4B82709: PyEval_EvalCode (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4BAD42F: run_mod (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4BCE6FC: PyRun_SimpleStringFlags (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092==    by 0x4BCE9B4: Py_RunMain (in /nix/store/w4fvvhkzb0ssv0fw5j34pw09f0qw84w8-python3-3.11.7/lib/libpython3.11.so.1.0)
==50092== 
==50092== 
==50092== HEAP SUMMARY:
==50092==     in use at exit: 620,813 bytes in 215 blocks
==50092==   total heap usage: 6,016 allocs, 5,801 frees, 10,140,991 bytes allocated
==50092== 
==50092== LEAK SUMMARY:
==50092==    definitely lost: 0 bytes in 0 blocks
==50092==    indirectly lost: 0 bytes in 0 blocks
==50092==      possibly lost: 0 bytes in 0 blocks
==50092==    still reachable: 620,813 bytes in 215 blocks
==50092==         suppressed: 0 bytes in 0 blocks
==50092== Rerun with --leak-check=full to see details of leaked memory
==50092== 
==50092== For lists of detected and suppressed errors, rerun with: -s
==50092== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
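The odd length in the test case appears to be chosen so that the string object fills its allocation exactly: Valgrind reports the read 4,081 bytes inside a block of size 4,096. The sketch below assumes CPython’s compact-ASCII string layout (PEP 393), where `sys.getsizeof(s)` is a fixed header plus `len(s)` bytes plus a trailing NUL; `ascii_len_for_alloc` is a hypothetical helper. On a 64-bit CPython 3.11, as in this trace, the header is 48 bytes, so the result is 4,047 characters: one `\n` plus 4,046 `x`. The header size differs between Python versions (the original crash was on a 3.12 build), so the sketch measures it rather than hard-coding it.

```python
import sys

ALLOC = 4096  # block size from the Valgrind report

def ascii_len_for_alloc(alloc: int = ALLOC) -> int:
    """Length of an ASCII str whose object occupies exactly `alloc` bytes.

    Assumes CPython's compact-ASCII layout, where sys.getsizeof(s) is a
    fixed header plus len(s) characters plus one trailing NUL byte.
    """
    header = sys.getsizeof("ab") - 3  # 2 characters + NUL
    return alloc - header - 1

n = ascii_len_for_alloc()
s = "\n" + "x" * (n - 1)  # same shape as the reproducer
print(n, sys.getsizeof(s))
```

Passing the resulting `s` to `orjson.dumps` in a script run under `valgrind python script.py` rebuilds the same allocation shape on whichever interpreter is installed; whether the final vector load actually runs off the block still depends on where the loop’s last load starts.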

I suspect 3.9.13 reduced the probability of the issue because 58a8bd3e31aa3b5fd3d962fb5b03479fa0014ee9 decreased the maximum overread from 31 bytes to 15 bytes, but it did not eliminate it. The Valgrind trace I posted above is from 3.9.13.
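A simplified model makes the 31-versus-15 arithmetic concrete. Suppose, hypothetically, that the scan issues full-width loads at `start`, `start + width`, … while the load position is still below the string length `n`; the final load then reads `(start - n) mod width` bytes past the last character, at most `width - 1`. Plugging in the reproducer (n = 4,047) and assuming the scan resumes at index 1 after escaping the leading `\n`, a 16-byte load reads 2 bytes past the last character: the NUL terminator, still inside the block, and one byte beyond it, which matches the one-byte overrun in the Valgrind report. With 32-byte loads the same input would read 18 bytes past the end. The loop shape and restart index are assumptions chosen because they fit the reported offsets, not a description of orjson’s code.

```python
def tail_overread(n: int, start: int, width: int) -> int:
    """Bytes read past index n by full-width loads at start, start+width, ...
    issued while the load position is below n (a hypothetical loop shape)."""
    if start >= n:
        return 0
    loads = -(-(n - start) // width)  # ceiling division
    last_load = start + (loads - 1) * width
    return last_load + width - n

n = 4047  # len("\n" + "x" * 4046)
for width in (32, 16):  # AVX2-sized vs 128-bit loads
    worst = max(tail_overread(m, 0, width) for m in range(1, 4 * width))
    print(width, tail_overread(n, 1, width), worst)
```

Under the same model, a scan starting at index 0 (no leading `\n`) reads only 1 byte past the last character, i.e. just the NUL terminator inside the allocation, which may be why the reproducer needs the newline.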