glaze: heap-buffer-overflow in serialize_string
I’m using ASAN and have stumbled on a heap-buffer-overflow when calling write_json.
Here is a minimal code example:
struct TestStruct {
std::string val1 {};
};
template<>
struct glz::meta<TestStruct> {
using T = TestStruct;
static constexpr std::string_view name = "TestStruct";
// clang-format off
static constexpr auto value = object(
"val1", &T::val1
);
// clang-format on
};
std::string jsonString = R"({"val1":"1234567890123456"})"; // 16 character string value
auto test_result = glz::read_json<TestStruct>(jsonString);
if (test_result) {
auto parsed = glz::write_json(test_result.value()); // ASAN exception occurs here
}
It succeeds for shorter strings, but once you hit a length of 16 the problems start. I haven’t had a chance to really dig into glaze’s code, and perhaps some of it is just runtime dependent, but I also saw it work fine with string lengths of 23, 30, 31 characters.
The exception occurs at the following line:
std::memcpy(&swar, in, Bytes);
For the moment I rolled back to an older working version. Let me know if you need more info or if I can be of any help in fixing this. I know you’ve done a lot of work in this library for performance, so I assume there are reasons for skipping certain safety checks (and inherent danger thereof) in the buffer manipulation that is happening in the serialize_string function.
About this issue
- Original URL
- State: closed
- Created 5 months ago
- Comments: 19 (11 by maintainers)
@emusgrave and @CramBL
It is looking like this is a bug in GCC’s address santizer. The more recent string serialization/parsing in Glaze utilizes SWAR (SIMD Within A Register). This allows us to look at and copy eight bytes at a time rather than just one.
To achieve this, we need to ensure that the string we are reading/copying has a capacity that is a multiple of 8. The code looks like:
We never actually use these bytes for parsing/serializing. Instead, we just make sure they are allocated so that we can work on 8 byte chunks. When this reservation does not exist, Clang appropriately gives an address sanitizer error. But, the Clang address sanitizer is satisfied when we appropriately make this allocation. The standard library requires that calling reserve allocates the new capacity or greater: cppreference string.reserve().
So, there is nothing illegal or dangerous with what Glaze is doing. The address sanitizer should be satisfied because we are not reading beyond the string’s allocated memory. However, GCC complains. It probably uses the size of the string to determine the boundary limits on the address sanitizer, rather than the capacity.
I’m going to continue doing more testing to make sure my findings are true.
Interestingly, I can’t get @emusgrave issue to fail the address sanitizer on compiler explorer (which I’m pretty sure is using linux based servers). Here’s a link to a simple example
Thanks for the additional info. This was just fixed with a recent string reading address sanitizer fix (#773). I’m going to be making address sanitizer checks a part of the unit testing and GitHub Actions. So, hopefully we can avoid these in the future.
@emusgrave and @CramBL, I’ve merged in #759, which should fix address sanitizers complaining about writing strings. Let me know if you run into any more address sanitizer problems. I’ll be releasing this soon in a version bump.
@CramBL, thanks for the additional information. I will prioritize fixing this. It’s very helpful to know that you’re seeing the issue on linux, as I should be able to locate it and debug it now.
Thanks for sharing this issue. The string writing was recently optimized, so I can understand that certain conditions may have been overlooked.
I intend Glaze to pass address sanitizers and not use any undefined behavior, so this will definitely be a priority to fix. However, I’m going to be out of pocket for a few weeks, so it will be probably a month before I can solve this.