simdjson: moving parser instances invalidates elements, use std::unique_ptr instead
Describe the bug
-
It appears the
simdjson::dom::parserandsimdjson::dom::elementdon’t like to be moved around or put into certainstdcontainers. They crash if put into any container but astd::list. Likely they don’t like to be copied/moved. I get aSIGSEGVwhen putting them into astd::vector, for example. -
It also appears that if I keep(Update: Ok, so this is an API limitation – fair enough – but then one should allowsimdjson::dom:elementaround for longer than theparserthat last manipulated it, serializing viasimdjson::to_stringalso fails later with aSIGSEGV.dom::parsers to be copied and put into vectors if they must be married to theirdom::element!!).
To Reproduce
See this sample project that illustrates the problem(s):
https://github.com/cculianu/simdjson-bugs
In particular this code here: https://github.com/cculianu/simdjson-bugs/blob/09e0d3ded2bf51296a9c29d583ba7ba1aac4ae04/main.cpp#L59
std::vector<std::tuple<std::string, simdjson::dom::parser, simdjson::dom::element>> results;
for (const auto & f : files) {
std::string buf = slurpFile(f); // reads file, returns entire file contents as a string
std::cout << "Read \"" << f << "\" - size: " << buf.size() << "\n";
simdjson::dom::parser parser;
simdjson::dom::element parsed;
auto error = parser.parse(buf).get(parsed);
if (error) throw std::runtime_error(std::string{"Failed to parse "} + std::string{f});
results.emplace_back(std::string{f}, std::move(parser), std::move(parsed));
}
for (const auto & [name, parser, parsed] : results) {
std::cout << "File: \"" << name << "\"; serializing...\n";
/// ---- CRASH ON BELOW LINE ----
std::cout << "First 100 bytes, reserialized: " << simdjson::to_string(parsed).substr(0, 100) << " ...\n";
std::cout << std::string(80, '-') << "\n";
}
}
The crash is in the second loop – the one that calls to_string. The crash happens when it minifies the dom to a string. However, if we use std::list instead of std::vector for the container above (which involves no moving-around of elements on emplace_back), it works perfectly.
Here is another case where it crashes: UPDATE: It appears the below is expected behavior. 😦
std::list<std::pair<std::string, simdjson::dom::element>> results;
for (const auto & f : files) {
std::string buf = slurpFile(f); // reads file, returns entire file contents as a string
std::cout << "Read \"" << f << "\" - size: " << buf.size() << "\n";
results.emplace_back();
auto & [name, parsed] = results.back();
name = f; // assign
simdjson::dom::parser parser; // <-- it appears simdjson doesn't like this object to die before the element?!
auto error = parser.parse(buf).get(parsed);
if (error) throw std::runtime_error(std::string{"Failed to parse "} + std::string{f});
}
// at this point the parser used for each result is dead.
for (const auto & [name, parsed] : results) {
std::cout << "File: \"" << name << "\"; serializing...\n";
/// ---- CRASH ON BELOW LINE ----
std::cout << "First 100 bytes, reserialized: " << simdjson::to_string(parsed).substr(0, 100) << " ...\n";
std::cout << std::string(80, '-') << "\n";
}
Configuration (please complete the following information if relevant):
- OS: Debian 10 and also macOS Mojave
- Compiler Apple clang version 11.0.0 and gcc 8.3.0
- Version 0.6.0
- Processor Intel Core i9
- Using the single file header implementation (I didn’t even try the compiled non-single-file header)
Suggestion
- If these operations are not permitted (such as moving or copying) – then you should likely make the unsafe operations forbidden by the C++ type system.
- If there are lifecycle constraints e.g. the
parsermust outliveelement– then this needs to be enforced by the object model.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 32 (25 by maintainers)
We really do want to remove the “sharp edges” from the Api as well. People shouldn’t have to choose between fast and easy. Otherwise most people reach for easy, servers are slow and suddenly the polar ice caps are melting.
So your experience is instructive, even if you have moved past it. Thanks!
@cculianu I misread your issue (it seems @jkeiser did too).
I should have asked more questions. Sorry.