simdjson: moving parser instances invalidates elements, use std::unique_ptr instead

Describe the bug

  • It appears the simdjson::dom::parser and simdjson::dom::element don’t like to be moved around or put into certain std containers. They crash if put into any container but a std::list. Likely they don’t like to be copied/moved. I get a SIGSEGV when putting them into a std::vector, for example.

  • It also appears that if I keep simdjson::dom:element around for longer than the parser that last manipulated it, serializing via simdjson::to_string also fails later with a SIGSEGV. (Update: Ok, so this is an API limitation – fair enough – but then one should allow dom::parsers to be copied and put into vectors if they must be married to their dom::element!!).

To Reproduce

See this sample project that illustrates the problem(s):

https://github.com/cculianu/simdjson-bugs

In particular this code here: https://github.com/cculianu/simdjson-bugs/blob/09e0d3ded2bf51296a9c29d583ba7ba1aac4ae04/main.cpp#L59

    std::vector<std::tuple<std::string, simdjson::dom::parser, simdjson::dom::element>> results;
    for (const auto & f : files) {
        std::string buf = slurpFile(f); // reads file, returns entire file contents as a string
        std::cout << "Read \"" << f << "\" - size: " << buf.size() << "\n";
        simdjson::dom::parser parser;
        simdjson::dom::element parsed;
        auto error = parser.parse(buf).get(parsed);
        if (error) throw std::runtime_error(std::string{"Failed to parse "} + std::string{f});
        results.emplace_back(std::string{f}, std::move(parser), std::move(parsed));
    }
    for (const auto & [name, parser, parsed] : results) {
        std::cout << "File: \"" << name << "\"; serializing...\n";
/// ---- CRASH ON BELOW LINE ----
        std::cout << "First 100 bytes, reserialized: " << simdjson::to_string(parsed).substr(0, 100) << " ...\n";
        std::cout << std::string(80, '-') << "\n";
    }
}

The crash is in the second loop – the one that calls to_string. The crash happens when it minifies the dom to a string. However, if we use std::list instead of std::vector for the container above (which involves no moving-around of elements on emplace_back), it works perfectly.

Here is another case where it crashes: UPDATE: It appears the below is expected behavior. 😦

    std::list<std::pair<std::string, simdjson::dom::element>> results;
    for (const auto & f : files) {
        std::string buf = slurpFile(f); // reads file, returns entire file contents as a string
        std::cout << "Read \"" << f << "\" - size: " << buf.size() << "\n";
        results.emplace_back();
        auto & [name, parsed] = results.back();
        name = f; // assign
        simdjson::dom::parser parser; // <--  it appears simdjson doesn't like this object to die before the element?!
        auto error = parser.parse(buf).get(parsed);
        if (error) throw std::runtime_error(std::string{"Failed to parse "} + std::string{f});
    }
    // at this point the parser used for each result is dead.
    for (const auto & [name, parsed] : results) {
        std::cout << "File: \"" << name << "\"; serializing...\n";
/// ---- CRASH ON BELOW LINE ----
        std::cout << "First 100 bytes, reserialized: " << simdjson::to_string(parsed).substr(0, 100) << " ...\n";
        std::cout << std::string(80, '-') << "\n";
    }

Configuration (please complete the following information if relevant):

  • OS: Debian 10 and also macOS Mojave
  • Compiler Apple clang version 11.0.0 and gcc 8.3.0
  • Version 0.6.0
  • Processor Intel Core i9
  • Using the single file header implementation (I didn’t even try the compiled non-single-file header)

Suggestion

  • If these operations are not permitted (such as moving or copying) – then you should likely make the unsafe operations forbidden by the C++ type system.
  • If there are lifecycle constraints e.g. the parser must outlive element – then this needs to be enforced by the object model.

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 32 (25 by maintainers)

Most upvoted comments

We really do want to remove the “sharp edges” from the Api as well. People shouldn’t have to choose between fast and easy. Otherwise most people reach for easy, servers are slow and suddenly the polar ice caps are melting.

So your experience is instructive, even if you have moved past it. Thanks!

@cculianu I misread your issue (it seems @jkeiser did too).

I should have asked more questions. Sorry.