wasm-c-api: C++ opaque handle implementation is based on undefined behaviour

basic.lval/11:

If a program attempts to access the stored value of an object through a glvalue whose type is not similar ([conv.qual]) to one of the following types the behavior is undefined:52

  • the dynamic type of the object,
  • a type that is the signed or unsigned type corresponding to the dynamic type of the object, or
  • a char, unsigned char, or std​::​byte type.

The implementation of opaque handles dynamically allocates an object of one type and casts it to an unrelated type. Accessing the object through a pointer to one of the API types (e.g. in order to call copy or kind) is then undefined behaviour.

Is there documentation somewhere which outlines the use of this strategy as opposed to inheritence- or composition-based ones which don’t break this rule? I realise that it may work on compilers which this has been tested with, but it seems somewhat risky to mandate UB through the interface, as implementations may change, or this may trigger sanitizer fails.

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 8
  • Comments: 17 (11 by maintainers)

Commits related to this issue

Most upvoted comments

Hi, I am UBsan, and I’m currently complaining about this.

There’s a straight-forward fix:

  • make the constructors defaulted and protected
  • make the destructor protected
  • remove the overloaded operator delete
  • add a private destroy() method
  • change own<T> from unique_ptr<T> to unique_ptr<T, Deleter> where Deleter calls destroy()
  • make the classes friends with their Deleters so that unique_ptr can delete them but Foo::make()->destroy() doesn’t compile (if it did, it would lead to a double-free).

Then implementers can:

  • create their own implementation classes that public-derive from these classes and add their own additional member variables and methods
  • define the declared make() method to always create objects of the matching derived type.
  • define the other methods by static_cast’ing the (this) pointer to the derived type which is always valid since the only way to create one is with the make() method that always creates the derived type anyways
  • like the other methods destroy() casts to derived type, then calls “delete casted_this;” or calls a derived destroy() which does “delete this;”

Example: wasm.hh has:

template <typename T>
struct Destroyer {
  void operator()(T *ptr) {
    ptr->destroy();
  }
};

template<class T> using own = std::unique_ptr<T, Destroyer<T>>;

template<class T>
auto make_own(T* x) -> own<T> { return own<T>(x); }

class WASM_API_EXTERN Config {
  friend struct Destroyer<Config>;
  void destroy();

protected:
  Config() = default;
  ~Config();

public:
  static auto make() -> own<Config>;

  // Implementations may provide custom methods for manipulating Configs.
};

and impl-wasm.cc has:

class ImplConfig : public Config {
public:
  ImplConfig() = default;
  ~ImplConfig() {}

  static ImplConfig *from(Config *config) {
    return static_cast<ImplConfig *>(config);
  }

private:
  int impl_state;
};

Config::~Config() {}

auto Config::make() -> own<Config> { return make_own<Config>(new ImplConfig); }

void Config::destroy() { delete ImplConfig::from(this); }

A few comments on this approach:

  1. A unique_ptr<Derived, Deleter<Derived>> doesn’t freely cast to a unique_ptr<Base, Deleter<Base>>. I think the easiest fix it to pass the explicit T to make_own when doing a make + downcast combined as shown in my definition of make() above.

  2. With this approach the potential bug of deleting the object of base type is prevented:

void test(Config *config) {
  delete config;
}

dt.cc:52:10: error: calling a protected destructor of class 'Config'
  delete config;
         ^
dt.cc:20:3: note: declared protected here
  ~Config();
  ^

which is necessary if we expect the implementer to extend the size of the object when deriving.

  1. Manual memory management is still possible but obvious in code review.
void ok1() {
  auto x = Config::make();
}

void ok2() {
  auto x = Config::make().release();
  Destroyer<Config>()(x);
}

void error1() {
  Config::make()->destroy();
}

dt.cc:63:19: error: 'destroy' is a private member of 'Config'
  Config::make()->destroy();
                  ^
dt.cc:19:8: note: implicitly declared private here
  void destroy();
       ^

The other main alternative is to use virtual. I assume we’re avoiding that?

I’ve mostly implemented it in the draft PR #161. There’s a pre-existing bug with Shared<> that I’m not sure what approach to take to address and am looking for input. wasm-v8.cc defines an explicit template specialization for Shared<Module>::~Shared(), which is a problem because no specialization was declared and the destructor (or now destroy() in my approach) may be implicitly instantiated. I know this aspect of C++ templates is not well understood so I’ll give a primer:

  1. implicit instantiation. The template is defined generally over some template parameters and you use that pattern. This is what happens when you use std::vector<char> for instance. For implicit instantiations, multiple definitions is not a linker error and it is optional whether the compiler chooses to emit the the symbols to the .o file (for example, it may depend on the optimization level whether a function template is inlined into all callers so now there is no need for a symbol – this is what leads to most cases of mysterious C++ template linking errors).
  2. explicit template instantiation definition. Emits a definition to the .o file for all the symbols in this template. Two definitions may or may not be a linker error, but the definition must be present. For example (template Shared<int>;).
  3. explicit template instantiation declaration. Added in C++11. Informs the compiler that the template is expected to be provided by another .o file, in conjunction with clang’s -Wundefined-var-template and -Wundefined-func-template to prevent those mysterious linking errors. (extern template Shared<int>;).
  4. explicit template specialization definition. Replace the body of a template pattern (a whole class or a function or variable) with a different one for a particular set of template arguments. (template<> struct Shared<int> { /* ... */ };)
  5. explicit template specialization declaration. Declare that there exists an explicit specialization for this particular set of template arguments. (template<> struct Shared<int>;)

(not discussed: partial templates or template templates)

It’s incorrect to call an explicit specialization when expecting an instantiation per https://eel.is/c++draft/temp.expl.spec#7.sentence-1 . So this combination: wasm.hh:

class WASM_API_EXTERN Shared {
public:
  Shared() = delete;
  ~Shared();
  void operator delete(void*);
};

wasm-v8.cc:

template<>
Shared<Module>::~Shared() {
  stats.free(Stats::MODULE, this, Stats::SHARED);
  impl(this)->~vec();
}

is a non-starter. (If it helps, you can imagine that an explicit specialization has a different type signature, so it could be mangled differently, though both popular ABIs chose to mangle them the same which is why it still links.) Any code that might trigger implicit instantiation of the declaration of d’tor/destroy() must know that this is defined with an explicit specialization.

How would we do that? We could forbid C++ implementations from using an explicit specialization. We could tell users that it’s normal to have to include a second vendor-specific header file even though the entire interface defined in wasm.hh is portable at the source level. We could declare all the explicit specializations in wasm.hh and require all implementations to define for each specialization? We could replace Shared<T> with a non-template?