onnxruntime: `static inline Ort::Env onnx_env{nullptr}` easily leads to nullptr deref on app exit

Creating a null Ort::Env without later moving in an initialized Env + defining ORT_API_MANUAL_INIT can easily crash on app exit due to not checking for the null Env or null Ort::Global<void>::api_

Setup

  • Microsoft Windows [Version 10.0.19044.1889]
  • MSVS 2019 community v16.11.18
  • Onnx v1.12.1

Repro

Pseudocode

class MyOnnx {
  static inline Ort::Env onnx_env{nullptr}; 
  ...
};

main() {
  // do nothing
}
  1. build app with pseudocode above and defining ORT_API_MANUAL_INIT
  2. run app

Result

App crashes on exit due to nullptr derefence. Debugger will report access violation and Ort::Global<void>::api_ was nullptr

Expected

No crash. Silent exit.

Notes

I see the cause in code. struct Env : Base<OrtEnv> and that ~Base() { OrtRelease(p_); } blindly calls OrtRelease() without checking that a release isn’t necessary since _p = nullptr. OrtRelease for that class is ORT_DEFINE_RELEASE(Env); which expands with OrtRelease(Ort##NAME* ptr) { GetApi().Release##NAME(ptr); } which calls GetApi() and blindly uses its pointer without checking for nullptr which leads to the null dereference crash.

I would prefer Onnx to check for nullptr before using a pointer. It could be done in the Base class or in the macro. I slightly prefer it in Base so it doesn’t do unnecessary work of calling getapi(), etc. Putting it in the macro would also solve the crash but still runs unnecessary code.

Related https://github.com/microsoft/onnxruntime/issues/2297

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Comments: 19 (16 by maintainers)

Commits related to this issue

Most upvoted comments

I’m facing the same problem. When define Ort::Env as a global variable before main, the crashing happens resulted from NULLPTR with address 0x0000000000. This is weird. At least, this issue should be pointed out in the document instead of obscurely wrote in the example code. Quite annoying.

delete nullptr; is always valid. I haven’t seen a case that it would crash.

False. And to repeat again as I wrote, it is undefined by C++ spec and the implementer of the deleting function is responsible for handling. Please google this topic for detailed C++ spec and discussions on this.

STL and default deleters handle nullptr being passed to the deleters. But custom ones like I discussed DO NOT.

onnx c++/c headers need to check for nullptr. Use if (theptr) release(theptr) instead of today’s blind release(theptr). I discuss this in my OP.

So, even you know GetApi() is null, what you can do except crashing the process? Don’t crash. Test and release only if it is non-null. This is SOP for even the most core functions like free()

As we know, the env is singular…only one. I forget but when I isolated this bug, I think deep down the env is a singleton or something similar. When classes/struct are defined in the app, having a single env as a static member of a class, singleton, etc. is a common approach. Naturally, you know about the static initialization fiasco…and its partner the static deinitialization fiasco. The orders of init and deinit are not guaranteed across translation units and in general you CAN NOT base rules no things like main().

Who writes code with main()? Today we write DLLs and event loops and more complex applications. Making a rule based on main() is doomed to fail in anything but the most basic applications. 😁 and in the very link you provide me is a //TODO: Fix the C API issue and //If we don't do this, it will crash which identifies that…yes…the API is currently broken. The solutions I provide above can address many of them.

Test for non-null…then release. Don’t do it blindly.