ndk: Runtime problems when including multiple versions of ndk-glue

Having multiple versions of ndk-glue in the dependency tree leads to crashes because of uninitialized NATIVE_ACTIVITY.

Example: Including ndk-glue 0.5. When entering main(), NATIVE_ACTIVITY 0.5 gets initialized. Including Oboe crate that uses ndk-glue 0.4.0. When enumerating audio devices, i get:

thread 'tokio-runtime-worker' panicked at 'called `Option::unwrap()` on a `None` value', /Users/ric/.cargo/registry/src/github.com-1ecc6299db9ec823/ndk-glue-0.4.0/src/lib.rs:59:39

I tried using [patch.crates-io] but that does not seem to work.

This bug might be related to using resolver = "2" as required by wgpu.

About this issue

Original URL
State: closed
Created 3 years ago
Comments: 28 (26 by maintainers)

Commits related to this issue

ndk-glue: Initialize ndk-context on 0.5 stable releases In order to show that [#223] solves our problem ([#211]) with multiple ndk-glue versions in tree, all having their own `static` globals, backpo... — committed to rust-mobile/ndk by MarijnS95 2 years ago
ndk-glue: Initialize ndk-context on 0.5 stable releases (#225) In order to show that [#223] solves our problem ([#211]) with multiple ndk-glue versions in tree, all having their own `static` globals... — committed to rust-mobile/ndk by MarijnS95 2 years ago
android: Migrate from `ndk-glue` to `ndk-context` `ndk-glue` suffers one fatal flaw: it's "only" supposed to be used by the crate providing `fn main()` and only supposed to end up in the dependency g... — committed to MarijnS95/app_dirs2 by MarijnS95 2 years ago
ndk-glue: Initialize ndk-context on 0.4 stable releases In order to show that [#223] solves our problem ([#211]) with multiple ndk-glue versions in tree, all having their own `static` globals, backpo... — committed to rust-mobile/ndk by MarijnS95 2 years ago
android: Migrate from `ndk-glue` to `ndk-context` `ndk-glue` suffers one fatal flaw: it's "only" supposed to be used by the crate providing `fn main()` and only supposed to end up in the dependency g... — committed to MarijnS95/cpal by MarijnS95 2 years ago
android: Migrate from `ndk-glue` to `ndk-context` `ndk-glue` suffers one fatal flaw: it's "only" supposed to be used by the crate providing `fn main()` and only supposed to end up in the dependency g... — committed to MarijnS95/cpal by MarijnS95 2 years ago
android: Migrate from `ndk-glue` to `ndk-context` `ndk-glue` suffers one fatal flaw: it's "only" supposed to be used by the crate providing `fn main()` and only supposed to end up in the dependency g... — committed to MarijnS95/cpal by MarijnS95 2 years ago
ndk-glue: Initialize ndk-context on 0.4 stable releases (#226) In order to show that [#223] solves our problem ([#211]) with multiple ndk-glue versions in tree, all having their own `static` globals... — committed to rust-mobile/ndk by MarijnS95 2 years ago

Most upvoted comments

Opened a PR #223

dvc94ch on Feb 7, 2022

Between @MarijnS95 and @dvc94ch ideas I prefer the latter. While @MarijnS95 idea is cleaner, @dvc94ch idea is the more practical IMO. Just one nit, I think “ndk-context” is clearer than “ndk-ctx”. The ndk-context pointers can be RwLock<Option<NonNull<c_void>>>, and the ndk crate can use some constructors like from_context() that return RwLock<Type>. But the raw pointers could also also be used unsafely directly from ndk-context (this would be the case for openxrs). Do you need me to make a PR?

zarik5 on Jan 25, 2022

I’m currently leaning towards the separate ptr crate approach as it seems to minimize the risk of breaking changes compared to just using a version range for ndk.

Regarding the ANativeActivity::instance I’m not totally sure if I understand it in detail. In particular who is owning the memory storing the pointers? The ANativeActivity::instance is then only for internal usage?

msiglreith on Jan 17, 2022

but your last release made it already deprecated. openxrs also breaks if I update ndk-glue to 0.6 in my project. This is not sustainable.

Indeed, and increased use of ndk-glue lately shows.

The ndk crate dependency should be removed from ndk-glue

Not easily possible since ndk-glue has to know of and call/use some of the functions/types from the ndk. We have better solutions.

and maybe move some functionality inside the ndk crate.

Nah, I think the ndk crate should purely hold the (safe) abstractions to the NDK. ndk-glue is “just” a helper (like Android’s “official” glue layer) that users can leverage to simplify their Android setup. Either way it’ll suffer from breaking changes to the ndk there too… Pick your poison.

Maybe we need a ndk-ptrs crate for storing the static pointers. The crate should have no dependencies (apart for lazy_static), and the pointers can be untyped (c_void).

I’ve been thinking about the exact same, we can even apply this retroactively to previous crate-releases as an additional patch. It’ll just need a dependency on one of the crates that provides a Mutex/RWLock with a mapable inner value on the *LockGuard. That way we can convert from the pointer to the typed handle, and give the user what they asked for while held inside the *LockGuard.

@msiglreith @francesca64 @dvc94ch What are your thoughts on this? I/we can start working on a little proof of concept for this, having to make sure that we don’t need to release any breaking changes to it for a good long while.

Alternatively we could do away with statics entirely and work with a heap allocation to which a pointer is stored in ANativeActivity::instance. Users would then have to pass this object around to get access to the NativeActivity and NativeWindow, and code would simply not compile if differing versions of ndk/ndk-glue are combined. Safer and more predictable, but the same dependency management hell for all the library crates involved.

MarijnS95 on Jan 17, 2022

think we can close now

dvc94ch on Dec 6, 2022

I think we should close this issue as soon as most popular crates migrated to ndk-context:

https://crates.io/crates/ndk-glue/reverse_dependencies

https://github.com/app-dirs-rs/app_dirs2/pull/19
https://github.com/Ralith/openxrs/pull/108
https://github.com/RustAudio/cpal/pull/641
https://github.com/katyo/oboe-rs/pull/42
webbrowser
winit (needs more than just the VM/Context, https://github.com/MarijnS95/android-ndk-rs/compare/anativeactivity-instance?)

@zarik5 When making these PRs at least link back to the original issue and PR 😃

MarijnS95 on Feb 16, 2022

ndk-glue 0.6.1 and ndk-context 0.1.0 are released.

dvc94ch on Feb 14, 2022

Hence going back to my comment and your suggestion, also taking Kotlin into account: those crates should consume the necessary objects through an init function or constructor. We could pass it the needed JNI or NDK types, and get rid of static state and public getters altogether.

ndk-ctx is likely still useful. The reason being that if a cross platform crate wants to add android support they’ll likely not be very happy with having to pass around the android context, as evidence by the use of ndk-glue. A simple solution is likely going to have to be provided that doesn’t force users to have to add new_with_extra_android_context to all their apis. This recursively infects everything, so the results are not pretty and could lead to substandard support for android through the eco system. I think my proposal is likely the best way forward to provide the convenience required to see a larger number of crates add some android logic.

dvc94ch on Jan 19, 2022

@msiglreith

I’m currently leaning towards the separate ptr crate approach as it seems to minimize the risk of breaking changes compared to just using a version range for ndk.

We can do both, the dependency range should result in less minor (breaking) updates for at least ndk-glue because we’re not forcing the update of another breaking-change crate dependency anymore. But then again we’d have to validate how cargo selects the dependency and also make sure at least min and max versions are tested in the CI? If we do this for multiple crates possible combinations may explode exponentially though.

Regarding the ANativeActivity::instance I’m not totally sure if I understand it in detail. In particular who is owning the memory storing the pointers? The ANativeActivity::instance is then only for internal usage?

It seems to be the typical userdata pointer that is untouched by the NDK, you’re free to assign any data you want but of course have to drop it when the ANativeActivity is cleaned up, presumably in onDestroy.

@dvc94ch Thanks for those insights! It indeed seems that many crates are erroneously using ndk-glue to get hold of the ANativeActivity for JNI purposes (or in one of my test-crates: the internal_data_path()).

Hence going back to my comment and your suggestion, also taking Kotlin into account: those crates should consume the necessary objects through an init function or constructor. We could pass it the needed JNI or NDK types, and get rid of static state and public getters altogether.

On the ndk-glue side we will then pass an ANativeActivity into fn main() from fn init(), and internally hold a pointer to our own state in instance so that there’s no static globals anymore. We’ll still have to wrap things in locks to be able to access them concurrently from ANativeActivityCallbacks.

If there’s an NDK version mismatch things will simply refuse to compile (instead of the hard-to-debug failures we’re observing right now), and we can use the existing ptr/unsafe from_ptr helpers to turn one into the other (leveraging the more stable ndk-sys crate types, and if all else a raw pointer can easily be cast too).

MarijnS95 on Jan 19, 2022

so how about a crate that looks like this:

pub struct AndroidContext {
    jni_env: std::ffi::c_void,
   context: std::ffi::c_void,
}
pub fn android_context() -> AndroidContext;

#[doc(hidden)]
pub unsafe intialize_android_context(ctx: AndroidContext);

ndk-glue can initialize it and all library crates can get the stuff they need from the ndk-context crate? If you spawn a new thread you can use the get_java_vm method and then attach to the new thread. this would mean that downstream crates don’t depend on ndk-glue (which they shouldn’t).

dvc94ch on Jan 17, 2022

@MarijnS95 In the end, is the best approach is to set ndk = "*" and ndk-glue = "*" in libraries that use android-ndk? It’s such a pain to bump every library to use the newest version of ndk-glue.

@enfipy As @repi mentioned such crates cannot be published… But it should be possible to publish them with a range of compatible crate revisions. Perhaps ndk-glue could set something (imaginary, I didn’t compatibility-test this) like ndk = ">= 0.3, <= 0.6". Then we only need to bump these (usually only the upper bound) in a patch release. (We can probably do this regardless of introducing an ndk-ptrs/statics crate mentioned above?)

Note that an upper bound is mandatory in my eyes to prevent a situation like https://github.com/gwihlidal/vk-mem-rs/issues/54. It’s much safer for us to validate a setup and push a patch release enlarging the available range of versions than to retroactively publish a new patch release with the upper bound fixed to the last working/compatible version, and yanking everything without that upper bound from crates.io.

The only thing I am not sure about is crate selection. I assume, but have not tested this, that if a crate were to select ndk = "0.5" any crate with ndk-glue with the imaginary ndk = ">= 0.3, <= 0.6" dependency from above would get just ndk 0.5.0? Though perhaps if cargo were to try and select the highest or the lowest (-Zminimal-versions) you could end up with duplicates on different versions after all. Then only Embark’s cargo-deny could save you (or the aforementioned suggestion to remove statics from the crate altogether).

MarijnS95 on Jan 17, 2022

In the end, is the best approach is to set ndk = “" and ndk-glue = "” in libraries that use android-ndk? It’s such a pain to bump every library to use the newest version of ndk-glue.

Don’t think you can publish a crate with wildcard version? Believe it only works for local crates

repi on Jan 14, 2022