gdext: Implicit binds in virtual calls cause borrow errors
Edit bromeon – title was “Cannot add_child in a &mut self function if Self overrides on_notification”
#[derive(GodotClass)]
#[class(init, base = Node)]
pub struct Bar {
#[base]
base: Base<Node>,
}
#[godot_api]
impl NodeVirtual for Bar {
fn ready(&mut self) {
if Engine::singleton().is_editor_hint() {
return;
}
let foo = Node::new_alloc();
self.add_child(foo.share().upcast());
}
fn on_notification(&mut self, what: NodeNotification) {}
}
add_child will trigger a notification (https://docs.godotengine.org/en/stable/classes/class_node.html#class-node-constant-notification-child-order-changed), and that will in turn trigger a call to on_notification, which will try to get a mutable borrow of self. This causes a crash.
It seems like this is a big issue tbh because it basically makes self.add_child almost unusable whenever on_notification is overridden.
I’m not sure what a good solution to this is, but it’s related to the common issue of calling a signal that will trigger on self.
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 4
- Comments: 18 (18 by maintainers)
Allowing
Gd<Self>#359 as the receiver could be used to do a workaround for #383, though it’d be kinda awkward code imo:Also one important thing to note is that deferring is not really a workaround here: It explicitly changes behavior by introducing delays. For some applications that is unacceptable
I have an idea for how to mostly fix these issues.
Background
The basic issue is that we have an indirect call of a rust method through ffi, but this shouldn’t theoretically be an issue as you are allowed to do this in rust.
For instance, let’s say you have a signal
foo, which when emitted callsbar. Then this code:Is the same as doing
which is fine, but the original code would fail.
The reason this is fine is because rust knows that calling
bar()requires a mutable reference to self, i.e there must be no aliasing other references around.So if we can somehow create an API that allows a user to call methods which may potentially call back into self (such as
emitoradd_child) only when we could get a&mut selfreference. Then it would be safe to create a new&mut selfreference when we’re called from Godot later.Basic Idea
The idea has two major components (all names used are up for bikeshedding):
bind_self(), only when the object hasn’t been bound by eitherbind()orbind_mut(), it also blocks bothbind()andbind_mut(). [^1]bind_mut()if one may exist, andbind_self()when we know one cant.Current Idea For Implementation
My current idea for implementing this is:
We add a counter to
InstanceStoragethat i’ll callpossibly_aliasing: usize[^2] which tracks the number of possibly aliasing references that exist. This counter is set to 0 by default.possibly_aliasingwill be incremented by 1 whenever we enter a method from Godot, or wheneverbind()orbind_mut()are called on aGd<T>. As we can then get arbitrary aliasing references.possibly_aliasingis decremented by 1 whenever one of the bind-guards are dropped.Additionally we create a drop-guard which
Basecan return. This drop-guard decrementspossibly_aliasingby 1, and then increments it by 1 when it’s dropped. This drop-guard requires a&mut selfto be created, ensuring that we do not have any other aliasing references around when it’s called.[^3] This drop-guard dereferences to the engine-type similar to aGd<T>.When a rust-method is called from Godot (that takes
&mut self), we checkpossibly_aliasing. If it is 0, then we usebind_self(), otherwise we usebind_mut().So as some code-examples we’d take this:
And we generate this (very abbreviated) glue-code:
One thing to consider is adding an immutable/mutable pair of these, as it’d be safe to call a method that takes a
&selffrom one that takes a&mut self. However the other way is not safe. This could be done with a second counter that just tracks how many immutable self-binds we’ve taken. And if it’s greater than 0 then a mutable self bind will fail, but this should also allowbind()to succeed.old contents
i wonder if we can add another kind of unsafe bind, like `bind_self`. which returns a `GdSelf` that derefs to `&mut self`. where the safety invariant is that it is only used to call a method on it.it fails if there exists a
bindorbind_mut(and also blocks both of those calls), but not if there already exists abind_self. that way we can allow for reentrancy without borrow failures. and i dont think it’d be possible to get two aliasing mut pointers, since it cant exist alongside the other binds and i dont think you can ever observe twoGdSelfat the same time.this needs one modification, accessing
basemust require a&mut selfreference. we can do this by changing how base is accessed to a method on self. so we’d haveself.base().add_child()for instance. that way rust can understand that any usage of base could potentially require a mutable reference to self. this could be done by making dereferencing aBasean unsafe method and auto-generating the accessor method.This should avoid the unsoundness with allowing reentrancy @Bromeon identified above.
This is additionally also semantically more correct imo, since a
Baseis a pointer to self, so it makes sense that calling a method on it should require an appropriate reference to self.[^1]: I am unsure how user-facing
bind_self()should be. We may choose to only implement it onInstanceStoragethus making it entirely internal and hidden from the user.bind_self()would likely need to be unsafe also as it can return multiple&mut Treferences to the same object. [^2]: This will actually be aRefCell<usize>orAtomicUsizeas relevant. [^3]: We might be able to make this a safe function, if we check that the&mut selfpoints to the same rust object that theBaserefers to.upcast()is no longer needed; since #370 you can call Godot functions on theGd<T>directly.This is an interesting challenge, and probably far more general than the specific example in the issue title. Working with notifications or signals in general has the potential to “backfire”, leading to a panic.
Adding another data point: I encountered this issue while implementing a custom item list, which basically started as a port of item_list.cpp of Godot. The pattern used in that implementation doesn’t work out-of-the box in Rust, because it involves backfiring signals, even in a relatively subtle way:
"value_changed"signal gets connected to a callback that simply doesqueue_redraw. That makes sense, in general the item list wants to react to scroll events.NOTIFICATION_DRAW.set_maxandset_pagefunctions. What is not immediately obvious: Under certain conditions these setter actually internally delegate to theset_valuesetter, which in turn triggers the signal. In Rust this results in a panic, because the callback cannot be called while the drawing is happening. Interestingly there is aset_value_no_signalsetter, that allows to set the scrollbar value without generating a signal. However, there are noset_max_no_signalorset_page_no_signalvariations, which makes it tricky to realize that the code can backfire a signal. In fact it took me a while to encounter the bug in practice, because it requires the special condition of the scroll bar being scrolled all the way to the bottom, followed by resizing the window in a way that the height increases.queue_redrawis pointless in this case when the scrollbar changes its value triggered by the drawing logic itself, because the drawing is already happening. Triggering yet another redraw is actually a waste of resources. So while the bug was not so easy to find, it has actually revealed a code smell of the C++ version, and the resolution doesn’t even feel so bad.Some thoughts about more general solutions:
The idea in #359 looks quite helpful. For instance, instead of disconnecting and reconnecting the signal, the drawing logic could have set
self.is_within_redraw = true(and at the end back tofalse)But the manual maintenance of
self.is_within_redrawfeels a bit clumsy. This makes me wonder if it would be possible to offer a kind oftry_bind_mut()which basically knows whether we are already bound or not? This would allow to write:The interesting question is what to do in the
elsebranch in the general case. One possibility would be to use interior mutability to encode “what I would have liked to do, but couldn’t”. At the end of the other methods we could check whether such actions exist and execute them there. I’m not sure if it is possible/sensible to generalize this, but a crazy idea was to have a kind ofDeferredQueuethat uses interior mutability like this:Not sure if this makes sense, but I feel that offering a simple way to “defer” signals/notifications could help a lot to mitigate this issue, because it breaks the cyclic nature of the problem. In many use cases the slight delay may be a reasonable price to pay for being guaranteed not to run into “signal cycle panics”.