rust-autograd: segfault when calling grad() in a loop
On my x86 Linux environment with Rust 1.47.0 and autograd 1.0.2, the code below segfaults.
Any idea what’s causing this crash?
Given that this segfault happens in a very simple loop, and that there is a decent amount of code inside unsafe{} blocks, would it be possible/feasible to use conditional compilation to swap in safe alternatives to the unsafe{} blocks that exist purely for performance, along with safe versions of the data structures (e.g. UnsafeCell; I'm not sure how trustworthy SmallVec is)?
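As a rough sketch of that idea: the feature name `unchecked` and the functions `element` and `sum_at` below are hypothetical and are not taken from autograd. The safe, bounds-checked path is the default, and the unsafe fast path is only compiled when the feature is explicitly enabled.

```rust
// Hypothetical sketch: the `unchecked` feature would be declared under
// [features] in Cargo.toml. None of this is autograd's actual code.

/// Fast path: skips the bounds check. The caller must guarantee `i < xs.len()`.
#[cfg(feature = "unchecked")]
fn element(xs: &[f64], i: usize) -> f64 {
    debug_assert!(i < xs.len());
    unsafe { *xs.get_unchecked(i) }
}

/// Default path: bounds-checked indexing, which panics on a bad index
/// instead of reading invalid memory.
#[cfg(not(feature = "unchecked"))]
fn element(xs: &[f64], i: usize) -> f64 {
    xs[i]
}

/// Sums the elements of `xs` at the given indices via whichever path was compiled in.
fn sum_at(xs: &[f64], indices: &[usize]) -> f64 {
    indices.iter().map(|&i| element(xs, i)).sum()
}

fn main() {
    let xs = [1.0, 2.0, 3.0];
    println!("{}", sum_at(&xs, &[0, 2]));
}
```

With a layout like this, a bug that would otherwise corrupt memory shows up as an ordinary panic with a backtrace in the default build, which is easier to report and debug than a SIGSEGV.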
test code:
==> Cargo.toml <==
[package]
name = "autograd_test"
version = "0.1.0"
edition = "2018"
[dependencies]
autograd = { version = "1.0.2" }
==> src/main.rs <==
extern crate autograd as ag;

fn main() {
    ag::with(|g: &mut ag::Graph<f64>| {
        let mut loop_iter = 1;
        loop {
            let x = g.placeholder(&[3]);
            let z = 2.0 * x;
            eprintln!("about to call grad() {}", loop_iter);
            g.grad(&[z], &[x])[0];
            eprintln!("grad() call completed {}", loop_iter);
            loop_iter += 1;
            if loop_iter >= 1000 { break };
        };
    });
}
The gdb output and backtrace look like this for me (it also crashes in --release mode):
about to call grad() 1
grad() call completed 1
about to call grad() 2
grad() call completed 2
...
grad() call completed 25
about to call grad() 26
grad() call completed 26
about to call grad() 27
Program received signal SIGSEGV, Segmentation fault.
smallvec::SmallVec<A>::spilled (self=0x7ffff7fd12c0)
at /home/ktegan/.cargo/registry/src/github.com-1ecc6299db9ec823/smallvec-1.4.2/src/lib.rs:695
695 self.capacity > Self::inline_capacity()
(gdb) bt
#0 smallvec::SmallVec<A>::spilled (self=0x7ffff7fd12c0)
at /home/ktegan/.cargo/registry/src/github.com-1ecc6299db9ec823/smallvec-1.4.2/src/lib.rs:695
#1 0x0000555555585ab3 in smallvec::SmallVec<A>::triple (self=0x7ffff7fd12c0)
at /home/ktegan/.cargo/registry/src/github.com-1ecc6299db9ec823/smallvec-1.4.2/src/lib.rs:666
#2 0x0000555555584bce in smallvec::SmallVec<A>::len (self=0x7ffff7fd12c0)
at /home/ktegan/.cargo/registry/src/github.com-1ecc6299db9ec823/smallvec-1.4.2/src/lib.rs:646
#3 0x00005555555d3e1a in autograd::gradient::symbolic_gradients (ys=..., wrt=..., gys=..., g=0x7fffffffda60)
at /home/ktegan/.cargo/registry/src/github.com-1ecc6299db9ec823/autograd-1.0.2/src/gradient.rs:166
#4 0x00005555555c5573 in autograd::ops::<impl autograd::graph::Graph<F>>::grad_with_default (
self=0x7fffffffda60, ys=..., xs=..., ys_grads=...)
at /home/ktegan/.cargo/registry/src/github.com-1ecc6299db9ec823/autograd-1.0.2/src/ops/mod.rs:128
#5 0x00005555555c5cbb in autograd::ops::<impl autograd::graph::Graph<F>>::grad (self=0x7fffffffda60, ys_=...,
xs=...) at /home/ktegan/.cargo/registry/src/github.com-1ecc6299db9ec823/autograd-1.0.2/src/ops/mod.rs:96
#6 0x000055555559a42e in autograd_test::main::{{closure}} (g=0x7fffffffda60) at src/main.rs:10
#7 0x00005555555c6906 in autograd::graph::with (f=...)
at /home/ktegan/.cargo/registry/src/github.com-1ecc6299db9ec823/autograd-1.0.2/src/graph.rs:91
#8 0x00005555555bf546 in autograd_test::main () at src/main.rs:4
About this issue
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 19 (11 by maintainers)
@ktegan Oh good to hear that.
Ok. Unfortunately, the current state of the Graph API is not fully in accordance with Rust's (variable) reference system: it manages &TensorInternal values manually to provide flexible tensor interfaces (the gateway to illegal things is here).
Specifically, those immutable references were used everywhere in gradient.rs, but they were invalidated by relocations of the TensorInternal's factory; those relocations easily happen in the symbolic gradient computation (Op::grad invocations)…
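To illustrate the general problem described above, here is a minimal sketch, not autograd's real implementation. The names `Arena`, `Node`, `NodeId` and `build_chain` are hypothetical. A growable buffer may move its contents when it reallocates, so a raw reference taken before a push can dangle afterwards. An index-based handle avoids this because it is resolved against the buffer's current location on every access.

```rust
// Hypothetical sketch: none of these names come from autograd.

/// A handle that stays valid when the backing Vec reallocates,
/// because it stores a position rather than an address.
#[derive(Clone, Copy, Debug, PartialEq)]
struct NodeId(usize);

#[derive(Debug)]
struct Node {
    value: f64,
    input: Option<NodeId>,
}

struct Arena {
    nodes: Vec<Node>,
}

impl Arena {
    fn new() -> Self {
        // Tiny initial capacity, so the pushes below force the buffer to grow.
        Arena { nodes: Vec::with_capacity(1) }
    }

    /// Appends a node and returns its handle. The push may move every
    /// existing node to a new allocation.
    fn push(&mut self, value: f64, input: Option<NodeId>) -> NodeId {
        self.nodes.push(Node { value, input });
        NodeId(self.nodes.len() - 1)
    }

    /// Resolves a handle against the current buffer (bounds-checked).
    fn get(&self, id: NodeId) -> &Node {
        &self.nodes[id.0]
    }
}

/// Builds a chain in which each node doubles the previous node's value,
/// adding nodes while older handles are still in use.
fn build_chain(len: usize) -> (Arena, NodeId, NodeId) {
    let mut arena = Arena::new();
    let first = arena.push(1.0, None);
    let mut last = first;
    for _ in 1..len {
        let doubled = arena.get(last).value * 2.0;
        last = arena.push(doubled, Some(last));
    }
    (arena, first, last)
}

fn main() {
    let (arena, first, last) = build_chain(10);
    println!("first = {:?}", arena.get(first));
    println!("last = {:?}", arena.get(last));
}
```

Holding a `&Node` across `push` would be rejected by the borrow checker in safe code. The failure mode only becomes reachable once references are managed manually through unsafe code, which is the situation described in the comment above.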
@BenCrulis Thank you! I’ll submit a patch release fixing a series of memory problems later.
I’m working on it this weekend!
I also have another SIGSEGV, still on the mem-hotfix branch with the following code:
GDB:
What is weird is that the line let s=... doesn't even participate in the computation.
In my original test program, with the dependencies set to use the git repository and the mem-hotfix branch, I still see the same segmentation fault. Setting RUST_BACKTRACE doesn't help in this case because this is a segfault, not a Rust-generated panic. When I run a debug executable built against mem-hotfix inside gdb, I now see this backtrace:
Ah, I'm not sure, but I just noticed one more little thing that needs to be fixed… (will do it tomorrow)
@raskr Unfortunately, it still crashes, here is the stack trace:
I should add that the test_many_nodes testcase in test_core.rs is also crashing using your mem-hotfix branch:
signal: 11, SIGSEGV: invalid memory reference
I confirm that the code above panics for me, here is a stack trace with RUST_BACKTRACE=full:
uname:
Linux 4.15.0-123-generic #126-Ubuntu SMP Wed Oct 21 09:40:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
I also just noticed that removing the mkl feature makes the same code SIGSEGV.
Let me know if I can provide more useful info. I hope that’s not a problem outside of the library.
I am seeing similarly bad behavior with this code on Linux with rustc v1.47:
Cargo.toml:
As written, I get a panic with the message ‘Not differentiable with given tensor(s).’, which I guess is caused by g.slice(), even though in this particular example I am retrieving only one cell of the Tensor. Note that I am trying to get multiple cells of the tensor in my real use case. Uncommenting any of the breaks makes it work, but multiplying ‘l’ by 1.0 causes a SIGSEGV.
I am very impressed with the library so far, I hope that I can get it to work correctly, unless of course this kind of code pattern isn’t supposed to be supported.