tokio: Signal handlers do not set SA_ONSTACK, which may cause linked Go applications to crash.

Version: tokio-util v0.6.3, tokio v1.2.0

Platform: Linux pyrite 5.10.12-1-default #1 SMP Sat Jan 30 19:15:49 UTC 2021 (a3c8888) x86_64 x86_64 x86_64 GNU/Linux

Description

When a module built with the tokio runtime is linked into or loaded by a Go application, the Go application crashes. In this case the nss_kanidm.so module, which is built with tokio 1.2.0, is loaded via nsswitch, and it interferes with the signal handling of the Docker daemon, causing it to crash.

Feb 13 16:29:57 pyrite dockerd[1242]: fatal error: non-Go code set up signal handler without SA_ONSTACK flag
Feb 13 16:29:57 pyrite dockerd[1242]: runtime stack:
Feb 13 16:29:57 pyrite dockerd[1242]: runtime: unexpected return pc for runtime.sigtramp called from 0x7feba8889553
Feb 13 16:29:57 pyrite dockerd[1242]: stack: frame={sp:0xc000b14348, fp:0xc000b143a0} stack=[0xc000b0c298,0xc000b14698)
Feb 13 16:29:57 pyrite dockerd[1242]: 000000c000b14248:  000000c000b14250  000055df6ca73760 <runtime.throw.func1+0>
Feb 13 16:29:57 pyrite dockerd[1242]: 000000c000b14258:  000055df6e676cc7  0000000000000039
Feb 13 16:29:57 pyrite dockerd[1242]: 000000c000b14268:  000000c000b14288  000055df6ca5c8f1 <runtime.sigNotOnStack+129>
Feb 13 16:29:57 pyrite dockerd[1242]: 000000c000b14278:  000055df6e676cc7  0000000000000039
Feb 13 16:29:57 pyrite dockerd[1242]: 000000c000b14288:  000000c000b14338  000055df6ca5be81 <runtime.sigtrampgo+753>
Feb 13 16:29:57 pyrite dockerd[1242]: 000000c000b14298:  0000000000000011  000000c000b14320
Feb 13 16:29:57 pyrite dockerd[1242]: 000000c000b142a8:  000000c000b14440  0000000000000000
...
Feb 13 16:29:57 pyrite dockerd[1242]: runtime.throw(0x55df6e676cc7, 0x39)
Feb 13 16:29:57 pyrite dockerd[1242]:         /usr/lib64/go/1.13/src/runtime/panic.go:774 +0x74
Feb 13 16:29:57 pyrite dockerd[1242]: runtime.sigNotOnStack(0x11)
Feb 13 16:29:57 pyrite dockerd[1242]:         /usr/lib64/go/1.13/src/runtime/signal_unix.go:578 +0x81

The previous nss_kanidm tokio dependency in Cargo.toml was:

tokio = { version = "1", features = ["rt", "macros", "sync", "time", "net", "io-util", "signal"] }

The nss_kanidm module makes a call to a localhost resolver over a Unix domain socket, consuming a tokio-util codec type. However, this requires block_on and a tokio runtime, so the following code was used in the nss client support module.

use std::error::Error;
use tokio::runtime::Builder;

pub async fn call_daemon(path: &str, req: ClientRequest) -> Result<ClientResponse, Box<dyn Error>> {
    ...
}

pub fn call_daemon_blocking(
    path: &str,
    req: ClientRequest,
) -> Result<ClientResponse, Box<dyn Error>> {
    // Build a throwaway single-threaded runtime for this one blocking call.
    // enable_all() turns on every driver compiled into tokio, including the
    // signal driver when the "signal" feature is active.
    let rt = Builder::new_current_thread()
        .enable_all()
        .build()
        .map_err(Box::new)?;
    rt.block_on(call_daemon(path, req))
}

As tokio was built with signal support, the call to enable_all causes signal handlers to be installed. These handlers are managed by signal-hook-registry, and it appears they are registered without the SA_ONSTACK sigaction flag. Go runs its signal handlers on an alternate stack, and the Go runtime panics if any other registered handler in the process does not set this flag.
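
For illustration, here is a minimal sketch of what registering a handler with SA_ONSTACK looks like via the libc crate. This is not tokio's or signal-hook-registry's actual code; the handler name and the choice of SIGCHLD are assumptions for the example.

use libc::{sigaction, sighandler_t, SA_ONSTACK, SA_RESTART, SIGCHLD};
use std::{io, mem, ptr};

// Hypothetical handler; only async-signal-safe work belongs here.
extern "C" fn handle_signal(_sig: libc::c_int) {}

fn install_handler() -> io::Result<()> {
    unsafe {
        let mut action: sigaction = mem::zeroed();
        action.sa_sigaction = handle_signal as sighandler_t;
        // SA_ONSTACK asks the kernel to deliver the signal on the alternate
        // stack (if one is configured), which is what the Go runtime expects
        // of every handler in the process.
        action.sa_flags = SA_ONSTACK | SA_RESTART;
        if sigaction(SIGCHLD, &action, ptr::null_mut()) != 0 {
            return Err(io::Error::last_os_error());
        }
    }
    Ok(())
}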

Workarounds

Remove the "signal" feature from tokio in Cargo.toml so that enable_all has no signal driver to install.
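
With the dependency line shown earlier, that becomes:

tokio = { version = "1", features = ["rt", "macros", "sync", "time", "net", "io-util"] }

Note, however, that Cargo feature unification means any other crate in the build that enables tokio's "signal" feature will re-activate it (see the comments below).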

Possible Resolutions

I think there are a few ways this could be handled.

One is for signal-hook-registry to gain the ability to call sigaction with SA_ONSTACK, for the case where Rust and Go are linked into the same process, but this may require work with the related projects.

Another observation is that Builder is very "coarse" with its enable flags. The options today are:

  • enable_all
  • enable_io
  • enable_time

Today the docs.rs entry for enable_io states "Doing this enables using net, process, signal, and some I/O types on the runtime." This is rather coarse, given that tokio's Cargo.toml shows signal is not a dependency of net/io (though it is for process).

So perhaps tokio could improve the enable_ flags with more fine-grained settings such as enable_net and enable_process (which would imply signal). This would give consumers finer control to avoid signal handler registration, which would also work around this issue; a sketch of what that could look like follows.
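
A hypothetical sketch of such an API (enable_net and enable_process do not exist in tokio today; only enable_time, enable_io, and enable_all do):

let rt = Builder::new_current_thread()
    .enable_net()   // hypothetical: I/O driver for net types only
    .enable_time()  // timer driver (exists today)
    // no enable_process() and no signal support requested,
    // so no signal handlers would be installed
    .build()?;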

Thanks,

Most upvoted comments

@ipetkov It’s actually worse than that. This part of the application sets up a tokio runtime without signal handling in the pam/nsswitch modules. But because we link to another crate which also uses tokio, and that crate does enable the signal feature, Cargo's feature unification taints it through and activates signal in this part of the application too. I.e. if crate A uses tokio without signal, and A depends on crate B that does enable signal, then A gets tokio built with signal activated, as the sketch below illustrates.
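
A minimal illustration of the unification (the crate names and feature lists are placeholders):

# Crate A (e.g. the nss module) -- does not enable "signal":
tokio = { version = "1", features = ["rt", "net"] }

# Crate B, elsewhere in the dependency graph -- does enable "signal":
tokio = { version = "1", features = ["rt", "signal"] }

# Cargo unifies features per tokio version across the whole build, so A is
# compiled against tokio with "rt", "net", and "signal" all enabled.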

That’s why I implemented https://github.com/tokio-rs/tokio/pull/3521: it allows the runtime in A to select what it needs, even though the tokio features have been tainted through the dependency on B.

I’m not really invested enough to try to solve the SA_ONSTACK problem myself, but it certainly has value for tokio to be able to cooperate better with languages like Go that do set SA_ONSTACK.

I wrote a little test program and was able to confirm that the signal handler for SIGCHLD is installed eagerly when the process and signal features are enabled, even when the program does nothing but sleep and exit.
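
For reference, a minimal program along those lines (my own sketch, not necessarily the commenter's exact program); the eager registration can be observed with e.g. strace -e rt_sigaction:

// Cargo.toml (assumed): tokio = { version = "1", features = ["rt", "macros", "time", "process", "signal"] }
#[tokio::main(flavor = "current_thread")]
async fn main() {
    // Nothing is spawned, yet with the process + signal features the runtime
    // eagerly installs a SIGCHLD handler when it is built.
    tokio::time::sleep(std::time::Duration::from_secs(1)).await;
}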

Ah yes, you’re right! I had forgotten that the process driver eagerly installs a SIGCHLD handler which is used for all spawned processes. That’s likely the culprit of the original issue. I agree that a finer-grained configuration of the runtime could prevent such eager registrations.