go: proposal: a faster C-call mechanism for non-blocking C functions

I’m not sure this is the right place to submit this; the golang-dev list might also have been a good candidate.

The cgo FFI mechanism is very general and makes some pessimistic assumptions about the function being called. Specifically, it has to handle the case where the called function may block, e.g. in a blocking syscall. Having to handle the worst case adds a lot of overhead to every call into C, even a call to a math function or other small function that makes no syscalls and does not block.

The problem is that for well-behaved calls that neither block nor call back into Go, if the called function does not do much work, the cgo overhead dominates the runtime and causes poor performance. This is a well-known limitation, often complained about on golang-nuts and elsewhere (googling “cgo overhead” returns about 45,000 hits). Some of this can be mitigated by making the C function do more work, e.g. by exporting a chunkier rather than chattier API, but that’s not always feasible or desirable.

For some classes of problems this overhead is simply not acceptable, and people resort to bad practices like calling C from assembly on a Go stack, which isn’t built for it. The Go runtime itself finds the C call overhead too high at times, and Dmitry Vyukov had to work around it when integrating the TSAN race checker with Go. A look at https://golang.org/src/runtime/race_amd64.s?m=text shows a custom C call mechanism that just puts the arguments into the right registers, switches stacks from the Go stack to the C stack, and calls the TSAN runtime function.

Since this is a requirement that cannot always be worked around, is shared by Go users and the Go runtime itself, and has workarounds that are difficult, fraught, and fragile, I would like to see a solution for this in Go itself. Such a solution should very closely mirror the functionality in race_amd64.s, both because that approach is about as low-overhead as possible and because it could then replace that code. It could take any form, but since it needs to be backwards compatible and ideally should not introduce new syntax, something like an attribute on the function declaration or a cgo directive would be nice. It would be up to the programmer to ensure that C functions called in this way do not block and don’t call back into Go.


e.g. __attribute__ ((nonblocking)) in the declaration,
or // #cgo nonblocking directly preceding the call site or the declaration

Is this a reasonable feature for a future version of Go?

About this issue

State: closed
Created 8 years ago
Reactions: 9
Comments: 38 (21 by maintainers)

Commits related to this issue

runtime: optimize defer code This optimizes deferproc and deferreturn in various ways. The most important optimization is that it more carefully arranges to prevent preemption or stack growth. Curre... — committed to golang/go by aclements 8 years ago
runtime: optimize defer code This optimizes deferproc and deferreturn in various ways. The most important optimization is that it more carefully arranges to prevent preemption or stack growth. Curre... — committed to unclejack/go by aclements 8 years ago

Most upvoted comments

Another gamedev here supporting this proposal. As it stands now, Go would be a really good candidate for a gamedev language: the GC is getting really fast and the language has a lot to offer to the industry. Sadly, the cgo performance makes it a non-starter at the moment. I keep wondering how this cost is also impacting other areas, like server applications with heavy UDP traffic. I am using Go for the server in my game, and looking at the code, it eventually ends up calling various WSA* functions on Windows… so there you go, that price is there even when developing software that isn’t strictly a game.

Perhaps on Linux there is a difference between a syscall and a cgo call, but on Windows that does not seem to be the case.

I wonder if, instead of tagging cgo calls, we could tag the goroutine to use some kind of “special” stack and get ad-hoc treatment from the scheduler, so that it can call into C with as close to zero cost as possible.

I do think this could improve the performance of the entire tech. I agree with rewriting as much as possible in Go, that’s fine, but it’s still a world of C-based operating systems (and it will be for the foreseeable future) and, eventually, you’ll have to call C stuff in order to make things happen… be it a triangle on the screen or a packet over the network.

kunos on Jul 9, 2016

A C function that took 30ns is, optimistically, at most 50 machine instructions, assuming the code cache is hot and the code in question performs only memory accesses that are already in the L1 data cache.

What work can realistically be done in so few cycles that cannot be rewritten in Go?

On Fri, 1 Jul 2016, 10:15 Daniel Eloff notifications@github.com wrote:

I’ll grant that, but I’m still not content to leave that much performance on the floor. I’m going to try the race detector C calling approach and see how much of a difference it makes in my use case. I’ll report back in a couple weeks with some numbers hopefully.

A faster CGO mechanism allows calling C functions that do less work. For example, if the C function takes 200ns and we speed up the CGO mechanism from roughly 200ns to 30ns (which seems to have been true in past Go versions), then the total calling time for that function goes from 400ns to 230ns, almost twice as fast. If the C function itself takes 30ns, then it goes from 230ns to 60ns, which is almost a fourfold improvement. That opens up more options when designing APIs.

I just spent two days implementing a function in assembly in Go because it needed popcnt, prefetch, and bsr. It could have been implemented in two hours in C, but the CGO overhead makes it a non-starter. In other places I duplicate code between C and Go (using very unidiomatic Go with lots of unsafe) to avoid the CGO overhead.


davecheney on Jul 1, 2016

CL https://golang.org/cl/29656 mentions this issue.

gopherbot on Sep 23, 2016

Basically, the main problem is that calling C functions from Go is slow, and has only become slower. This unfortunately makes Go less suitable for graphical applications and games that have to call into a C OpenGL or Vulkan API many times per second. I don’t have any great ideas on how to solve this problem, but I insist that it is something that needs to be improved.

beoran on Jun 30, 2016

@ianlancetaylor what about separating C and Go into two processes connected by a pipe or mmap, so that a C call is translated into sending a message to the C process and waiting for a response? This could let the C code run faster and keep the Go runtime happy.

linkerlin on Sep 11, 2017

We are going to decline this specific proposal. Marking some cgo calls as non-blocking can fail in too many subtle ways that are hard to understand. We are certainly interested in speeding up cgo calls in general, but this approach is not the one we will take.

Note that we have sped up cgo calls in 1.8 in https://golang.org/cl/30080.

ianlancetaylor on Oct 24, 2016