go: runtime: C.setgid hangs on Linux

What version of Go are you using (go version)?

go version devel go1.18-a8e6556445 Fri Apr 1 09:06:13 2022 +0000 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/codespace/.cache/go-build"
GOENV="/home/codespace/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="devel go1.18-a8e6556445 Fri Apr 1 09:06:13 2022 +0000"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/workspaces/go/gotest/go.mod"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3393830914=/tmp/go-build -gno-record-gcc-switches"

What did you do?

  1. Compile this C snippet into a shared library:
#include <pthread.h>
#include <stdlib.h>

pthread_mutex_t *lock_new(void)
{
    pthread_mutex_t *lock = malloc(sizeof(pthread_mutex_t));
    pthread_mutex_lock(lock);
    return lock;
}
$ gcc -fPIC -c foo.c
$ gcc -shared -o foo.so -pthread foo.o
  2. From Go, load the shared library at runtime and then call setgid:
package main

/*
#include <dlfcn.h>
#include <unistd.h>
#cgo LDFLAGS: -ldl -static
*/
import "C"
import "fmt"

func main() {
	fmt.Println("Step 1")
	if C.dlopen(C.CString("./foo.so"), C.RTLD_NOW) == nil {
		panic("library not found")
	}
	fmt.Println("Step 2")
	C.setgid(0)
	fmt.Println("Step 3")
}
$ go run .

What did you expect to see?

# main
/usr/bin/ld: /tmp/go-link-95168809/000001.o: in function `_cgo_1403fb244f50_Cfunc_dlopen':
/tmp/go-build/cgo-gcc-prolog:54: warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
Step 1
Step 2
Step 3

What did you see instead?

# main
/usr/bin/ld: /tmp/go-link-95168809/000001.o: in function `_cgo_1403fb244f50_Cfunc_dlopen':
/tmp/go-build/cgo-gcc-prolog:54: warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
Step 1
Step 2

The C.setgid call hangs indefinitely.

Observations

  • A pure C program doing the same calls does not hang
  • It also hangs if the dlopen and setgid calls are done from the C preamble
  • It does not hang if pthreads are not used
  • It does not hang if -static is not set in the LDFLAGS directive in the C preamble
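
The third observation could be checked with a variant of foo.c that never touches pthreads. That variant was not posted, so the following is only a guess at what such a library might contain; the int-based stand-in for the mutex is an assumption made for illustration.

```c
#include <stdlib.h>

/* Hypothetical pthread-free stand-in for lock_new: allocates a plain
 * integer flag and marks it as "held" instead of using pthread_mutex_t. */
int *lock_new(void)
{
    int *lock = malloc(sizeof *lock);
    if (lock != NULL)
        *lock = 1; /* 1 means "held"; purely illustrative */
    return lock;
}
```

Presumably this would be built with the same two gcc commands as the original foo.c, with -pthread dropped from the link step.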

This might be related to #3871 and #9400

@rsc @ianlancetaylor

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 16 (16 by maintainers)

Most upvoted comments

This C program seg faults on my machine if linked with -static:

#include <dlfcn.h>
#include <unistd.h>
#include <stdio.h>
#include <pthread.h>

void* f(void* arg) {
	void *p;

	printf("Step 1\n");
	p = dlopen("./foo.so", RTLD_NOW);
	printf("%p\n", p);
	printf("Step 2\n");
	setgid(0);
	printf("Step 3\n");
	return NULL;
}

int main() {
	pthread_t t;

	pthread_create(&t, 0, f, 0);
	pthread_join(t, 0);
	return 0;
}

(foo.so is the same as original.)

$ cc x.c -ldl -pthread -static
/usr/bin/ld: /tmp/ccI3WzJ1.o: in function `f':
x.c:(.text+0x2b): warning: Using 'dlopen' in statically linked applications requires at runtime the shared libraries from the glibc version used for linking
$ ./a.out 
Step 1
0x7f02e8000fc0
Step 2
Segmentation fault

It seems to die in the SIGSETXID handler:

(gdb) bt
#0  sighandler_setxid (sig=33, si=0x7fffffffd4b0, ctx=<optimized out>) at nptl-init.c:190
#1  sighandler_setxid (sig=<optimized out>, si=0x7fffffffd4b0, ctx=<optimized out>) at nptl-init.c:177
#2  <signal handler called>
#3  0x00000000004063e8 in __futex_abstimed_wait_common64 (futex_word=futex_word@entry=0x7ffff7ff8910, expected=2128007, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=128, 
    cancel=cancel@entry=true) at ../sysdeps/nptl/futex-internal.c:74
#4  0x000000000040644b in __futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7ffff7ff8910, expected=<optimized out>, clockid=clockid@entry=0, abstime=abstime@entry=0x0, 
    private=private@entry=128) at ../sysdeps/nptl/futex-internal.c:123
#5  0x0000000000403d74 in __pthread_clockjoin_ex (threadid=140737354106432, thread_return=0x0, clockid=0, abstime=0x0, block=<optimized out>) at pthread_join_common.c:102
#6  0x0000000000401904 in main ()

It doesn’t crash if it is not linked statically, nor if the setgid call is made on the main thread instead of a non-main thread (it doesn’t seem to matter where the dlopen call is made).
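
To compare the main-thread and non-main-thread cases side by side, the program above could be reshaped roughly as follows. This is a sketch, not the code that was actually run: the names steps and run_case and the on_worker flag are hypothetical, and error checks have been added that the original does not have.

```c
#include <dlfcn.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

/* Loads the library at the given path, then calls setgid(0) on the
 * calling thread. Returns NULL if the library loaded, or a non-NULL
 * marker if dlopen failed. */
static void *steps(void *arg)
{
    const char *path = arg;

    printf("Step 1\n");
    if (dlopen(path, RTLD_NOW) == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return (void *)1;
    }
    printf("Step 2\n");
    if (setgid(0) != 0)
        perror("setgid"); /* fails with EPERM without privileges */
    printf("Step 3\n");
    return NULL;
}

/* Hypothetical driver: a nonzero on_worker runs the steps on a new
 * thread, zero runs them on the caller's thread.
 * Returns 0 on success, -1 on failure. */
int run_case(const char *path, int on_worker)
{
    void *result;
    pthread_t t;

    if (!on_worker)
        return steps((void *)path) == NULL ? 0 : -1;
    if (pthread_create(&t, NULL, steps, (void *)path) != 0)
        return -1;
    if (pthread_join(t, &result) != 0)
        return -1;
    return result == NULL ? 0 : -1;
}
```

Calling run_case("./foo.so", 1) from main, in a binary built with the same cc flags as above, corresponds to the crashing configuration; run_case("./foo.so", 0) from main corresponds to the main-thread case.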

This is a linux/amd64 machine with glibc 2.33.

I’m inclined to think that this is a bug in the C library.

That makes sense, since the cgo mechanism for handling syscall.Setgid() should be in effect, and nothing about the syscall.AllThreadsSyscall() mechanism (substantially changed in go1.18 vs. go1.17) should be in play.

The observation that the hang only occurs when -static is set in the C preamble of the .go file might be important. I found this bug report, which seems to say that the dlopen mechanism is sensitive to the right dynamic symbols being available: https://sourceware.org/bugzilla/show_bug.cgi?id=16628 . In general, this kind of linkage and runtime behavior looks pretty subtle.