lmdb-go: Deadlock under Linux

Hello! I’ve been using my own bindings for a while, but recently switched to your package because of handy API and because Go doesn’t allow to link the same C symbols under different package names.

The problem is, the code you’re using in the package is somehow differs from their tip version on GitHub. I’ve been using the tip 0.9.70 from here: https://github.com/LMDB/lmdb/blob/mdb.master/libraries/liblmdb/lmdb.h

It has no release date, at first I was confused (see the title change of this issue).

There isn’t much difference in the API, except some documentation improve, but it has one big feature called MDB_USE_SYSV_SEM that allows to use different mutex provider under Linux environment.

Without this flag enabled, txn, err := l.env.BeginTxn(nil, 0) may deadlock, not Go kind of deadlock, but some deeply internal deadlock that can be only killed by kill -9. The holds true for Amazon AWS Linux or desktop Arch linux at least.

My advise is to switch to the tip LMDB version (no degradation AFAIK), and enable MDB_USE_SYSV_SEM by default for Linux systems. See the pull request related as an example of such transition - I just replaced C files and fixed types in two places in Go wrapping.

About this issue

Original URL
State: open
Created 7 years ago
Comments: 23 (10 by maintainers)

Commits related to this issue

Examples demonstrate the difference between Env.Update/Env.UpdateLocked Relates to #94 — committed to bmatsuo/lmdb-go by bmatsuo 7 years ago

Most upvoted comments

I can confirm that the original gist deadlocks on linux, even without the “unsafe” flag lmdb.NoTLS.

lmdb.NoTLS will actually have no effect, in general. It is defined for completeness, but general usage of lmdb-go will ensure that the flag is always provided to the underlying C library to minimize the number of places that LockOSThread must be called (which is still non-zero).

https://godoc.org/github.com/bmatsuo/lmdb-go/lmdb#Env.Open

@xlab Your concerns about LockOSThread are absolutely valid. Calling Env.Update from a goroutine that is already locked to a thread is a serious problem. There are a couple things that you can do with lmdb-go.

I have defined to methods, Env.Update and Env.UpdateLocked. If you know you are already locked to a thread you can call Env.UpdateLocked instead of Env.Update and you will not get premature unlocking of the goroutine – the method’s only real purpose is to provide automatic Commit/Rollback of a lmdb.Txn.

https://godoc.org/github.com/bmatsuo/lmdb-go/lmdb#Env.Update

https://godoc.org/github.com/bmatsuo/lmdb-go/lmdb#Env.UpdateLocked

If you truly cannot determine if the thread will be locked at the callsite of Txn.Update you cannot make an update in that goroutine. You may create a new goroutine to issue the update, because we can guarantee that the new goroutine can be safely locked to a thread. Instead of spinning up new goroutines on the spot you may also have a long-lived worker goroutine responsible for issuing all updates, taking lmdb.TxnOp functions over a channel and responding to the sender with the corresponding error (through a different channel). The design of the solution here is up to the application.
If you are going to mix together a whole lot of CGO packages that all use TLS it might be better to look for safer alternatives to some of those packages. Maybe boltdb instead of lmdb-go would fit the use case fine (to my knowledge it does not require thread locking). Maybe a safer version of your other library is OK because it is not as critical for performance. IDK. It’s up to the application maintainers. But if you need to mix the operations of these C libraries you are probably going to have become quite careful about how you go about doing that (as I imagine you would in a threaded C application using the underlying libraries).

It’s unfortunate that from an API perspective this is all ~the library~ (edit: the lmdb-go package) can provide. Unfortunately, to my knowledge, a goroutine is not able to determine its locked status. It may only call LockOSThread() or UnlockOSThread(), and it is a shortcoming of the API that this can result in premature unlocking (there is no counter on the number of preceding Lock calls, as that also presents problems).

@xlab If sysv semaphores do really allow lmdb-go to eliminate thread locking all together (when combined with the lmdb.NoTLS flag that the package already enforces) that is very interesting, and I would be interested in allowing applications to benefit from that. I definitely think it’s worth following up about this on the openldap-technical mailing list where LMDB development is discussed.

@glycerine I will look into providing more insight into goroutine/thread considerations in the top-level README. I will also probably add some clarifications to the documentation for the methods Env.BeginTxn, and Env.View to discuss thread locking.

Thank you both for you help with this issue. I’m going to leave it open until I’ve merged changes to the docs.

bmatsuo on Jan 30, 2017

write transactions must be confined to a single goroutine which has called LockOSThread(). You can use different goroutines for different write transactions, you just have to ensure that each goroutine that wants to write to the database lock its thread before doing so and never lets the Txn object’s methods be called from another goroutine.

Excellent. Thanks for the clarification.

I agree it would be a kindness to users to mention runtime.LockOSThread in the README and in the intro to the package (that which shows up at the top of the godoc). None of the examples use it either at present.

it’s very error-prone.

@xlab: could you elaborate on the wrong ways to use LockOSThread – what makes it error prone?

glycerine on Jan 29, 2017

Odd though. I don’t see in the repro gist (https://gist.github.com/xlab/f7aee266ab741a0412b44ee145ccbc23) any use of goroutines.

Indeed. I’m sure I would have thought of this more immediately if there were direct use of goroutines.

Are we to conclude that correct use of lmdb-go requires first calling LockOSThread() from a designated go-routine, and then only accessing the api from that goroutine?

I have tried to document this as clearly as possible in various documentation surrounding the BeginTxn, Update, and View methods on Txn. This issue may require me to take another read over those docs.

But yes, write transactions must be confined to a single goroutine which has called LockOSThread(). You can use different goroutines for different write transactions, you just have to ensure that each goroutine that wants to write to the database lock its thread before doing so and never lets the Txn object’s methods be called from another goroutine.

The Txn.Update method does its best to help the programmer by locking and unlocking the calling goroutine’s thread at the appropriate times. But it also cannot stop the programmer from ignoring documentation and passing a write Txn to another goroutine (as an argument or through closure) and using it unsafely.

LMDB requires care to use (regardless of your language or platform). I believe that to be an inescapable truth. I think it is not too bad to use once you know what kind of operations are safe for your application to do. But one must take time to learn their standard gopher intuition can get them into bad territory.

bmatsuo on Jan 29, 2017