go: x/tools/gopls: do not block on initial `workspace/configuration` request
~/x/tools/gopls$ go version -m $(which gopls)
/usr/local/google/home/bcmills/bin/gopls: devel go1.20-6001c043dc Fri Aug 19 03:32:27 2022 +0000
path golang.org/x/tools/gopls
mod golang.org/x/tools/gopls v0.9.4 h1:YhHOxVi++ILnY+QnH9FGtRKZZrunSaR7OW8/dCp7bBk=
dep github.com/BurntSushi/toml v1.2.0 h1:Rt8g24XnyGTyglgET/PRUNlrUeu9F5L+7FilkXfZgs0=
dep github.com/google/go-cmp v0.5.8 h1:e6P7q2lk1O+qJJb4BtCQXlK8vWEO8V1ZeuEdJNOqZyg=
dep github.com/sergi/go-diff v1.1.0 h1:we8PVUC3FE2uYfodKH/nBHMSetSfHDR6scGdBi+erh0=
dep golang.org/x/exp/typeparams v0.0.0-20220722155223-a9213eeb770e h1:7Xs2YCOpMlNqSQSmrrnhlzBXIE/bpMecZplbLePTJvE=
dep golang.org/x/mod v0.6.0-dev.0.20220419223038-86c51ed26bb4 h1:6zppjxzCulZykYSLyVDYbneBfbaBIQPYMevg0bEwv2s=
dep golang.org/x/sync v0.0.0-20220722155255-886fb9371eb4 h1:uVc8UZUe6tr40fFVnUP5Oj+veunVezqYl9z7DYw9xzw=
dep golang.org/x/sys v0.0.0-20220722155257-8c9f86f7a55f h1:v4INt8xihDGvnrfjMDVXGxw9wrfxYyCjk0KbXjhR55s=
dep golang.org/x/text v0.3.7 h1:olpwvP2KacW1ZWvsR7uQhoyTYvKAupfQrRGBFM352Gk=
dep golang.org/x/tools v0.1.13-0.20220812184215-3f9b119300de h1:b68wxF4nfQjj1XTRHtjVjCximbhAwjztuzDEFGU+n9o=
dep golang.org/x/vuln v0.0.0-20220725105440-4151a5aca1df h1:BkeW9/QJhcigekDUPS9N9bIb0v7gPKKmLYeczVAqr2s=
dep honnef.co/go/tools v0.3.2 h1:ytYb4rOqyp1TSa2EPvNVwtPQJctSELKaMyLfqNP4+34=
dep mvdan.cc/gofumpt v0.3.1 h1:avhhrOmv0IuvQVK7fvwV91oFSGAk5/6Po8GXTzICeu8=
dep mvdan.cc/xurls/v2 v2.4.0 h1:tzxjVAj+wSBmDcF6zBB7/myTy3gX9xvi8Tyr28AuQgc=
build -compiler=gc
build CGO_ENABLED=1
build CGO_CFLAGS=
build CGO_CPPFLAGS=
build CGO_CXXFLAGS=
build CGO_LDFLAGS=
build GOARCH=amd64
build GOOS=linux
build GOAMD64=v2
GNU Emacs 28.1
eglot built from source at joaotavora/eglot commit 000b7fd.
Does this issue reproduce with the latest release?
Yes.
What did you do?
Use gopls with eglot for a large stack of changes in the module at GOROOT/src/cmd.
What did you expect to see?
Responsive gopls actions via eglot.
What did you see instead?
M-x eglot-format-buffer frequently and consistently times out. On further investigation (details in https://github.com/joaotavora/eglot/issues/587#issuecomment-1221072833), I found what appears to be a distributed deadlock.
- gopls is blocking its responses pending a reply to its
workspace/configurationrequest. - It appears that eglot is failing to process the
workspace/configurationrequest because it is still awaiting a response for its owntextDocument/documentSymbolrequest.
The LSP specification recommends that clients and servers be implemented asynchronously, so I believe that both programs are in some sense in the wrong here: eglot should not block on a response to its documentSymbol request, but neither should gopls block on a response to its configuration request.
Instead, gopls should either serve best-effort results for the documentSymbol request without more detailed configuration, or it should serve an explicit error in order to resolve any intervening requests from the client.
(I believe that https://github.com/joaotavora/eglot/issues/587 tracks the corresponding bug on the eglot side.)
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 25 (12 by maintainers)
I’ve detected the culprit which is deep in an edge case in Emacs’s
jsonrpc.el.It’s rather surprising that this bug took so long to manifest itself after that code has been in use for more than 4 years. I’ve come up with a provisional fix for the problem reported here and at least one of the problems being reported in the big-bag-o-gopls-problems of https://github.com/joaotavora/eglot/issues/587. I will file an Emacs bug report.I could detect no fault on gopls side. It’s possible that a recent optimization to it made this bug on the Eglot client’s side more probable. @findleyr you may probably close this on this side.
And here is the patch for Emacs’s
jsonrpc.elif anyone wants to test it. It contains a bit of a hint of what caused the problem.I am sorry that this has slipped so much. I’ll try to fix this next week.
No it doesn’t work like that
No this “blocking command” theory is a red herring, I’m 99,9% sure of that.
But your “works with fundamental-mode” observation is very curious and may hold a clue.
Anyway I’ve now reproduced the problem in my machine, which is very important. The gopls messages are indeed reaching Emacs, but not Eglot for some reason. Will come back to this soon.
Filed Emacs bug: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60088
It might be good to run emacs under
straceto see which program sends what, and which program reads what, independent of how the two programs log requests.I typically use
strace -fvy -o /tmp/st.working -r -s2048 emacs […]for that.Here is the output for a broken (hanging) run:
Here is the output for a working run:
Let me know if providing the full strace files would be helpful (I’ll need to scrub them of any sensitive information first, so it might be better to produce your own strace files).
Thanks for giving it a shot. I was about to report that the recipe you linked to (https://gist.github.com/sirn/510fbd9c15c0f85533fdbb62569200c8) does reproduce the issue for me. I have modified the
init-eglot-test.elfrom the recipe slightly (user-emacs-directoryin/tmp, openingnftables_test.goinstead ofmain.go): https://gist.github.com/stapelberg/d802e84321d94f946171862e0e9f2c94When I run
emacs -Q -l init-eglot-test.elin the nftables directory, I run into the issue.Given that you can reproduce the issue on your machine, too, I assume it’s quicker for you to gather your own logs, but do let me know if it would still be helpful to gather some logs (and which ones in particular).
I’ve downloaded the nftables repository and have been able to reproduce the problem. But I can’t buy the deadlock theory because I was careful to ensure that no outgoing request by Eglot to the LSP server Gopls is made. So if the server wanted to send a
workspace/configurationrequest, I don’t see why it wouldn’t be able to. There’s nothing pending from the Eglot side. It’s just sitting there waiting. If I operate Eglot to actually issue a request, then the request will time out.Here’s an event log, I hope it’s not too long:
For me, the issue is not reproducible with small .go files. It only triggers with large .go files (I speculate because that changes the timing of things). My example is nftables_test.go in the https://github.com/google/nftables repository.
I attempted to fix this naively… but unfortunately I don’t think we can fix this in gopls without either degrading functionality for all other clients or introducing arbitrary timeouts. @bcmills alluded to this in the issue description, but I think it’s worth discussing the consequences of fixing this in gopls.
For context, here is the standard LSP initialization sequence:
initializerequest, containing a list of workspace folders. However, gopls cannot request configuration for these folders inside theinitializehandler, as per the LSP spec “the server is not allowed to send any requests or notifications to the client until it has responded with an InitializeResult”.initializeresult, the LSP client sends aninitializednotification to gopls when it is ready. Gopls uses this as a cue to ask for configuration and sends aworkspace/configurationrequest to retrieve folder-specific configuration.initializednotification, the LSP client sends atextDocument/didOpennotification to gopls followed by e.g. a request fortextDocument/documentSymbol(ortextDocument/semanticTokens, etc.).The problem is that the
documentSymbolresult in (3) depends on the configuration returned in (2); we don’t produce semantic tokens without packages, and we don’t load packages without a Go environment, and we don’t have a Go environment without configuration. Therefore, if the configuration request in (2) is blocked, gopls has the choice of either serving a degraded result in (3), or timing out.Note:
For these reasons, serving degraded results is not an acceptable solution. For similar reasons, it is also unacceptable to return an error for these requests (most clients will not retry errors). I think the best we could do would be to timeout the blocked
documentSymbolrequest server side, but of course implementing arbitrary server-side timeouts should be avoided.To be clear, I’m not saying that gopls is “right” in this instance. AFAIK the LSP spec does not prescribe which server->client requests can be made in the context of handling a client->server request. However, I’m not sure how the
workspace/configurationrequest could be useful without blocking: configuration exists to affect server behavior.It does seem odd/asymmetric that the lifecycle of configuration state is managed server-side, when the protocol is very careful to delegate all other state to the client. For example, servers are not supposed to implement their own file watching, instead registering watch patterns with the client. By analogy, it would make sense for the server to register configuration scopes that it cares about, in order to receive configuration change notifications from the cleint.
Therefore, I think we have the following options for fixing this bug:
workspace/configurationrequests altogether. Frankly, it makes sense to me that we get out of the business of managing configuration server-side (see also https://github.com/microsoft/language-server-protocol/issues/972), but this is a major API change for gopls that isn’t happening anytime soon, unfortunately.workspace/configurationrequest asynchronously.workspace.configurationclient capability.Of these, I think (3) or (4) – fixes in eglot – are the most realistic short-term solutions. Longer term we should consider (2) for gopls: change the way gopls manages configuration.
CC @hyangah @adonovan