hammerspoon: hs.http.asyncPost coroutine crashes Hammerspoon if garbage collected too soon
Crash Info
hs.http.asyncPost works fine with coroutine most of the time, but if the network is slow (the response time is more than a few seconds), it crashes the Hammerspoon.
The crash:
Process: Hammerspoon [1770]
Path: /Applications/Hammerspoon.app/Contents/MacOS/Hammerspoon
Identifier: org.hammerspoon.Hammerspoon
Version: 0.9.94 (6196)
Code Type: X86-64 (Native)
Parent Process: ??? [1]
Responsible: Hammerspoon [1770]
User ID: 501
Date/Time: 2022-03-17 23:24:09.540 -0700
OS Version: macOS 11.6.4 (20G417)
Report Version: 12
Bridge OS Version: 6.2 (19P744)
Anonymous UUID: B53A6AB1-AEB1-4E7D-EA92-AC58D04D06A0
Sleep/Wake UUID: EEAFD038-5151-4E15-92B3-CAF6F09C45DB
Time Awake Since Boot: 180000 seconds
Time Since Wake: 2500 seconds
System Integrity Protection: enabled
Crashed Thread: 0 Dispatch queue: com.apple.main-thread
Exception Type: EXC_CRASH (SIGABRT)
Exception Codes: 0x0000000000000000, 0x0000000000000000
Exception Note: EXC_CORPSE_NOTIFY
Application Specific Information:
abort() called
Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 libsystem_kernel.dylib 0x00007fff2040b92e __pthread_kill + 10
1 libsystem_pthread.dylib 0x00007fff2043a5bd pthread_kill + 263
2 libsystem_c.dylib 0x00007fff2038f406 abort + 125
3 org.hammerspoon.Hammerspoon 0x000000010e8a13d0 -[SentryCrashExceptionApplication _crashOnException:] + 58
4 com.apple.AppKit 0x00007fff230e4c07 -[NSApplication reportException:] + 732
5 org.hammerspoon.Hammerspoon 0x000000010e8a1369 -[SentryCrashExceptionApplication reportException:] + 371
6 com.apple.AppKit 0x00007fff23198920 uncaughtErrorProc + 145
7 com.apple.CoreFoundation 0x00007fff206289d2 __handleUncaughtException + 716
8 libobjc.A.dylib 0x00007fff202e8587 _objc_terminate() + 90
9 libc++abi.dylib 0x00007fff203fd307 std::__terminate(void (*)()) + 8
10 libc++abi.dylib 0x00007fff203fd2a9 std::terminate() + 41
11 libdispatch.dylib 0x00007fff2029081a _dispatch_client_callout + 28
12 libdispatch.dylib 0x00007fff202936fd _dispatch_block_invoke_direct + 241
13 com.apple.CFNetwork 0x00007fff247d5cb6 0x7fff24799000 + 249014
14 com.apple.CoreFoundation 0x00007fff204f687f CFArrayApplyFunction + 62
15 com.apple.CFNetwork 0x00007fff247d5bf1 0x7fff24799000 + 248817
16 com.apple.CFNetwork 0x00007fff247d5a98 0x7fff24799000 + 248472
17 com.apple.CoreFoundation 0x00007fff2053237c __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17
Steps to reproduce
To reproduce the problem, add this code to your config and use the hotkey to crash Hammerspoon:
function test.test(s)
coroutine.wrap(hs.http.asyncPost)("https://google.com" , nil , nil, function(status,body) print(status,body) end )
collectgarbage()
collectgarbage()
end
hs.hotkey.bind(hyper, "P", function()
test.test("test" .. os.time())
end)
Potential cause
I think the bug is using a stale pointer lua_State * of a garbage-collected coroutine. In the fix for #3011 , we use the coroutine’s lua_State *L but the coroutine itself could be already garbage-collected. When the callback for hs.http.asyncPost is called, it tries to use that lua_State* and crashes.
In the code to reproduce above, I use collectgarbage() to force the garbage collector to collect the coroutine right after it’s done while the HTTP message is still being sent. By the time the response is received and the callback is called, the coroutine is already garbage-collected.
My workaround for now may be to keep the coroutine in a global variable and use the callback to remove it for garbage collection.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 23 (17 by maintainers)
@gineer01 FWIW this crash isn’t an exception, it’s a segfault because the lua_State struct has been freed, so trying to read its
statusmember is trying to read memory that Hammerspoon no longer owns.Ok, your assessment that the current crash by the example given at the top of this specific issue is correct – because the thread has been collected during garbage collection, it no longer exists and the callback can’t run on the specified thread.
Now the issue that prompted #3011 and the subsequent fix that now causes this particular crash… I think dates back to Hydra days when we didn’t have LuaSkin and had to contrive ways to capture the lua_State for callbacks. When we moved to LuaSkin and then further when we updated LuaSkin to work with coroutines, it looks like a few modules may have been incompletely updated to use the new mechanisms available…
hs.httpbeing one of them.The fix I believe requires doing away with all attempts within modules to capture a specific lua_State in the module object or structure, or in the context/void-ptr variable used with some Carbon callbacks.
I’m going to try and get a fix out for
httpsometime tomorrow and assuming it works, will then look at the other modules and see which others might need the same treatment (finding the Objective-C objects is easy – I just have to look for classes with a lua_State property, which suggests thatnoises,menubar, andscreen.watcheralso require a closer look. The handful still using structures and the contexts for Carbon callbacks may take longer as I can’t think of a quick regex to search for them atm… I’ll give it some thought, but it may require eyeballing all of the files 😭)https://github.com/Hammerspoon/hammerspoon/pull/3165 is an alternative PR which makes hs.http removes the stored
lua_Statepointer from itsconnectionDelegateclass.I need to look at it closer today/tonight, but most likely.
@gineer01 can you show me the actual code you’re trying to use that has the issue, not just the contrived example?
There is a reason all callbacks are expected to run on the main
Lstate and not on the coroutine’s – it’s the only state guaranteed to never be collected (at least until you restart/reload) and at the point where a callback is actually acted upon, it’s guaranteed to be re-enterable. You don’t re-enter a coroutine’s state whenever you want – you are only supposed to re-enter it viacoroutine.resume, so this attempt to record the coroutine’s state and re-use it will probably create even more instabilities completely aside from the collection issue.I should be able to take a closer look at this over the weekend.
@latenitefilms the more I thought about returning
nil, the less I liked the idea - it has the distinct downside that there are a lot of places that callsharedWithState:and none of them are expecting to get anilvalue, so there is a lot of scope for weird bugs there.Instead, I’ve added a class method on
LuaSkincalledluaThreadAlive:(lua_State *)Lthat will safely instantiate LuaSkin with the main Lua thread, check if the givenlua_Statethread is currently known, and return that, so it can be used as a manual check in the same way asLSGCCanarychecks are done, in the callbacks.One thing I noticed though is that
hs.httpis the only module that, in its C callbacks, instantiates LuaSkin with a stored thread, which rather suggests to me that every other module with C callbacks is somewhat broken with respect to coroutines - they will all be executed on the main thread? I’m not sure I understand this well enough.I can see that working, the problem I see with it is that we’d have to implement that in a lot of places, not all of which have convenient Lua wrappers around them, and not all of which have a nice place in ObjC to store the reference.
In #3162 I now have a working (but only lightly tested) Lua thread tracker that works completely in C/ObjC and currently requires no changes to any of the modules. It does have one behaviour that I’m not currently convinced is sensible though, so it may need more work in the various modules (but only in the C/ObjC parts). Specifically, when it discovers that
LuaSkin sharedWithState:has been called with a Lua thread that no longer exists, it returns the main Lua thread. This allows the Lua function to complete, but it does mean it’s no longer happening on a coroutine. I’m not sure that this is a good idea - at the very least it should emit a scary warning to the user that they have done something wrong. Probably better would be to just return nil so the LuaSkin object is empty, but to do that we’d need to find all of the places that we callsharedWithStatewith a non-NULL argument, and check the return value. That would mean we would just not attempt to call the user’s Lua function at all, we would simply bail out of the C/ObjC callback as we do if the GCCanary test fails.@asmagill as someone who knows a lot more about coroutines than me, do you have some thoughts here?
Thanks very much for the detailed report. You are completely correct that this is a use-after-free because the co-routine has been garbage collected.
The worrying thing here is that I’m not quite sure how we can fix this without patching Lua. The problem is that with the lua_State pointer having been free()'d there’s basically no way for us to touch it without technically committing a use-after-free.
So far the only possible solution I have in mind is that we patch Lua to keep track of lua thread creation/destruction, by keeping an array of pointers to the lua threads, so we have something to search when entering a C callback.
I’d be interested in @asmagill 's thoughts on this too.
Edit: When reproducing this in a debug build, because Clang’s Address Sanitiser is enabled, we get this excellent additional report: