hammerspoon: hs.http.asyncPost coroutine crashes Hammerspoon if garbage collected too soon

Crash Info

hs.http.asyncPost works fine with coroutine most of the time, but if the network is slow (the response time is more than a few seconds), it crashes the Hammerspoon.

The crash:

Process:               Hammerspoon [1770]
Path:                  /Applications/Hammerspoon.app/Contents/MacOS/Hammerspoon
Identifier:            org.hammerspoon.Hammerspoon
Version:               0.9.94 (6196)
Code Type:             X86-64 (Native)
Parent Process:        ??? [1]
Responsible:           Hammerspoon [1770]
User ID:               501

Date/Time:             2022-03-17 23:24:09.540 -0700
OS Version:            macOS 11.6.4 (20G417)
Report Version:        12
Bridge OS Version:     6.2 (19P744)
Anonymous UUID:        B53A6AB1-AEB1-4E7D-EA92-AC58D04D06A0

Sleep/Wake UUID:       EEAFD038-5151-4E15-92B3-CAF6F09C45DB

Time Awake Since Boot: 180000 seconds
Time Since Wake:       2500 seconds

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_CRASH (SIGABRT)
Exception Codes:       0x0000000000000000, 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY

Application Specific Information:
abort() called

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib            0x00007fff2040b92e __pthread_kill + 10
1   libsystem_pthread.dylib           0x00007fff2043a5bd pthread_kill + 263
2   libsystem_c.dylib                 0x00007fff2038f406 abort + 125
3   org.hammerspoon.Hammerspoon       0x000000010e8a13d0 -[SentryCrashExceptionApplication _crashOnException:] + 58
4   com.apple.AppKit                  0x00007fff230e4c07 -[NSApplication reportException:] + 732
5   org.hammerspoon.Hammerspoon       0x000000010e8a1369 -[SentryCrashExceptionApplication reportException:] + 371
6   com.apple.AppKit                  0x00007fff23198920 uncaughtErrorProc + 145
7   com.apple.CoreFoundation          0x00007fff206289d2 __handleUncaughtException + 716
8   libobjc.A.dylib                   0x00007fff202e8587 _objc_terminate() + 90
9   libc++abi.dylib                   0x00007fff203fd307 std::__terminate(void (*)()) + 8
10  libc++abi.dylib                   0x00007fff203fd2a9 std::terminate() + 41
11  libdispatch.dylib                 0x00007fff2029081a _dispatch_client_callout + 28
12  libdispatch.dylib                 0x00007fff202936fd _dispatch_block_invoke_direct + 241
13  com.apple.CFNetwork               0x00007fff247d5cb6 0x7fff24799000 + 249014
14  com.apple.CoreFoundation          0x00007fff204f687f CFArrayApplyFunction + 62
15  com.apple.CFNetwork               0x00007fff247d5bf1 0x7fff24799000 + 248817
16  com.apple.CFNetwork               0x00007fff247d5a98 0x7fff24799000 + 248472
17  com.apple.CoreFoundation          0x00007fff2053237c __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17

Steps to reproduce

To reproduce the problem, add this code to your config and use the hotkey to crash Hammerspoon:

function test.test(s)
    coroutine.wrap(hs.http.asyncPost)("https://google.com" , nil , nil, function(status,body) print(status,body) end )
    collectgarbage()
    collectgarbage()

end

hs.hotkey.bind(hyper, "P", function()
    test.test("test" .. os.time())
end)

Potential cause

I think the bug is using a stale pointer lua_State * of a garbage-collected coroutine. In the fix for #3011 , we use the coroutine’s lua_State *L but the coroutine itself could be already garbage-collected. When the callback for hs.http.asyncPost is called, it tries to use that lua_State* and crashes.

In the code to reproduce above, I use collectgarbage() to force the garbage collector to collect the coroutine right after it’s done while the HTTP message is still being sent. By the time the response is received and the callback is called, the coroutine is already garbage-collected.

My workaround for now may be to keep the coroutine in a global variable and use the callback to remove it for garbage collection.

About this issue

Original URL
State: closed
Created 2 years ago
Reactions: 1
Comments: 23 (17 by maintainers)

Commits related to this issue

Remove lua_State property from hs.http connection delegates, it is not needed and in fact makes things unstable when mixed with Lua coroutines. Replaces #3162 and more correctly fixes #3158 — committed to Hammerspoon/hammerspoon by cmsj 2 years ago

Most upvoted comments

@gineer01 FWIW this crash isn’t an exception, it’s a segfault because the lua_State struct has been freed, so trying to read its status member is trying to read memory that Hammerspoon no longer owns.

cmsj on Mar 21, 2022

Ok, your assessment that the current crash by the example given at the top of this specific issue is correct – because the thread has been collected during garbage collection, it no longer exists and the callback can’t run on the specified thread.

Now the issue that prompted #3011 and the subsequent fix that now causes this particular crash… I think dates back to Hydra days when we didn’t have LuaSkin and had to contrive ways to capture the lua_State for callbacks. When we moved to LuaSkin and then further when we updated LuaSkin to work with coroutines, it looks like a few modules may have been incompletely updated to use the new mechanisms available… hs.http being one of them.

The fix I believe requires doing away with all attempts within modules to capture a specific lua_State in the module object or structure, or in the context/void-ptr variable used with some Carbon callbacks.

I’m going to try and get a fix out for http sometime tomorrow and assuming it works, will then look at the other modules and see which others might need the same treatment (finding the Objective-C objects is easy – I just have to look for classes with a lua_State property, which suggests that noises, menubar, and screen.watcher also require a closer look. The handful still using structures and the contexts for Carbon callbacks may take longer as I can’t think of a quick regex to search for them atm… I’ll give it some thought, but it may require eyeballing all of the files 😭)

asmagill on Mar 21, 2022

https://github.com/Hammerspoon/hammerspoon/pull/3165 is an alternative PR which makes hs.http removes the stored lua_State pointer from its connectionDelegate class.

cmsj on Mar 21, 2022

I need to look at it closer today/tonight, but most likely.

asmagill on Mar 19, 2022

@gineer01 can you show me the actual code you’re trying to use that has the issue, not just the contrived example?

There is a reason all callbacks are expected to run on the main L state and not on the coroutine’s – it’s the only state guaranteed to never be collected (at least until you restart/reload) and at the point where a callback is actually acted upon, it’s guaranteed to be re-enterable. You don’t re-enter a coroutine’s state whenever you want – you are only supposed to re-enter it via coroutine.resume, so this attempt to record the coroutine’s state and re-use it will probably create even more instabilities completely aside from the collection issue.

I should be able to take a closer look at this over the weekend.

asmagill on Mar 19, 2022

@latenitefilms the more I thought about returning nil, the less I liked the idea - it has the distinct downside that there are a lot of places that call sharedWithState: and none of them are expecting to get a nil value, so there is a lot of scope for weird bugs there.

Instead, I’ve added a class method on LuaSkin called luaThreadAlive:(lua_State *)L that will safely instantiate LuaSkin with the main Lua thread, check if the given lua_State thread is currently known, and return that, so it can be used as a manual check in the same way as LSGCCanary checks are done, in the callbacks.

One thing I noticed though is that hs.http is the only module that, in its C callbacks, instantiates LuaSkin with a stored thread, which rather suggests to me that every other module with C callbacks is somewhat broken with respect to coroutines - they will all be executed on the main thread? I’m not sure I understand this well enough.

cmsj on Mar 18, 2022

I can see that working, the problem I see with it is that we’d have to implement that in a lot of places, not all of which have convenient Lua wrappers around them, and not all of which have a nice place in ObjC to store the reference.

In #3162 I now have a working (but only lightly tested) Lua thread tracker that works completely in C/ObjC and currently requires no changes to any of the modules. It does have one behaviour that I’m not currently convinced is sensible though, so it may need more work in the various modules (but only in the C/ObjC parts). Specifically, when it discovers that LuaSkin sharedWithState: has been called with a Lua thread that no longer exists, it returns the main Lua thread. This allows the Lua function to complete, but it does mean it’s no longer happening on a coroutine. I’m not sure that this is a good idea - at the very least it should emit a scary warning to the user that they have done something wrong. Probably better would be to just return nil so the LuaSkin object is empty, but to do that we’d need to find all of the places that we call sharedWithState with a non-NULL argument, and check the return value. That would mean we would just not attempt to call the user’s Lua function at all, we would simply bail out of the C/ObjC callback as we do if the GCCanary test fails.

@asmagill as someone who knows a lot more about coroutines than me, do you have some thoughts here?

cmsj on Mar 18, 2022

Thanks very much for the detailed report. You are completely correct that this is a use-after-free because the co-routine has been garbage collected.

The worrying thing here is that I’m not quite sure how we can fix this without patching Lua. The problem is that with the lua_State pointer having been free()'d there’s basically no way for us to touch it without technically committing a use-after-free.

So far the only possible solution I have in mind is that we patch Lua to keep track of lua thread creation/destruction, by keeping an array of pointers to the lua threads, so we have something to search when entering a C callback.

I’d be interested in @asmagill 's thoughts on this too.

Edit: When reproducing this in a debug build, because Clang’s Address Sanitiser is enabled, we get this excellent additional report:

=================================================================
==37908==ERROR: AddressSanitizer: heap-use-after-free on address 0x61100033bad2 at pc 0x000102d63b08 bp 0x7ff7bf3a3280 sp 0x7ff7bf3a3278
READ of size 1 at 0x61100033bad2 thread T0
    #0 0x102d63b07 in lua_status lapi.c:1119
    #1 0x102ad3ef3 in +[LuaSkin sharedWithState:] Skin.m:184
    #2 0x106a489fa in -[connectionDelegate connectionDidFinishLoading:] libhttp.m:70
    #3 0x7ff8144bb033  (CFNetwork:x86_64+0x6c033)
    #4 0x7ff8144baf83  (CFNetwork:x86_64+0x6bf83)
    #5 0x7ff8144baec8  (CFNetwork:x86_64+0x6bec8)
    #6 0x7ff8144ccfc3  (CFNetwork:x86_64+0x7dfc3)
    #7 0x7ff814616fba  (CFNetwork:x86_64+0x1c7fba)
    #8 0x102666f21 in _dispatch_client_callout+0x7 (libdispatch.dylib:x86_64+0x4f21)
    #9 0x10266a7f6 in _dispatch_block_invoke_direct+0x11d (libdispatch.dylib:x86_64+0x87f6)
    #10 0x7ff81448bc43  (CFNetwork:x86_64+0x3cc43)
    #11 0x7ff80fa4e736 in CFArrayApplyFunction+0x42 (CoreFoundation:x86_64h+0x44736)
    #12 0x7ff81448bb74  (CFNetwork:x86_64+0x3cb74)
    #13 0x7ff81448ba28  (CFNetwork:x86_64+0x3ca28)
    #14 0x7ff80fa89aea in __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__+0x10 (CoreFoundation:x86_64h+0x7faea)
    #15 0x7ff80fa89a52 in __CFRunLoopDoSource0+0xb3 (CoreFoundation:x86_64h+0x7fa52)
    #16 0x7ff80fa897cc in __CFRunLoopDoSources0+0xf1 (CoreFoundation:x86_64h+0x7f7cc)
    #17 0x7ff80fa881e7 in __CFRunLoopRun+0x37b (CoreFoundation:x86_64h+0x7e1e7)
    #18 0x7ff80fa877ab in CFRunLoopRunSpecific+0x231 (CoreFoundation:x86_64h+0x7d7ab)
    #19 0x7ff81870ece5 in RunCurrentEventLoopInMode+0x123 (HIToolbox:x86_64+0x2fce5)
    #20 0x7ff81870ea49 in ReceiveNextEventCommon+0x251 (HIToolbox:x86_64+0x2fa49)
    #21 0x7ff81870e7e4 in _BlockUntilNextEventMatchingListInModeWithFilter+0x45 (HIToolbox:x86_64+0x2f7e4)
    #22 0x7ff8124ae5cc in _DPSNextEvent+0x39e (AppKit:x86_64+0x3e5cc)
    #23 0x7ff8124acc89 in -[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:]+0x571 (AppKit:x86_64+0x3cc89)
    #24 0x7ff81249f338 in -[NSApplication run]+0x249 (AppKit:x86_64+0x2f338)
    #25 0x7ff8124732b6 in NSApplicationMain+0x330 (AppKit:x86_64+0x32b6)
    #26 0x100b6bb23 in main MJAppDelegate.m:383
    #27 0x10f13f51d in start+0x1cd (dyld:x86_64+0x551d)

0x61100033bad2 is located 18 bytes inside of 208-byte region [0x61100033bac0,0x61100033bb90)
freed by thread T0 here:
    #0 0x101b536d9 in wrap_free+0xa9 (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x486d9)
    #1 0x102b38a9f in l_alloc lauxlib.c:1013
    #2 0x102b28a8f in luaM_free_ lmem.c:135
    #3 0x102d7abcb in luaE_freethread lstate.c:322
    #4 0x102b69bdc in freeobj lgc.c:786
    #5 0x102b6880c in sweeplist lgc.c:827
    #6 0x102b99f69 in sweepstep lgc.c:1564
    #7 0x102b65932 in singlestep lgc.c:1604
    #8 0x102b65029 in luaC_runtilstate lgc.c:1646
    #9 0x102b67db3 in fullinc lgc.c:1702
    #10 0x102b67b43 in luaC_fullgc lgc.c:1720
    #11 0x102d63fbb in lua_gc lapi.c:1144
    #12 0x102d0abbd in luaB_collectgarbage lbaselib.c:236
    #13 0x102ceef3f in luaD_precall ldo.c:548
    #14 0x102c51ff2 in luaV_execute lvm.c:1624
    #15 0x102cf18f0 in ccall ldo.c:593
    #16 0x102cf1a19 in luaD_callnoyield ldo.c:611
    #17 0x102d5f80c in f_call lapi.c:1031
    #18 0x102b29a96 in luai_objcttry lobjectivec_exceptions.m:84
    #19 0x102cdc011 in luaD_rawrunprotected ldo.c:157
    #20 0x102cf7d50 in luaD_pcall ldo.c:908
    #21 0x102d5e1f7 in lua_pcallk lapi.c:1057
    #22 0x102d0c8c7 in luaB_xpcall lbaselib.c:473
    #23 0x102ceef3f in luaD_precall ldo.c:548
    #24 0x102c51ff2 in luaV_execute lvm.c:1624
    #25 0x102cf18f0 in ccall ldo.c:593
    #26 0x102cf1a19 in luaD_callnoyield ldo.c:611
    #27 0x102d5f80c in f_call lapi.c:1031
    #28 0x102b29a96 in luai_objcttry lobjectivec_exceptions.m:84
    #29 0x102cdc011 in luaD_rawrunprotected ldo.c:157

previously allocated by thread T0 here:
    #0 0x101b53825 in wrap_realloc+0xa5 (libclang_rt.asan_osx_dynamic.dylib:x86_64h+0x48825)
    #1 0x102b38ab9 in l_alloc lauxlib.c:1017
    #2 0x102b29775 in luaM_malloc_ lmem.c:192
    #3 0x102d76579 in lua_newthread lstate.c:292
    #4 0x102db4309 in luaB_cocreate lcorolib.c:98
    #5 0x102db46b6 in luaB_cowrap lcorolib.c:106
    #6 0x102ceef3f in luaD_precall ldo.c:548
    #7 0x102c51ff2 in luaV_execute lvm.c:1624
    #8 0x102cf18f0 in ccall ldo.c:593
    #9 0x102cf1a19 in luaD_callnoyield ldo.c:611
    #10 0x102d5f80c in f_call lapi.c:1031
    #11 0x102b29a96 in luai_objcttry lobjectivec_exceptions.m:84
    #12 0x102cdc011 in luaD_rawrunprotected ldo.c:157
    #13 0x102cf7d50 in luaD_pcall ldo.c:908
    #14 0x102d5e1f7 in lua_pcallk lapi.c:1057
    #15 0x102d0c8c7 in luaB_xpcall lbaselib.c:473
    #16 0x102ceef3f in luaD_precall ldo.c:548
    #17 0x102c51ff2 in luaV_execute lvm.c:1624
    #18 0x102cf18f0 in ccall ldo.c:593
    #19 0x102cf1a19 in luaD_callnoyield ldo.c:611
    #20 0x102d5f80c in f_call lapi.c:1031
    #21 0x102b29a96 in luai_objcttry lobjectivec_exceptions.m:84
    #22 0x102cdc011 in luaD_rawrunprotected ldo.c:157
    #23 0x102cf7d50 in luaD_pcall ldo.c:908
    #24 0x102d5e1f7 in lua_pcallk lapi.c:1057
    #25 0x102ad934c in -[LuaSkin protectedCallAndTraceback:nresults:] Skin.m:417
    #26 0x100b9548f in MJLuaRunString MJLua.m:1106
    #27 0x100b71e9e in -[MJConsoleWindowController run:] MJConsoleWindowController.m:148
    #28 0x100b72304 in -[MJConsoleWindowController tryMessage:] MJConsoleWindowController.m:155
    #29 0x7ff8126b26fd in -[NSApplication(NSResponder) sendAction:to:from:]+0x11f (AppKit:x86_64+0x2426fd)

SUMMARY: AddressSanitizer: heap-use-after-free lapi.c:1119 in lua_status
Shadow bytes around the buggy address:
  0x1c2200067700: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x1c2200067710: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c2200067720: fd fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c2200067730: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c2200067740: 00 00 00 00 00 00 00 00 00 00 fa fa fa fa fa fa
=>0x1c2200067750: fa fa fa fa fa fa fa fa fd fd[fd]fd fd fd fd fd
  0x1c2200067760: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c2200067770: fd fd fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c2200067780: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c2200067790: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x1c22000677a0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==37908==ABORTING

cmsj on Mar 18, 2022