tree-sitter: *** stack smashing detected ***: terminated Fatal error 6: Aborted

Problem

Hi, I updated tree-sitter on Arch Linux from 0.22.2-1 to 0.22.4-1. Since the upgrade, Emacs crashes whenever I load any tsx file, with the following backtrace:

*** stack smashing detected ***: terminated
Fatal error 6: Aborted
Backtrace:
emacs(+0x18a436)[0x5de6a212b436]
emacs(+0x24df3)[0x5de6a1fc5df3]
emacs(+0x25d01)[0x5de6a1fc6d01]
emacs(+0x2cf70d)[0x5de6a227070d]
/usr/lib/libc.so.6(+0x3c770)[0x727ab68fd770]
/usr/lib/libc.so.6(+0x8d32c)[0x727ab694e32c]
/usr/lib/libc.so.6(gsignal+0x18)[0x727ab68fd6c8]
/usr/lib/libc.so.6(abort+0xd7)[0x727ab68e54b8]
/usr/lib/libc.so.6(+0x25395)[0x727ab68e6395]
/usr/lib/libc.so.6(+0x11473b)[0x727ab69d573b]
/usr/lib/libc.so.6(+0x115a56)[0x727ab69d6a56]
emacs(+0x299883)[0x5de6a223a883]
/usr/lib/emacs/29.3/native-lisp/29.3-561b282f/treesit-37439c61-97df641d.eln(F747265657369742d666f6e742d6c6f636b2d666f6e746966792d726567696f6e_treesit_font_lock_fontify_region_0+0x1f2)[0x727a8eb6e832]
emacs(+0x21052e)[0x5de6a21b152e]
/usr/bin/../lib/emacs/29.3/native-lisp/29.3-561b282f/preloaded/font-lock-895216f6-1f3b244f.eln(F666f6e742d6c6f636b2d666f6e746966792d73796e746163746963616c6c792d726567696f6e_font_lock_fontify_syntactically_region_0+0x62)[0x727ab1e38f42]
emacs(+0x21052e)[0x5de6a21b152e]
/usr/bin/../lib/emacs/29.3/native-lisp/29.3-561b282f/preloaded/font-lock-895216f6-1f3b244f.eln(F666f6e742d6c6f636b2d64656661756c742d666f6e746966792d726567696f6e_font_lock_default_fontify_region_0+0x4af)[0x727ab1e36bef]
emacs(+0x21052e)[0x5de6a21b152e]
/usr/bin/../lib/emacs/29.3/native-lisp/29.3-561b282f/preloaded/font-lock-895216f6-1f3b244f.eln(F666f6e742d6c6f636b2d666f6e746966792d726567696f6e_font_lock_fontify_region_0+0x93)[0x727ab1e35873]
emacs(+0x25789e)[0x5de6a21f889e]
emacs(+0x21052e)[0x5de6a21b152e]
emacs(+0x211041)[0x5de6a21b2041]
emacs(+0x20ccac)[0x5de6a21adcac]
/usr/bin/../lib/emacs/29.3/native-lisp/29.3-561b282f/preloaded/jit-lock-8a988e43-a9956d8b.eln(F6a69742d6c6f636b2d2d72756e2d66756e6374696f6e73_jit_lock__run_functions_0+0xd8)[0x727ab1acac88]
emacs(+0x21052e)[0x5de6a21b152e]
/usr/bin/../lib/emacs/29.3/native-lisp/29.3-561b282f/preloaded/jit-lock-8a988e43-a9956d8b.eln(F6a69742d6c6f636b2d666f6e746966792d6e6f77_jit_lock_fontify_now_0+0x80a)[0x727ab1acb59a]
emacs(+0x21052e)[0x5de6a21b152e]
/usr/bin/../lib/emacs/29.3/native-lisp/29.3-561b282f/preloaded/jit-lock-8a988e43-a9956d8b.eln(F6a69742d6c6f636b2d66756e6374696f6e_jit_lock_function_0+0x26f)[0x727ab1aca98f]
emacs(+0x21052e)[0x5de6a21b152e]
emacs(+0x2d2b21)[0x5de6a2273b21]
emacs(+0x5706b)[0x5de6a1ff806b]
emacs(+0x5af00)[0x5de6a1ffbf00]
emacs(+0x5c086)[0x5de6a1ffd086]
emacs(+0x5e036)[0x5de6a1fff036]
emacs(+0x5c2b6)[0x5de6a1ffd2b6]
emacs(+0x7ae89)[0x5de6a201be89]
emacs(+0x82d92)[0x5de6a2023d92]
emacs(+0x73863)[0x5de6a2014863]
emacs(+0x20b43c)[0x5de6a21ac43c]
emacs(+0x737b8)[0x5de6a20147b8]
emacs(+0x774b2)[0x5de6a20184b2]
...
zsh: IOT instruction (core dumped)  emacs

I downgraded back to 0.22.2-1 and do not face any issues.

Steps to reproduce

  1. pacman -Syyu <- updates tree-sitter to 0.22.4-1
  2. Start emacs
  3. Open any tsx file.

Expected behavior

Emacs should not crash, and I should be able to open tsx files.

Tree-sitter version (tree-sitter --version)

tree-sitter 0.20.8 (d4c1bf7ce78051b7f4a381d1508d68928512ed5f)

Operating system/version

ArchLinux

About this issue

  • State: closed
  • Created 3 months ago
  • Reactions: 17
  • Comments: 44 (22 by maintainers)

Most upvoted comments

For anyone wanting to downgrade on Arch, I’ve been able to get Emacs non-crashing again with sudo pacman -U file:///var/cache/pacman/pkg/tree-sitter-0.22.2-1-x86_64.pkg.tar.zst [1].

[1] https://wiki.archlinux.org/title/downgrading_packages

If you don’t have the old version of tree-sitter locally, you can download it from the Arch Linux archive at https://archive.archlinux.org/packages/t/tree-sitter/tree-sitter-0.22.2-1-x86_64.pkg.tar.zst

Downgrading to this version fixes it for me, for now.

Breaking ABI compatibility is not an issue at all, especially pre-1.0.

It would make things easier for consumers if the library’s soname was bumped when ABI changed though. Having a different soname allows for the incompatible versions to be installed side by side, and is a signal that consumers need to be rebuilt when it changes.

My understanding is that if the ABI of, say, libexample.so.0.22 changes, it’s not the 22 that’s expected to be bumped to 23, but rather the 0.22 that’s expected to be bumped to 1.0.

That’s correct, I’ve written about this in the past for those who want the gory details. @maxbrunsfeld for us, I think we have to commit to not using the semver version in our library names, and to follow libtool-compatible rules for selecting a SONAME instead. Even though our semver version is pre-1.0, I think we need to be diligent about bumping the SONAME version whenever we introduce a breaking change to the ABI. SONAMEs are different, and don’t have the same “anything goes before 1.0” rule that semver has. And it is often the case that a library SONAME is not consistent with the package version number, regardless of whether the project uses semver.
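To make the libtool-style rules mentioned above concrete, here is a minimal sketch of how a `current:revision:age` triple is updated and how libtool maps it to a SONAME on Linux. The function names `bump` and `linux_soname`, the library name `libexample`, and the starting triple `3:0:1` are hypothetical, chosen only for illustration; this is not tree-sitter's actual build tooling.

```python
def linux_soname(lib, current, revision, age):
    """Return (soname, real file name) the way libtool derives them on Linux."""
    if age > current:
        raise ValueError("age must not exceed current")
    major = current - age  # oldest interface number still supported
    return f"{lib}.so.{major}", f"{lib}.so.{major}.{age}.{revision}"


def bump(current, revision, age, *, added=False, broken=False):
    """Apply libtool's update rules for one release.

    added:  interfaces were added since the last release
    broken: interfaces were removed or changed incompatibly
    """
    revision += 1          # the source changed at all
    if added or broken:
        current += 1       # the interface set changed
        revision = 0
    if added:
        age += 1           # older clients still work
    if broken:
        age = 0            # older clients no longer work
    return current, revision, age


if __name__ == "__main__":
    start = (3, 0, 1)      # hypothetical starting triple
    print(linux_soname("libexample", *start))
    print(linux_soname("libexample", *bump(*start, added=True)))
    print(linux_soname("libexample", *bump(*start, broken=True)))
```

With these rules a backward-compatible addition keeps the SONAME, so existing binaries keep loading the library, while an incompatible change produces a new SONAME, so the dynamic linker refuses to pair old binaries with the new library instead of letting them crash at runtime.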

@adonig you can downgrade to the previous version of libtree-sitter via sudo dnf install libtree-sitter-0.22.2-1.fc40 and add excludepkgs=libtree-sitter to /etc/dnf/dnf.conf until the issue is resolved. Works for me on Fedora 40 with Emacs 29.3.

Just so that this is clear to me: The solution to the immediate problem would be for the relevant Linux distros to issue a new Emacs package?

I really think that in order to provide a reliable end-user experience, and prevent issues like this, Emacs should just statically link a particular version of Tree-sitter.

If some Linux distros mandate that all libraries, no matter how small, must be distributed as dynamic libraries (which is just… staggeringly impractical IMO), then the Emacs package should specify a particular version of the Tree-sitter package to use - not a version range.

I think we have to commit to not using the semver version in our library names, and to follow libtool-compatible rules for selecting a SONAME instead.

I agree with this. Ideally, we should set up tooling to automatically bump the soname on breaking ABI changes. But to any Emacs package maintainer seeing this issue: we do not yet have this tooling. This library is maintained by a small team, and may not have perfect ABI stability! Consider taking a more conservative approach to dependency versioning, so that end users don’t have to deal with situations like this!

It sounds like in the future, we should bump the library’s minor version when changing any structs in api.h, because with our current strategy, the soname is based on the minor version.

The Tree-sitter C library’s ABI has indeed changed (due to the TSQueryCursor struct field addition). I think that the Arch Emacs package should probably pin a specific version of the Tree-sitter package. I really don’t want Emacs users to be broken by changes to Tree-sitter. Does anyone know the maintainers of the Arch packages?

In this repo, we try to not make breaking changes unnecessarily, but the library is pre-1.0 in its semver versioning, and even once it goes 1.0, it’s going to be easier to maintain API stability than full-on ABI stability.

Dynamic libraries are fine; distros just need to ensure that any package with tree-sitter as a shared-library dependency is built against the exact same version that the distro provides. Going forward, we will try to be more mindful about ABI breakage.

Go ask the distros.

Rebuilding emacs with the upgraded library may or may not fix it

Arch users may be interested to know that rebuilding emacs 29.3-2 with tree-sitter 0.22.5-1 did not fix the issue for me.

I suspect we’ve hit the same thing in Gentoo as https://bugs.gentoo.org/930039 - it looks like 0.22.2 vs 0.22.4 breaks ABI?

EDIT: quoting some details from the Gentoo bug…

libabigail’s abidiff output is:

$ abidiff /var/tmp/portage/dev-libs/tree-sitter-0.22.{2,4}/image/usr/lib64/libtree-sitter.so.0.22 --fail-no-debug-info --debug-info-dir1 /var/tmp/portage/dev-libs/tree-sitter-0.22.2/image/usr/lib/debug --debug-info-dir2 /var/tmp/portage/dev-libs/tree-sitter-0.22.4/image/usr/lib/debug
Functions changes summary: 0 Removed, 4 Changed (45 filtered out), 0 Added functions
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable

4 functions with some indirect sub-type change:

  [C] 'function const size_t* ts_parser_cancellation_flag(const TSParser*)' at parser.c:1921:1 has some indirect sub-type changes:
    parameter 1 of type 'const TSParser*' has sub-type changes:
      in pointed to type 'const TSParser':
        in unqualified underlying type 'typedef TSParser' at api.h:45:1:
          underlying type 'struct TSParser' at parser.c:90:1 changed:
            type size hasn't changed
            1 data member insertion:
              'bool has_scanner_error', at offset 11616 (in bits) at parser.c:113:1

  [C] 'function void ts_query_cursor_delete(TSQueryCursor*)' at query.c:2986:1 has some indirect sub-type changes:
    parameter 1 of type 'TSQueryCursor*' has sub-type changes:
      in pointed to type 'typedef TSQueryCursor' at api.h:48:1:
        underlying type 'struct TSQueryCursor' at query.c:301:1 changed:
          type size changed from 1152 to 1216 (in bits)
          15 data member changes:
            type of 'TSTreeCursor cursor' changed:
              underlying type 'struct TSTreeCursor' at api.h:105:1 changed:
                type size changed from 192 to 256 (in bits)
                1 data member change:
                  type of 'uint32_t context[2]' changed:
                    type name changed from 'uint32_t[2]' to 'uint32_t[3]'
                    array type size changed from 64 to 96
                    array type subrange 1 changed length from 2 to 3
            'struct {QueryState* contents; uint32_t size; uint32_t capacity;} states' offset changed from 256 to 320 (in bits) (by +64 bits)
            'struct {QueryState* contents; uint32_t size; uint32_t capacity;} finished_states' offset changed from 384 to 448 (in bits) (by +64 bits)
            'CaptureListPool capture_list_pool' offset changed from 512 to 576 (in bits) (by +64 bits)
            'uint32_t depth' offset changed from 832 to 896 (in bits) (by +64 bits)
            'uint32_t max_start_depth' offset changed from 864 to 928 (in bits) (by +64 bits)
            'uint32_t start_byte' offset changed from 896 to 960 (in bits) (by +64 bits)
            'uint32_t end_byte' offset changed from 928 to 992 (in bits) (by +64 bits)
            'TSPoint start_point' offset changed from 960 to 1024 (in bits) (by +64 bits)
            'TSPoint end_point' offset changed from 1024 to 1088 (in bits) (by +64 bits)
            'uint32_t next_state_id' offset changed from 1088 to 1152 (in bits) (by +64 bits)
            'bool on_visible_node' offset changed from 1120 to 1184 (in bits) (by +64 bits)
            'bool ascending' offset changed from 1128 to 1192 (in bits) (by +64 bits)
            'bool halted' offset changed from 1136 to 1200 (in bits) (by +64 bits)
            'bool did_exceed_match_limit' offset changed from 1144 to 1208 (in bits) (by +64 bits)

  [C] 'function TSTreeCursor ts_tree_cursor_copy(const TSTreeCursor*)' at tree_cursor.c:695:1 has some indirect sub-type changes:

  [C] 'function TSTreeCursor ts_tree_cursor_new(TSNode)' at tree_cursor.c:153:1 has some indirect sub-type changes:

I’ve filed a bug upstream with libabigail after discussion on IRC wrt why it didn’t flag it as a breaking change (https://sourceware.org/bugzilla/show_bug.cgi?id=31642).
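The size and offset changes in the abidiff output can be reproduced with a small layout sketch. It mirrors `TSTreeCursor` with Python's `ctypes`, once with `context[2]` and once with `context[3]`. The `tree` and `id` pointer fields are an assumption about the rest of the struct in api.h, and the `query` pointer ahead of `cursor` in `TSQueryCursor` is inferred from the reported offsets; only `context` and `cursor` appear in the output above. The helper and field names in the sketch are hypothetical.

```python
import ctypes


def make_tree_cursor(context_len):
    """Build a ctypes mirror of TSTreeCursor with `context_len` context slots."""
    class TreeCursor(ctypes.Structure):
        _fields_ = [
            ("tree", ctypes.c_void_p),   # assumed field, not shown by abidiff
            ("id", ctypes.c_void_p),     # assumed field, not shown by abidiff
            ("context", ctypes.c_uint32 * context_len),
        ]
    return TreeCursor


def make_query_cursor_prefix(tree_cursor_type):
    """Mirror only the first members of TSQueryCursor (layout is an assumption)."""
    class QueryCursorPrefix(ctypes.Structure):
        _fields_ = [
            ("query", ctypes.c_void_p),
            ("cursor", tree_cursor_type),
            ("states_contents", ctypes.c_void_p),  # first member of `states`
        ]
    return QueryCursorPrefix


OldCursor = make_tree_cursor(2)   # layout as of 0.22.2
NewCursor = make_tree_cursor(3)   # layout as of 0.22.4
OldQuery = make_query_cursor_prefix(OldCursor)
NewQuery = make_query_cursor_prefix(NewCursor)

if __name__ == "__main__":
    print("TSTreeCursor size (bits):",
          8 * ctypes.sizeof(OldCursor), "->", 8 * ctypes.sizeof(NewCursor))
    print("states offset (bits):",
          8 * OldQuery.states_contents.offset, "->",
          8 * NewQuery.states_contents.offset)
```

On a typical 64-bit Linux target this should show the same 192 to 256 bit growth and the same +64 bit shift of `states` that abidiff reports. Because `TSTreeCursor` is declared in the public header and returned by value from `ts_tree_cursor_new`, a caller compiled against the old header reserves room for the smaller struct while the new library writes the larger one, which is consistent with the stack smashing and invalid writes reported in this thread.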

Truncated backtrace (full at https://bugs.gentoo.org/930039#c2) from our pkgcheck tool which uses tree-sitter-bash:

tests/checks/test_codingstyle.py::TestStaticSrcUri::test_no_report[${P}]
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff49a3726 in ts_tree_cursor_init (self=0x555555650008, node=...) at lib/src/tree_cursor.c:167
167       array_push(&self->stack, ((TreeCursorEntry) {
(gdb)
#0  0x00007ffff49a3726 in ts_tree_cursor_init (self=0x555555650008, node=...) at lib/src/tree_cursor.c:167
#1  ts_tree_cursor_reset (_self=0x555555650008, node=...) at lib/src/tree_cursor.c:160
#2  ts_query_cursor_exec (self=0x555555650000, query=0x555555d8f3f0, node=...) at lib/src/query.c:3045
#3  0x00007ffff529dce0 in query_captures (self=0x7ffff2618210, args=(<tree_sitter.Node at remote 0x7ffff2180b70>,), kwargs=0x0) at tree_sitter/binding.c:2112
#4  0x00007ffff7b0c603 in method_vectorcall_VARARGS_KEYWORDS (func=<method_descriptor at remote 0x7ffff260f740>, args=0x7ffff21f8a98, nargsf=<optimized out>, kwnames=<optimized out>) at Objects/descrobject.c:344
#5  0x00007ffff7af816e in _PyObject_VectorcallTstate (tstate=0x555555576f90, callable=<method_descriptor at remote 0x7ffff260f740>, args=0x7ffff21f8a98, nargsf=<optimized out>, kwnames=0x0) at ./Include/cpython/abstract.h:114
[...]

Valgrind output:

tests/checks/test_codingstyle.py::TestStaticSrcUri::test_no_report[${P}] ==602430== Invalid write of size 8
==602430==    at 0xDC597D4: ts_query_cursor_set_byte_range (query.c:3064)
==602430==    by 0xD609CA2: query_captures (binding.c:2110)
==602430==    by 0x49A2602: method_vectorcall_VARARGS_KEYWORDS (descrobject.c:344)
==602430==    by 0x498E16D: UnknownInlinedFun (abstract.h:114)
==602430==    by 0x498E16D: UnknownInlinedFun (abstract.h:123)
==602430==    by 0x498E16D: UnknownInlinedFun (ceval.c:5893)
==602430==    by 0x498E16D: _PyEval_EvalFrameDefault (ceval.c:4198)
==602430==    by 0x49D4AE9: UnknownInlinedFun (pycore_ceval.h:46)
==602430==    by 0x49D4AE9: UnknownInlinedFun (genobject.c:213)
==602430==    by 0x49D4AE9: gen_iternext (genobject.c:580)
==602430==    by 0x498E39F: _PyEval_EvalFrameDefault (ceval.c:4001)
==602430==    by 0x49D4E02: UnknownInlinedFun (pycore_ceval.h:46)
==602430==    by 0x49D4E02: gen_send_ex2 (genobject.c:213)
==602430==    by 0x4990580: _PyEval_EvalFrameDefault (ceval.c:2586)
==602430==    by 0x49D4AE9: UnknownInlinedFun (pycore_ceval.h:46)
==602430==    by 0x49D4AE9: UnknownInlinedFun (genobject.c:213)
==602430==    by 0x49D4AE9: gen_iternext (genobject.c:580)
==602430==    by 0x49DCFC4: list_extend (listobject.c:960)
==602430==    by 0x49AC14C: method_vectorcall_O (descrobject.c:460)
==602430==    by 0x498E16D: UnknownInlinedFun (abstract.h:114)
==602430==    by 0x498E16D: UnknownInlinedFun (abstract.h:123)
==602430==    by 0x498E16D: UnknownInlinedFun (ceval.c:5893)
==602430==    by 0x498E16D: _PyEval_EvalFrameDefault (ceval.c:4198)
==602430==  Address 0xa430078 is 24 bytes after a block of size 944 in arena "client"
==602430==

valgrind: m_mallocfree.c:304 (get_bszB_as_is): Assertion 'bszB_lo == bszB_hi' failed.
valgrind: Heap block lo/hi size mismatch: lo = 1008, hi = 18446744069414584320.
This is probably caused by your program erroneously writing past the
end of a heap block and corrupting heap metadata.  If you fix any
invalid writes reported by Memcheck, this assertion failure will
probably go away.  Please try that before reporting this as a bug.