oxc: Memory leak in arena with `Atom`s

OXC appears to leak memory when source includes any strings over 24 bytes in length.

A demonstration here: https://github.com/overlookmotel/oxc/tree/leak-test

Checkout that branch, and cargo run -p test_leak.

The problem

I believe the problem is this:

  • Strings are stored as Atoms.
  • Atom wraps CompactString from compact_str.
  • CompactString allocates to the heap if string is longer than 24 bytes.
  • Atoms may appear anywhere in the AST, likely within a Box.
  • Implementation of Box used for the arena has no Drop impl.
  • So when parsed Program is dropped, Atoms don’t get dropped, and the heap allocations CompactString made are not freed.

In the demo above, I’ve added impl Drop for Atom which logs when an Atom is dropped. It never gets called, demonstrating that the heap memory is leaked.

Possible solutions

1. Add impl Drop to Box

This would be like bumpalo::collections::Box.

Problem is that this would make dropping the AST more expensive, negating a major benefit of using an arena.

2. Track and drop Atoms manually

Store pointers to all non-inline Atoms as they’re created. A Drop impl on the arena could then drop them when the arena is dropped.

3. Replace CompactString

Replace CompactString with a similar “small string” implementation which stores long strings in the bumpalo arena, instead of the general heap.

As well as solving the leak, I believe this would also be more performant:

  1. Less heap allocations.
  2. Strings would be stored close to the AST nodes containing them, producing better cache locality.

CompactString is quite complicated, but OXC only uses a small fraction of CompactString’s capabilities, so creating a custom small string implementation in OXC I don’t think would be too tricky.

In addition:

CompactString is built as a replacement for std’s String, and includes a capacity field to allow resizing. But OXC’s Atom is immutable, so the capacity field is not required. All that’s needed is a Box<str> equivalent.

So a new “small string” type could lose the capacity field and be reduced in size to 16 bytes, rather than the current 24. Obviously, only strings up to 16 bytes could then be stored inline, but still most JS identifiers are under 16 characters, and if string content is stored in the arena, storing longer strings “out of line” has less performance penalty.

Further optimizations

For identifers without escapes (pretty much all identifiers) and string literals without escape sequences (most strings), Atom could just be a pointer to the string in source code, rather than copying the string into the arena.

Secondly, when lexing strings and identifiers containing escape sequences, AutoCow creates a string in the arena. That string can be pointed to by Atom rather than copied to another location in the arena.

In fact, escaped strings for some reason currently are stored in the arena three times, though I’m not sure why. Dump of arena contents after parsing x = 'ABCDE\n';:

|0c000000 00000000 486d602e 01000000| ........Hm`..... 00000000
|00000000 0e000000 00000000 00000000| ................ 00000010
|00000000 00000000 b06d602e 01000000| .........m`..... 00000020
|05000000 00000000 806d602e 01000000| .........m`..... 00000030
|00000000 0d000000 0098aa6f 01000000| ...........o.... 00000040
|04000000 0d000000 41424344 450a0000| ........ABCDE... 00000050
|00000000 00000000 00000000 000000c6| ................ 00000060
|f0010000 00004142 4344450a 42434445| ......ABCDE.BCDE 00000070
|00000000 01000000 00000000 01000000| ................ 00000080
|78000000 00000000 00000000 00000000| x............... 00000090
|00000000 000000c1 0090aa6f 01000000| ...........o.... 000000a0

About this issue

  • Original URL
  • State: closed
  • Created 6 months ago
  • Comments: 39 (18 by maintainers)

Commits related to this issue

Most upvoted comments

OK great. I’m working on something else at the moment, but will come back to this soon.

OK great. Thanks for the answers.

I have a WIP that ticks most of the boxes, but now that holiday is coming to an end I’ll have much less time to work on it. Please excuse me if I don’t come back with a PR for a while.