oxc: Memory leak in arena with `Atom`s
OXC appears to leak memory when source includes any strings over 24 bytes in length.
A demonstration here: https://github.com/overlookmotel/oxc/tree/leak-test
Checkout that branch, and cargo run -p test_leak.
The problem
I believe the problem is this:
- Strings are stored as
Atoms. AtomwrapsCompactStringfrom compact_str.CompactStringallocates to the heap if string is longer than 24 bytes.Atoms may appear anywhere in the AST, likely within aBox.- Implementation of
Boxused for the arena has noDropimpl. - So when parsed
Programis dropped,Atoms don’t get dropped, and the heap allocationsCompactStringmade are not freed.
In the demo above, I’ve added impl Drop for Atom which logs when an Atom is dropped. It never gets called, demonstrating that the heap memory is leaked.
Possible solutions
1. Add impl Drop to Box
This would be like bumpalo::collections::Box.
Problem is that this would make dropping the AST more expensive, negating a major benefit of using an arena.
2. Track and drop Atoms manually
Store pointers to all non-inline Atoms as they’re created. A Drop impl on the arena could then drop them when the arena is dropped.
3. Replace CompactString
Replace CompactString with a similar “small string” implementation which stores long strings in the bumpalo arena, instead of the general heap.
As well as solving the leak, I believe this would also be more performant:
- Less heap allocations.
- Strings would be stored close to the AST nodes containing them, producing better cache locality.
CompactString is quite complicated, but OXC only uses a small fraction of CompactString’s capabilities, so creating a custom small string implementation in OXC I don’t think would be too tricky.
In addition:
CompactString is built as a replacement for std’s String, and includes a capacity field to allow resizing. But OXC’s Atom is immutable, so the capacity field is not required. All that’s needed is a Box<str> equivalent.
So a new “small string” type could lose the capacity field and be reduced in size to 16 bytes, rather than the current 24. Obviously, only strings up to 16 bytes could then be stored inline, but still most JS identifiers are under 16 characters, and if string content is stored in the arena, storing longer strings “out of line” has less performance penalty.
Further optimizations
For identifers without escapes (pretty much all identifiers) and string literals without escape sequences (most strings), Atom could just be a pointer to the string in source code, rather than copying the string into the arena.
Secondly, when lexing strings and identifiers containing escape sequences, AutoCow creates a string in the arena. That string can be pointed to by Atom rather than copied to another location in the arena.
In fact, escaped strings for some reason currently are stored in the arena three times, though I’m not sure why. Dump of arena contents after parsing x = 'ABCDE\n';:
|0c000000 00000000 486d602e 01000000| ........Hm`..... 00000000
|00000000 0e000000 00000000 00000000| ................ 00000010
|00000000 00000000 b06d602e 01000000| .........m`..... 00000020
|05000000 00000000 806d602e 01000000| .........m`..... 00000030
|00000000 0d000000 0098aa6f 01000000| ...........o.... 00000040
|04000000 0d000000 41424344 450a0000| ........ABCDE... 00000050
|00000000 00000000 00000000 000000c6| ................ 00000060
|f0010000 00004142 4344450a 42434445| ......ABCDE.BCDE 00000070
|00000000 01000000 00000000 01000000| ................ 00000080
|78000000 00000000 00000000 00000000| x............... 00000090
|00000000 000000c1 0090aa6f 01000000| ...........o.... 000000a0
About this issue
- Original URL
- State: closed
- Created 6 months ago
- Comments: 39 (18 by maintainers)
Commits related to this issue
- refactor(parser): parse BigInt lazily This PR partially fixes #1803 and is part of #1800. The BigInt value is now boxed, so it can be dropped along with the AST. It is also removed from the `Token`... — committed to oxc-project/oxc by Boshen 6 months ago
- refactor(parser): parse BigInt lazily This PR partially fixes #1803 and is part of #1800. The BigInt value is now boxed, so it can be dropped along with the AST. It is also removed from the `Token`... — committed to oxc-project/oxc by Boshen 6 months ago
- refactor(parser): parse BigInt lazily This PR partially fixes #1803 and is part of #1800. The BigInt value is now boxed, so it can be dropped along with the AST. It is also removed from the `Token`... — committed to oxc-project/oxc by Boshen 6 months ago
- refactor(parser): parse BigInt lazily This PR partially fixes #1803 and is part of #1800. The BigInt value is now boxed, so it can be dropped along with the AST. It is also removed from the `Token`... — committed to oxc-project/oxc by Boshen 6 months ago
- refactor(parser): parse BigInt lazily This PR partially fixes #1803 and is part of #1800. The BigInt value is now boxed, so it can be dropped along with the AST. It is also removed from the `Token`... — committed to oxc-project/oxc by Boshen 6 months ago
- refactor(parser): parse BigInt lazily This PR partially fixes #1803 and is part of #1800. The BigInt value is now boxed, so it can be dropped along with the AST. It is also removed from the `Token`... — committed to oxc-project/oxc by Boshen 6 months ago
- refactor(parser): parse BigInt lazily (#1924) This PR partially fixes #1803 and is part of #1880. BigInt is removed from the `Token` value, so that the token size can be reduced once we removed a... — committed to oxc-project/oxc by Boshen 6 months ago
- feat(allocator,ast): add `BoxWithDrop<BigInt>` relates #1803 — committed to oxc-project/oxc by Boshen 5 months ago
- feat(span): fix memory leak by implementing inlineable string for oxc_allocator closes #1803 This `String` is currently unsafe, but I won't to get miri working before introducing more changes. — committed to oxc-project/oxc by Boshen 5 months ago
- feat(span): fix memory leak by implementing inlineable string for oxc_allocator closes #1803 This `String` is currently unsafe, but I won't to get miri working before introducing more changes. — committed to oxc-project/oxc by Boshen 5 months ago
- feat(span): fix memory leak by implementing inlineable string for oxc_allocator closes #1803 This `String` is currently unsafe, but I won't to get miri working before introducing more changes. — committed to oxc-project/oxc by Boshen 5 months ago
- feat(span): fix memory leak by implementing inlineable string for oxc_allocator (#2294) closes #1803 This string is currently unsafe, but I want to get miri working before introducing more change... — committed to oxc-project/oxc by Boshen 5 months ago
OK great. I’m working on something else at the moment, but will come back to this soon.
OK great. Thanks for the answers.
I have a WIP that ticks most of the boxes, but now that holiday is coming to an end I’ll have much less time to work on it. Please excuse me if I don’t come back with a PR for a while.