babel: babel-register cache grows infinitely and breaks v8
Choose one: is this a bug report or feature request? A bug.
Expected Behavior
By default, babel-register creates a cache in the user’s home directory, .babel.json. Based on ./babel-register/lib/cache.js, this cache appears to be unmanaged. The cache should manage itself to avoid growing to an extremely large size.
Current Behavior
I started experiencing V8 crashes when running mocha tests using --compilers js:babel-core/register, as shown below:
<--- Last few GCs --->
82518 ms: Mark-sweep 807.1 (1039.7) -> 802.3 (1038.7) MB, 149.2 / 0.0 ms [allocation failure] [GC in old space requested].
82668 ms: Mark-sweep 802.3 (1038.7) -> 802.3 (1036.7) MB, 150.6 / 0.0 ms [allocation failure] [GC in old space requested].
82838 ms: Mark-sweep 802.3 (1036.7) -> 802.2 (993.7) MB, 169.7 / 0.0 ms [last resort gc].
82989 ms: Mark-sweep 802.2 (993.7) -> 802.2 (982.7) MB, 150.6 / 0.0 ms [last resort gc].
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0000024EE58CFB61 <JS Object>
1: SparseJoinWithSeparatorJS(aka SparseJoinWithSeparatorJS) [native array.js:~75] [pc=000002B8298FC057] (this=0000024EE5804381 <undefined>,w=0000011715C4D061 <JS Array[7440]>,F=000003681BBC8B19 <JS Array[7440]>,x=7440,I=0000024EE58B46F1 <JS Function ConvertToString (SharedFunctionInfo 0000024EE5852DC9)>,J=000003681BBC8AD9 <String[4]\: ,\n >)
2: DoJoin(aka DoJoin) [native array.js:137...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
I traced these to the code in babel-register/lib/cache.js that calls JSON.stringify on the cache object:
try {
  serialised = (0, _stringify2.default)(data, null, " ");
} catch (err) {
  // ...
}
My .babel.json was over 200 megabytes. Deleting it immediately resolved the problem.
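One way to avoid building the entire serialized cache as a single string at save time is to write it incrementally. The sketch below is a hypothetical alternative to the single JSON.stringify call (saveCacheNdjson and loadCacheNdjson are illustrative names, not babel-register APIs). It assumes the cache is a plain object keyed by filename and writes one JSON line per entry. Loading still reads the whole file into memory, so this only addresses the save side.

```javascript
const fs = require("fs");

// Hypothetical replacement for serialising the whole cache in one call:
// write each entry as its own JSON line (NDJSON), so no single string
// containing the entire cache is ever built.
function saveCacheNdjson(file, data) {
  const fd = fs.openSync(file, "w");
  try {
    for (const key of Object.keys(data)) {
      // Each line holds one [key, value] pair; JSON escapes embedded newlines.
      fs.writeSync(fd, JSON.stringify([key, data[key]]) + "\n");
    }
  } finally {
    fs.closeSync(fd);
  }
}

function loadCacheNdjson(file) {
  const data = {};
  if (!fs.existsSync(file)) return data;
  for (const line of fs.readFileSync(file, "utf8").split("\n")) {
    if (!line) continue;
    try {
      const [key, value] = JSON.parse(line);
      data[key] = value;
    } catch (err) {
      // Skip a corrupt or truncated line instead of discarding the whole cache.
    }
  }
  return data;
}
```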
Possible Solution
- The cache should periodically expire old entries and have a maximum size.
- The cache could be implemented using some kind of simple database that’s more efficient than reading the entire cache into memory and rewriting it at the end of each session.
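The first suggestion (expiry plus a maximum size) can be sketched as a small in-memory LRU cache. This is a hypothetical illustration, not babel-register code: BoundedCache and its limits (entry count, approximate bytes of serialized values, maximum age) are assumptions chosen for the example, and the default values are arbitrary.

```javascript
// Hypothetical bounded cache: evicts least-recently-used entries once either
// the entry count or the approximate total size exceeds a limit, and treats
// entries older than maxAgeMs as expired.
class BoundedCache {
  constructor({
    maxEntries = 1000,
    maxBytes = 50 * 1024 * 1024,
    maxAgeMs = 7 * 24 * 60 * 60 * 1000,
    now = Date.now,
  } = {}) {
    this.maxEntries = maxEntries;
    this.maxBytes = maxBytes;
    this.maxAgeMs = maxAgeMs;
    this.now = now;
    this.map = new Map(); // Map keeps insertion order: least recently used first.
    this.bytes = 0;
  }

  get(key) {
    const entry = this.map.get(key);
    if (!entry) return undefined;
    if (this.now() - entry.storedAt > this.maxAgeMs) {
      this.delete(key); // expired
      return undefined;
    }
    // Re-insert to mark the entry as most recently used.
    this.map.delete(key);
    this.map.set(key, entry);
    return entry.value;
  }

  set(key, value) {
    this.delete(key);
    const serialized = JSON.stringify(value);
    const size = serialized === undefined ? 0 : Buffer.byteLength(serialized);
    this.map.set(key, { value, size, storedAt: this.now() });
    this.bytes += size;
    // Evict from the least-recently-used end until both limits hold.
    while (this.map.size > this.maxEntries || this.bytes > this.maxBytes) {
      this.delete(this.map.keys().next().value);
    }
  }

  delete(key) {
    const entry = this.map.get(key);
    if (entry) {
      this.bytes -= entry.size;
      this.map.delete(key);
    }
  }
}
```

Using a Map keeps eviction O(1) per entry, because JavaScript Maps iterate in insertion order and re-inserting on read moves an entry to the "recent" end.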
Context
This prevents inline transpilation from working properly, and performance suffers significantly as the cache grows, because each operation requires reading and writing a huge file.
Because it’s very difficult to trace the source of V8 crashes, this is a rather insidious bug. There is at least one other bug report, in an unrelated package, that is almost certainly this issue:
https://github.com/caolan/async/issues/1311
This would primarily become an issue for people running large test suites using babel-register in a single environment that is never purged (e.g. a dev workstation). Even though it may not manifest very often, I expect that never pruning the cache has real performance and stability implications for a large number of users.
Your Environment
Windows 10, Node 6.10.2, npm 4.2.0, Babel 6.18.2
About this issue
- State: open
- Created 7 years ago
- Reactions: 10
- Comments: 31 (14 by maintainers)
@liuxingbaoyu
I tried running your sample code with the value a being a large object instead of a long string. It turns out JSON.stringify performs better in that case.
node --expose_gc main.js
v8.serialize is implemented using Buffer, whose memory does not belong to the V8 heap. When serializing 255 MB of text with the --max-old-space-size 300 parameter, v8.serialize works fine, while JSON.stringify runs out of memory. A 255 MB cache is big enough, so this is a good short-term solution, unless we are going to rewrite the caching system soon.
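The v8.serialize approach discussed in this comment can be sketched as follows, assuming the cache is a plain object of structured-cloneable values (saveCache and loadCache are illustrative names, not babel-register functions). v8.serialize and v8.deserialize are Node’s built-in structured-clone serializer and are available from Node 8, so they would not help on the Node 6.10.2 environment in the original report. This sketch does not reproduce the measurements described above.

```javascript
const fs = require("fs");
const v8 = require("v8");

// Persist the cache with Node's structured-clone serializer instead of
// JSON.stringify. v8.serialize returns a Buffer, whose backing memory lives
// outside the V8 JS heap, so no giant JS string is created on save.
function saveCache(file, data) {
  fs.writeFileSync(file, v8.serialize(data));
}

function loadCache(file) {
  try {
    return v8.deserialize(fs.readFileSync(file));
  } catch (err) {
    // Missing file or corrupt data: start with an empty cache
    // rather than failing the run.
    return {};
  }
}
```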
FYI, in Babel 7 the cache should now go in node_modules/.cache via findCacheDir (as mentioned above), so it would be per directory.
@jamietre There are definitely a number of issues with the caching as it currently exists. I’m not sure if this is why you’re running into this issue, but with babel 6, all projects, run in all environments (NODE_ENV=mocha/development/production), share a single file. Splitting up the files by environment happened here: https://github.com/babel/babel/pull/5411, and using a location specific to each project was added here: https://github.com/babel/babel/pull/5669. These should “solve” the issue in practice (for example, at my job we have a ton of modules, some of them very large, and these fixes solved the immediate issue without having to delete .babel.json every so often). These are of course just stopgaps and won’t completely address the issue; you’ll still be able to recreate it if you really try.
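The per-project, per-environment layout can be approximated on Babel 6 without upgrading, because babel-register reads the cache location from the BABEL_CACHE_PATH environment variable. The helper below is a hypothetical sketch: projectCacheFile and the babel-register/.babel.<env>.json layout are assumptions for illustration, not the exact path Babel 7 uses. It walks up to the nearest package.json, roughly what find-cache-dir does.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: find the nearest directory containing package.json
// and place one cache file per environment under its node_modules/.cache.
function projectCacheFile(startDir, env) {
  let dir = path.resolve(startDir);
  while (!fs.existsSync(path.join(dir, "package.json"))) {
    const parent = path.dirname(dir);
    if (parent === dir) return null; // reached the filesystem root: no project
    dir = parent;
  }
  // Keep the env name safe for use in a filename.
  const safeEnv = String(env).replace(/[^\w.-]/g, "_");
  return path.join(dir, "node_modules", ".cache", "babel-register", `.babel.${safeEnv}.json`);
}
```

A wrapper could set process.env.BABEL_CACHE_PATH to projectCacheFile(process.cwd(), process.env.BABEL_ENV || process.env.NODE_ENV || "development") before requiring babel-register; depending on the babel-register version, the target directory may need to be created first.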
I get the impression there probably won’t be much work on improving the cache in any major way until a decision is made about how to unify it with the babel-loader caching, and I think there is some desire to standardize around a caching strategy that can be used by other open source libs like ava. Here’s some background: https://github.com/babel/babel/issues/5372.
In the short term, here’s what we’ve done at work to stop the bleeding… turns out deleting your .babel.json file every couple of days wasn’t a satisfactory suggestion for most folks 😉
Then just call this file instead of babel-register.
Good luck!
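As a hypothetical illustration of this kind of wrapper (not the code the commenter used), a stop-gap could automate the manual deletion: remove the cache file once it passes a size limit, before babel-register loads it. pruneCacheIfTooLarge is an illustrative name, and the size threshold is an assumption you would tune.

```javascript
const fs = require("fs");

// Hypothetical stop-gap: delete the babel-register cache file when it grows
// past maxBytes, so it is rebuilt from scratch instead of being loaded and
// re-serialized. Returns true if the file was removed.
function pruneCacheIfTooLarge(file, maxBytes) {
  let stats;
  try {
    stats = fs.statSync(file);
  } catch (err) {
    if (err.code === "ENOENT") return false; // no cache file yet
    throw err;
  }
  if (stats.size <= maxBytes) return false;
  fs.unlinkSync(file);
  return true;
}
```

In a wrapper file, this could be called with process.env.BABEL_CACHE_PATH || path.join(os.homedir(), ".babel.json") and a limit such as 50 * 1024 * 1024, followed by require("babel-register"). The home-directory default is an approximation of Babel 6’s behaviour, which may fall back to a temp directory.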
node --expose_gc main.js
It looks like the performance boost is amazing!
😃
Since the state of the @babel/register cache is not going to get any better any time soon, and our project depends heavily on @babel/register performance, I took the time today to overhaul the caching system. It works well even with hundreds of transformed files, and in my first experiments the runtime went from several minutes down to several seconds.
Code
WARNING: This is not a proper fork. I know. Bad. Please see notes below.
How to test it?
The quick-and-dirty approach is to:
- … node_modules/@babel/register/lib folder
- … node.js and cache.js.
Of course this is just a hacky, temporary solution. If there is interest, I can put together a patch and even a small patch script.
Some notes
- … transform output individually (in a better code-readable format than just JSON).
- … env (via the undocumented babel.getEnv)
- … cacheKey for validation.
- … opts and/or env.
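The notes above describe storing each transform output individually and validating entries with a cacheKey. A minimal sketch of that idea follows, assuming a cacheKey string built from the Babel options, the env, and the source file’s mtime. entryPath, readEntry and writeEntry are hypothetical names, not the poster’s code or an @babel/register API.

```javascript
const crypto = require("crypto");
const fs = require("fs");
const path = require("path");

// Hypothetical per-entry cache: each transform result lives in its own file,
// named by a hash of the source filename, and stores the full cacheKey so a
// stale entry (different opts, env or mtime) is detected on read.
function entryPath(cacheDir, filename) {
  const hash = crypto.createHash("sha256").update(filename).digest("hex");
  return path.join(cacheDir, hash + ".json");
}

function readEntry(cacheDir, filename, cacheKey) {
  try {
    const entry = JSON.parse(fs.readFileSync(entryPath(cacheDir, filename), "utf8"));
    return entry.cacheKey === cacheKey ? entry.result : undefined;
  } catch (err) {
    return undefined; // missing or unreadable entry: treat as a cache miss
  }
}

function writeEntry(cacheDir, filename, cacheKey, result) {
  fs.mkdirSync(cacheDir, { recursive: true });
  fs.writeFileSync(entryPath(cacheDir, filename), JSON.stringify({ cacheKey, result }));
}
```

Because only the entries actually requested are read, startup cost scales with the files being compiled rather than with the total size of the cache.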
Big warning
I am in a hurry, so I just copy-pasted a mix of original (src) and compiled (lib) source code files and went off of that. I did not want to work through the whole build process, and it was also important to have it available within our project ASAP, hence the bad copy-and-paste decision. This means:
- FORK …
- … lib and src code; will need to touch it up a bit before the PR
However, if someone helps with setting up a FORK and adds tests, and the team agrees with the approach, I am sure we can get the PR out within an hour.
Any feedback is welcome.
I would hope it’s immediately obvious to anyone that one big fat JSON file is not a viable long-term strategy for fast startup.
@hzoo My job has given me the rest of the week to upgrade to Babel 7 and try to improve the caching for our use cases. Apart from creating benchmarks and actually implementing a faster cache, are there things we need to keep in mind? I know there was some talk of using the same caching logic as another project (ava, jest, or something else?). Is there still a desire to share that logic, or is it acceptable to have a Babel-specific implementation? Currently these are the things I’m going to try:
I’d love some feedback on these ideas. We’re hoping to get something done and submitted for review by the end of the week. Thanks!