gatsby: V8 serialize error on build with a huge number of pages (100k+)
Description
Getting various issues (related to V8 serialize, etc.) when trying to build a large number of pages (80k+ docs of ~10kb each) with the latest gatsby + remark, resulting in build failure.
The build crashes with the errors below.
Without loki
success run page queries - 2027.713 s — 3335/3335 1.64 queries/second
node[11428]: ../src/node_buffer.cc:412:MaybeLocal<v8::Object> node::Buffer::New(node::Environment *, char *, size_t): Assertion `length <= kMaxLength' failed.
1: 0x100033d65 node::Abort() [/usr/local/bin/node]
2: 0x100032dab node::MakeCallback(v8::Isolate*, v8::Local<v8::Object>, char const*, int, v8::Local<v8::Value>*, node::async_context) [/usr/local/bin/node]
3: 0x100046ff5 _register_buffer() [/usr/local/bin/node]
4: 0x100098391 node::(anonymous namespace)::SerializerContext::ReleaseBuffer(v8::FunctionCallbackInfo<v8::Value> const&) [/usr/local/bin/node]
5: 0x10022b83f v8::internal::FunctionCallbackArguments::Call(v8::internal::CallHandlerInfo*) [/usr/local/bin/node]
6: 0x10022ad81 v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) [/usr/local/bin/node]
7: 0x10022a3d0 v8::internal::Builtin_Impl_HandleApiCall(v8::internal::BuiltinArguments, v8::internal::Isolate*) [/usr/local/bin/node]
8: 0x23830e0841bd
9: 0x23830e093a09
error Command failed with signal "SIGABRT".
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
With loki (GATSBY_DB_NODES=loki)
success run page queries - 1976.632 s — 3335/3335 1.69 queries/second
Stacktrace:
ptr1=0x25b9d2202321
ptr2=0x0
ptr3=0x0
ptr4=0x0
failure_message_object=0x7ffeefbed370
==== JS stack trace =========================================
0: ExitFrame [pc: 0x355f68c041bd]
1: StubFrame [pc: 0x355f68c85ad7]
Security context: 0x25b9bbd9e6c9 <JSObject>
2: saveState [0x25b910175701] [/Users/guns/public/node_modules/gatsby/dist/db/index.js:30] [bytecode=0x25b9097edb61 offset=181](this=0x25b90cd96279 <Object map = 0x25b90f1ad969>)
3: /* anonymous */ [0x25b9850fee71](this=0x25b93d108c59 <JSGlobal Object>,0x25b9d2202321 <the_hole>)
4: StubFrame [pc: 0x355f68c42871]
5: StubFrame [pc: 0x355f68c21b9a]
6: EntryFrame [pc: 0x355f68c0ba01]
==== Details ================================================
[0]: ExitFrame [pc: 0x355f68c041bd]
[1]: StubFrame [pc: 0x355f68c85ad7]
[2]: saveState [0x25b910175701] [/Users/guns/public/node_modules/gatsby/dist/db/index.js:30] [bytecode=0x25b9097edb61 offset=181](this=0x25b90cd96279 <Object map = 0x25b90f1ad969>) {
// stack-allocated locals
var .generator_object = 0x25baaacb2ee9 <JSGenerator>
var /* anonymous */ = 0x25baaacb2eb9 <Promise map = 0x25b9b4783e89>
// expression stack (top to bottom)
[11] : 0x25b9d2202321 <the_hole>
[10] : 0x25b9097ed889 <String[24]: Error persisting state: >
[09] : 0x25b921b04c89 <Object map = 0x25b983544361>
[08] : 0x25b9794ede29 <JSBoundFunction (BoundTargetFunction 0x25b9794ecbf1)>
[07] : 0x25b9101767b9 <FunctionContext[9]>
[06] : 0x25b9850ff331 <CatchContext[5]>
[05] : 0x25b9101767b9 <FunctionContext[9]>
[04] : 0x25b9101767b9 <FunctionContext[9]>
[03] : 0x25b90cd96279 <Object map = 0x25b90f1ad969>
[02] : 0x25b910175701 <JSFunction saveState (sfi = 0x25b9996e74f9)>
--------- s o u r c e c o d e ---------
function saveState() {
  if (saveInProgress) return;
  saveInProgress = true;

  try {
    await Promise.all(dbs.map(db => db.saveState()));
  } catch (err) {
    report.warn(`Error persisting state: ${err && err.message || err}`);
  }

  saveInProgress = false;
}
-----------------------------------------
}
[3]: /* anonymous */ [0x25b9850fee71](this=0x25b93d108c59 <JSGlobal Object>,0x25b9d2202321 <the_hole>) {
// optimized frame
--------- s o u r c e c o d e ---------
<No Source>
-----------------------------------------
}
[4]: StubFrame [pc: 0x355f68c42871]
[5]: StubFrame [pc: 0x355f68c21b9a]
[6]: EntryFrame [pc: 0x355f68c0ba01]
=====================
error Command failed with signal "SIGILL".
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
Interestingly, the build for 200k pages (4.5kb each) runs successfully on gatsby@2.3.19, which uses JSON.stringify to persist state (it shows a redux persisting-state warning, but everything works).
Steps to reproduce
Repro repo: https://github.com/ganapativs/gatsby-v8-issue-repro (README has everything related to the issue and other observations).
Expected result
Build should be successful without V8 serialize error.
Actual result
Build crashed with a V8 serialize error. DANGEROUSLY_DISABLE_OOM would have helped temporarily, but it was removed recently 😅
Environment
System: OS: macOS 10.15 CPU: (8) x64 Intel® Core™ i7-4870HQ CPU @ 2.50GHz Shell: 5.7.1 - /bin/zsh
Binaries: Node: 10.6.0 - /usr/local/bin/node Yarn: 1.7.0 - ~/.yarn/bin/yarn npm: 6.1.0 - /usr/local/bin/npm
Languages: Python: 2.7.16 - /usr/bin/python
Browsers: Chrome: 76.0.3809.132 Safari: 13.0
npmPackages: gatsby: 2.14.0 => 2.14.0 gatsby-plugin-styled-components: 3.1.3 => 3.1.3 gatsby-remark-autolink-headers: 2.1.8 => 2.1.8 gatsby-remark-prismjs: 3.3.9 => 3.3.9 gatsby-remark-sub-sup: 1.0.0 => 1.0.0 gatsby-source-mongodb: 2.1.9 => 2.1.9 gatsby-transformer-remark: 2.6.19 => 2.6.19
npmGlobalPackages: gatsby-cli: 2.7.40
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 3
- Comments: 19 (12 by maintainers)
Commits related to this issue
- fix(gatsby): Chunk nodes when serializing redux to prevent OOM We are using `v8.serialize` to write and read the redux state. This is faster than `JSON.parse`. Unfortunately, as reported in #17233, t... — committed to gatsbyjs/gatsby by pvdz 4 years ago
- fix(gatsby): Chunk nodes when serializing redux to prevent OOM (#21555) * fix(gatsby): Chunk nodes when serializing redux to prevent OOM We are using `v8.serialize` to write and read the redux sta... — committed to gatsbyjs/gatsby by pvdz 4 years ago
Thank you very much 👍
The fix was published in gatsby@2.19.22
Please report if there are still problems and provide a repro if that’s the case.
(And hey, if your issue is now fixed, let me know too 😃 )
Ok, I am looking into it now. Consider this a research post while I’m trying to dig in.
I think https://github.com/nodejs/help/issues/1059 is interesting, because it implies that there shouldn't be a concrete difference between v8.serialize and JSON.stringify apart from a more aggressive GC schedule. This could very well be the reason. Additionally, we might consider that the performance improvement of using v8.serialize over JSON.stringify is only perceived, and the cost is ultimately still paid before the process exits. That's an interesting fact.
Keep in mind, async operations may change the impact, as an async operation might give nodejs more idle time to run GC. Of course, if postponing GC leads to OOMs we need to re-evaluate that.
The repro.
I had to install mongo (`sudo apt-get install mongodb mongo-clients`) because it wasn't available on my xfce/ubuntu system. I had to update the script a tiny bit to get the repro working, because I was getting "invalid id" errors from mongo (added the suffix because why not; I realize the repo doesn't have that).
Running it on 10k pages without bumping the memory quickly OOMs during sourcing.
Running it with 12GB:
Memory consumption remains fairly stable during the sourcing step (~4gb?). Sourcing takes about 730s (yikes); I'll have to look into that on a separate note. On the next run this took just 70 seconds, maybe mongo is caching it? The `run queries` step runs at literally 1 query per second. Will be testing a fix that was merged yesterday to improve this ~1000x. This patch will not apply to loki, btw. Memory consumption during the 1q/s "run queries" step slowly increased to 7gb. Immediately after the run queries step it crashed out, well under the available memory. Here's my runtime output:
The crash is not a regular OOM but a string length assertion error. Perhaps the serialized content is too much to bear. (After all, there are inherent limits to nodejs, a maximum string length is one of them.)
The above ran on gatsby@2.17.7 in node v10.17.
Bumping it to gatsby@2.19.8 (`yarn add gatsby@2.19.8`): this time the sourcing step improved, but there was no change to the run queries time. Debugging that, it seems it doesn't use a filter at all. The slowness seems to be coming from the html handler. Will have to look into that later.
After about three or four restarts (while debugging) the build now OOMs during the createPages step, which took 2s before. And I cannot get it to move forward. In this case I can see the memory grow (relatively) rapidly and after ~2 minutes the 12gb are up and it OOMs. I took a break and picked it up the next day. When I got back to it this step was fine again, not sure what is causing this… Can anyone reliably repro this problem?
This makes me wonder whether there aren't two things at play here. My OOM problem certainly seems to stem from another source. I use `gatsby clean` for every step, so it's not something that's stored in `.cache` or `public`. Maybe Mongo does something differently. Restarting it did not help.
Regenerating the db with 100 pages makes the run finish fine, in 10s. Not unexpected, but good to see that still works.
I checked into why the queries run so slow. Turns out they are actually under-reporting their work; each query runs remark for every post on the page. By default there are 30 posts on each page, so remark is called 30 times for that fact alone, but it visually counts as one step for the query.
If I go into gatsby-transformer-remark and add `return 'foo'` in the `resolve(markdownNode, { format, pruneLength, truncate }) {` function, then the queries speed up drastically and the 100k site completes within reasonable time (~4.5 minutes). (This does expose a scaling issue, as the small site runs to completion much faster than a big site, but that's a different issue to deal with later.) And with this hack the assert is still triggered, so that's a decent repro. And it allows us to skip a decent chunk of code 😃
Then I confirmed whether this problem pops up with JSON.stringify; change the v8 bits in `persist.ts` to use JSON instead. No assert crash.
Now this isn't necessarily a surprise. `v8.serialize` is more efficient for a few reasons, but it probably also serializes more data than JSON.stringify, which is not designed to serialize functions or class instances. Let's take a look at the output for a small number of pages: I compared v8.serialize to the old way of doing JSON.stringify (using https://github.com/stefanprobst/gatsby/blob/c043816915c0e4b632730091c1d14df08d6249d4/packages/gatsby/src/redux/persist.js as a reference point). Both ways dump about 500k of data.
Before, I “checked” the assert with a simple JSON.stringify. However, I need to run this again, since the original way of stringifying was capturing a lot more. Running it again also shows `warn Error persisting state: Invalid string length`, but it's non-fatal and the build completes. Probably not saving the redux state, though 😉
Next I checked whether I could catch the assertion error. This is more difficult because the error happens somewhere inside the serialization call and we don't control the assert. It doesn't appear to be try-catchable… 😦 Arguably this is a bug in nodejs, as the assert should instead throw a catchable error (like happens with JSON.stringify), but I doubt anyone cares to change that.
The way forward
So after some discussion, we're going to try to chunk the page nodes. Ultimately we're hitting the arbitrary buffer limit and there's no easy way to fix that. So instead, we'll first serialize the redux state without the page nodes. Then we'll apply some heuristics to chunk the page nodes such that each chunk stays well below this limit. This means the redux state has to be serialized across multiple files, but it should also mean the build won't fatal.
We’re gonna be a little busy the next week but watch this space.
I have tested this issue on gatsby 2.17.7 and node 13.0.1. The issue still persists. Updated the repro repo.
Currently facing this issue with a production site running on gatsby.
Awesome @ganapativs thanks for creating this repro.
@sidharthachatterjee As discussed the other day, created a repro here. Please look into it 👍