snowflake-connector-nodejs: SNOW-750472 Out of memory issue
I am running into an out of memory issue when trying to stream rows from a table with 3 million rows. Does the snowflake stream support the highWaterMark and backpressure functionality of Node streams? It seems like it should take very little memory to stream 10 rows at a time from the db.
const snowflake = require('snowflake-sdk');
const config = require('config');
const through2 = require('through2');

async function test() {
  const connection = snowflake.createConnection({
    account: config.snowflake.account,
    username: config.snowflake.username,
    password: config.snowflake.password
  });

  const conn = await new Promise((resolve, reject) => {
    connection.connect((err, c) => (err ? reject(err) : resolve(c)));
  });

  const statement = conn.execute({
    sqlText: 'SELECT * FROM TABLE'
  });

  const stream = statement.streamRows();
  stream.pipe(through2.obj({ highWaterMark: 5 }, function (chunk, enc, next) {
    // console.log(chunk);
    next();
  }));
}

test();
Output
<--- Last few GCs --->
[5282:0x103801600] 119235 ms: Scavenge 1382.8 (1419.3) -> 1382.1 (1419.8) MB, 4.6 / 0.0 ms (average mu = 0.167, current mu = 0.068) allocation failure
[5282:0x103801600] 119244 ms: Scavenge 1383.2 (1419.8) -> 1382.5 (1420.3) MB, 4.3 / 0.0 ms (average mu = 0.167, current mu = 0.068) allocation failure
[5282:0x103801600] 120061 ms: Mark-sweep 1383.6 (1420.3) -> 1382.9 (1420.3) MB, 812.5 / 0.0 ms (average mu = 0.100, current mu = 0.030) allocation failure scavenge might not succeed
<--- JS stacktrace --->
==== JS stack trace =========================================
0: ExitFrame [pc: 0x23988115be3d]
1: StubFrame [pc: 0x2398811134b0]
Security context: 0x3459e929e6e1 <JSObject>
2: extractFromRow [0x345902014769] [/Users/s/git/riversjs/node_modules/snowflake-sdk/lib/connection/result/column.js:~709] [pc=0x23988116e1f3](this=0x345945be4eb9 <Column map = 0x34590b7387b1>,row=0x3459467bd621 <Object map = 0x3459b262f599>,context=0x3459f8ee0899 <Object map = 0x34590b74b3c9>,asString=0x3...
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
1: 0x10003ae75 node::Abort() [/Users/s/.nvm/versions/node/v10.12.0/bin/node]
2: 0x10003b07f node::OnFatalError(char const*, char const*) [/Users/s/.nvm/versions/node/v10.12.0/bin/node]
3: 0x1001a6a85 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [/Users/s/.nvm/versions/node/v10.12.0/bin/node]
4: 0x100574422 v8::internal::Heap::FatalProcessOutOfMemory(char const*) [/Users/s/.nvm/versions/node/v10.12.0/bin/node]
About this issue
- State: closed
- Created 5 years ago
- Reactions: 1
- Comments: 24 (2 by maintainers)
Commits related to this issue
- SNOW-830950: Configure nyc - code coverage library and corresponding steps in jenkins steps #43 — committed to snowflakedb/snowflake-connector-nodejs by sfc-gh-pmotacki a year ago
We actually encountered something similar several months ago. I did some fairly extensive testing at the time to figure out what the problem was, and it appeared to be the size of the chunks being sent from Snowflake. I don't remember all of the details, but essentially Snowflake returns a list of all the available chunks to download, and as you stream the rows the client downloads each chunk in sequence. The problem is that the chunk size (which, as far as I can tell, is determined by Snowflake) seems to more or less double until it hits some threshold. So what would happen is we'd end up with something like -
… and when the Snowflake client parses these larger chunks, the Node client eventually exhausts all of its memory. The way we got around it was to use result_scan and iterate over the original query with a smaller pagination size (in our case, batching by 1,000,000 rows didn't generate chunks large enough to cause any out of memory errors). With that said, I don't know for certain that that's what's happening here, but the symptoms do seem very much related, so I wanted to post my observations.
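As a rough illustration of that RESULT_SCAN pagination workaround, the re-read loop might look like the following. Here runQuery is a hypothetical helper (it would wrap conn.execute() in real code), and the default batch size simply echoes the 1,000,000-row figure above:

```javascript
// Hedged sketch of paginating over a finished query via RESULT_SCAN.
// `runQuery(sql)` is assumed to execute one statement and resolve with its
// row array; `queryId` is the id of the original query in the same session.
async function* paginate(runQuery, queryId, batchSize = 1000000) {
  for (let offset = 0; ; offset += batchSize) {
    // RESULT_SCAN re-reads an earlier query's result set by query id, so
    // each round trip only materializes batchSize rows at a time.
    const rows = await runQuery(
      `SELECT * FROM TABLE(RESULT_SCAN('${queryId}')) LIMIT ${batchSize} OFFSET ${offset}`
    );
    if (rows.length === 0) return;
    yield rows;
    if (rows.length < batchSize) return; // last, short page
  }
}
```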
Thank you for providing the snippet and the rest of the details! I managed to reproduce the issue and realized that streamResult needs to be set to true on the Connection; this is now included in the example. However, even with streamResult set to true on the Connection, the execution still broke with JavaScript heap out of memory until I reverted #465:

So using the exact same setup for this reproduction and test data (around 9 million rows, versus the 3 million in your example), with or without streamResult set to true on the Connection, it now works and doesn't break. So I believe what you're seeing is more likely connected to issues brought in by #465, for which we have multiple issues open (#528, #539).
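For clarity, the streamResult setting the maintainer refers to goes into the connection options. A hedged sketch, with placeholder credentials:

```javascript
// Connection options with streamResult enabled, as described in the comment
// above. Account and credentials are placeholders, not real values.
const connectionOptions = {
  account: 'my_account',
  username: 'my_user',
  password: 'my_password',
  streamResult: true // stream rows instead of buffering the full result set in memory
};
// const connection = require('snowflake-sdk').createConnection(connectionOptions);
```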