jsdom: querySelectorAll() with attribute selector which tests value is slow for URI-based values

This is a more specific issue than #677.

  1. Run this script to get a test page:
'use strict';

const fs = require('fs');

const html = fs.openSync('test.html', 'w');

fs.writeSync(html,
  '\uFEFF<!doctype html><html><head><meta charset="UTF-8"><title></title></head><body>\n\n',
  null, 'utf8');

let i = 100000;
while (i--) {
  fs.writeSync(html, `<a title='${i}.html' href='${i}.html'></a>\n`, null, 'utf8');
}

fs.writeSync(html, '\n</body></html>\n', null, 'utf8');
fs.closeSync(html); // release the file descriptor once the page is written

You will get 100000 a elements, each with title and href attributes that have identical values.

  2. Run these four scripts one by one. Do not combine the four tests into one script, because memoization in querySelectorAll() affects the performance of consecutive calls. Compare the outputs.
require('jsdom').env({file: 'test.html', done: (err, window) => {
  let hrStart;
  let hrEnd;

  hrStart = process.hrtime();
  console.log(window.document.querySelectorAll('a[title]').length);
  hrEnd = process.hrtime(hrStart);
  console.log(`${(hrEnd[0] * 1e9 + hrEnd[1]) / 1e9} s\n`);
}});
100000
0.318559525 s
require('jsdom').env({file: 'test.html', done: (err, window) => {
  let hrStart;
  let hrEnd;

  hrStart = process.hrtime();
  console.log(window.document.querySelectorAll('a[title*=".html"]').length);
  hrEnd = process.hrtime(hrStart);
  console.log(`${(hrEnd[0] * 1e9 + hrEnd[1]) / 1e9} s\n`);
}});
100000
0.559967377 s
require('jsdom').env({file: 'test.html', done: (err, window) => {
  let hrStart;
  let hrEnd;

  hrStart = process.hrtime();
  console.log(window.document.querySelectorAll('a[href]').length);
  hrEnd = process.hrtime(hrStart);
  console.log(`${(hrEnd[0] * 1e9 + hrEnd[1]) / 1e9} s\n`);
}});
100000
0.34831147 s
require('jsdom').env({file: 'test.html', done: (err, window) => {
  let hrStart;
  let hrEnd;

  hrStart = process.hrtime();
  console.log(window.document.querySelectorAll('a[href*=".html"]').length);
  hrEnd = process.hrtime(hrStart);
  console.log(`${(hrEnd[0] * 1e9 + hrEnd[1]) / 1e9} s\n`);
}});
100000
9.444107183 s

As you can see, the slowdown is not caused by value testing in itself, but by testing the value of the href attribute specifically.
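The four scripts above differ only in the selector string. As a minimal sketch, the repeated timing code could be pulled into a small helper; `timeIt` is a hypothetical name, not part of jsdom or Node.js, and each selector should still be measured in a separate process run because of the memoization mentioned in step 2.

```javascript
'use strict';

// Hypothetical helper: runs fn once and reports the elapsed wall-clock time,
// using the same process.hrtime() arithmetic as the scripts above.
function timeIt(fn) {
  const start = process.hrtime();
  const result = fn();
  const [sec, nsec] = process.hrtime(start);
  return {result, seconds: (sec * 1e9 + nsec) / 1e9};
}

// Self-contained demonstration with a stand-in workload.
const demo = timeIt(() => [1, 2, 3].filter(n => n > 1).length);
console.log(demo.result);
console.log(`${demo.seconds} s`);
```

Inside the `done` callback of the scripts above, the measured call would then be `timeIt(() => window.document.querySelectorAll(selector).length)`, with `selector` being whichever of the four selectors is under test in that run.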

  3. Run the second and fourth scripts with profiling (see this guide). The following hot spot appears in the --prof-process output for the fourth script; it is absent from the output for the second script:
 [Bottom up (heavy) profile]:
  Note: percentage shows a share of a particular caller in the total
  amount of its parent calls.
  Callers occupying less than 2.0% are not shown.

   ticks parent  name
  10422   57.9%  C:\Program Files\nodejs\node.exe
   8311   79.7%    C:\Program Files\nodejs\node.exe
   2868   34.5%      LazyCompile: *URLStateMachine ...\node_modules\whatwg-url\lib\url-state-machine.js:423:25
   2861   99.8%        LazyCompile: *exports.resolveURLToResultingParsedURL ...\node_modules\jsdom\lib\jsdom\living\helpers\document-base-url.js:51:42
   2861  100.0%          LazyCompile: setTheURL ...\node_modules\jsdom\lib\jsdom\living\nodes\HTMLHyperlinkElementUtils-impl.js:261:19
   2859   99.9%            LazyCompile: *getAttribute ...\node_modules\nwmatcher\src\nwmatcher-noqsa.js:321:13

In the getAttribute function of nwmatcher-noqsa.js, attributes are handled differently if they are in the ATTR_URIDATA list. This seems to be the turning point. However, I can’t trace the whole parsing chain to URLStateMachine because I don’t understand all of jsdom’s complexity and how its components connect.

Is there any reason why a mere string match test would trigger this additional chain of parsing calls, which slows things down?
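To illustrate the suspicion, here is a minimal sketch that does not use jsdom or nwmatcher at all. `FakeAnchor` is a hypothetical stand-in for an a element, and the base URL is an assumption. It only shows that a substring test on the raw attribute value needs no URL parsing, while the same test through a resolving, uncached href getter costs one parse per element.

```javascript
'use strict';

// Hypothetical stand-in for an <a> element; not jsdom's or nwmatcher's code.
class FakeAnchor {
  constructor(rawHref, baseURL) {
    this.rawHref = rawHref;
    this.baseURL = baseURL;
    this.parseCount = 0; // how many times the URL was parsed for this element
  }

  // Attribute access: returns the string exactly as written in the markup.
  getAttribute(name) {
    return name === 'href' ? this.rawHref : null;
  }

  // Property access: resolves against the base URL on every read, no caching.
  get href() {
    this.parseCount++;
    return new URL(this.rawHref, this.baseURL).href;
  }
}

const base = 'file:///tmp/test.html'; // assumed base URL
const anchors = Array.from({length: 1000},
  (_, i) => new FakeAnchor(`${i}.html`, base));
const totalParses = () => anchors.reduce((n, a) => n + a.parseCount, 0);

// What a[href*=".html"] needs: a substring test on the raw attribute value.
const byAttribute =
  anchors.filter(a => a.getAttribute('href').includes('.html')).length;
const parsesAfterAttribute = totalParses();

// The same test through the resolved property: one URL parse per element.
const byProperty = anchors.filter(a => a.href.includes('.html')).length;
const parsesAfterProperty = totalParses();

console.log(byAttribute, parsesAfterAttribute); // 1000 0
console.log(byProperty, parsesAfterProperty);   // 1000 1000
```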

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 22 (21 by maintainers)

Most upvoted comments

Yeah, that sounds good to me 😃. Thanks very much for testing!!

The issue should have been fixed by this commit:

https://github.com/dperini/nwmatcher/commit/1e30d83

please confirm and close.

I don’t think there’s any spec-compliance issue here.

Currently we do URL parsing lazily, without caching, for things like the href property. But I’m not sure why that should impact our querySelector performance, since querySelector only cares about attribute values, not about resolved hrefs. Maybe nwmatcher is incorrectly using element.href?

The profile above seems to show nwmatcher’s getAttribute calling jsdom’s internal setTheURL, which doesn’t make any sense to me at all. nwmatcher shouldn’t have any access to jsdom internals.

Huh, then we’re probably not caching the URL resolve results correctly. We need to check whether we have to roll our own caching there, or whether it’s a bug in the implementation where we’re not following the spec correctly.
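For illustration only, caching of resolve results could look roughly like the sketch below. `CachedResolver` is a hypothetical class, not jsdom's actual internals. It uses Node's built-in URL class and keys the cache by both the base URL and the raw attribute value, so a changed base URL never returns a stale result.

```javascript
'use strict';

// Hypothetical cache for resolved URLs; jsdom's real internals differ.
class CachedResolver {
  constructor() {
    this.cache = new Map(); // base URL -> Map(raw value -> resolved URL)
    this.parseCount = 0;    // number of actual URL parses performed
  }

  resolve(rawValue, baseURL) {
    let perBase = this.cache.get(baseURL);
    if (perBase === undefined) {
      perBase = new Map();
      this.cache.set(baseURL, perBase);
    }
    let resolved = perBase.get(rawValue);
    if (resolved === undefined) {
      this.parseCount++;
      resolved = new URL(rawValue, baseURL).href; // throws on invalid input
      perBase.set(rawValue, resolved);
    }
    return resolved;
  }
}

const resolver = new CachedResolver();
const pageBase = 'http://example.com/dir/page.html'; // assumed base URL

const first = resolver.resolve('1.html', pageBase);
const second = resolver.resolve('1.html', pageBase); // served from the cache
const moved = resolver.resolve('1.html', 'http://example.com/other/');

console.log(first);               // http://example.com/dir/1.html
console.log(second === first);    // true
console.log(moved);               // http://example.com/other/1.html
console.log(resolver.parseCount); // 2
```

With a cache like this, 100000 repeated reads of the same element would cost one parse instead of 100000, though it would not help the first pass over 100000 distinct href values.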