jsdom: querySelectorAll() with attribute selector which tests value is slow for URI-based values
This is a more specific issue than #677.
- Run this script to get a test page:
'use strict';
const fs = require('fs');
const html = fs.openSync('test.html', 'w');
fs.writeSync(html,
'\uFEFF<!doctype html><html><head><meta charset="UTF-8"><title></title></head><body>\n\n',
null, 'utf8');
let i = 100000;
while (i--) {
fs.writeSync(html, `<a title='${i}.html' href='${i}.html'></a>\n`, null, 'utf8');
}
fs.writeSync(html, '\n</body></html>\n', null, 'utf8');
fs.closeSync(html);
You will get 100000 a elements with title and href attributes with identical values.
- Run these four scripts one by one. Do not combine all four tests in one script, because memoization in querySelectorAll() affects the performance of consecutive calls. Compare the outputs.
require('jsdom').env({file: 'test.html', done: (err, window) => {
let hrStart;
let hrEnd;
hrStart = process.hrtime();
console.log(window.document.querySelectorAll('a[title]').length);
hrEnd = process.hrtime(hrStart);
console.log(`${(hrEnd[0] * 1e9 + hrEnd[1]) / 1e9} s\n`);
}});
100000
0.318559525 s
require('jsdom').env({file: 'test.html', done: (err, window) => {
let hrStart;
let hrEnd;
hrStart = process.hrtime();
console.log(window.document.querySelectorAll('a[title*=".html"]').length);
hrEnd = process.hrtime(hrStart);
console.log(`${(hrEnd[0] * 1e9 + hrEnd[1]) / 1e9} s\n`);
}});
100000
0.559967377 s
require('jsdom').env({file: 'test.html', done: (err, window) => {
let hrStart;
let hrEnd;
hrStart = process.hrtime();
console.log(window.document.querySelectorAll('a[href]').length);
hrEnd = process.hrtime(hrStart);
console.log(`${(hrEnd[0] * 1e9 + hrEnd[1]) / 1e9} s\n`);
}});
100000
0.34831147 s
require('jsdom').env({file: 'test.html', done: (err, window) => {
let hrStart;
let hrEnd;
hrStart = process.hrtime();
console.log(window.document.querySelectorAll('a[href*=".html"]').length);
hrEnd = process.hrtime(hrStart);
console.log(`${(hrEnd[0] * 1e9 + hrEnd[1]) / 1e9} s\n`);
}});
100000
9.444107183 s
As you can see, the slowness is not caused by value testing as such, but by testing the value of href specifically.
- Run the second and the fourth scripts with profiling (see this guide). The --prof-process output for the fourth script shows the following hot spot, which is missing from the output for the second script:
[Bottom up (heavy) profile]:
Note: percentage shows a share of a particular caller in the total
amount of its parent calls.
Callers occupying less than 2.0% are not shown.
ticks parent name
10422 57.9% C:\Program Files\nodejs\node.exe
8311 79.7% C:\Program Files\nodejs\node.exe
2868 34.5% LazyCompile: *URLStateMachine ...\node_modules\whatwg-url\lib\url-state-machine.js:423:25
2861 99.8% LazyCompile: *exports.resolveURLToResultingParsedURL ...\node_modules\jsdom\lib\jsdom\living\helpers\document-base-url.js:51:42
2861 100.0% LazyCompile: setTheURL ...\node_modules\jsdom\lib\jsdom\living\nodes\HTMLHyperlinkElementUtils-impl.js:261:19
2859 99.9% LazyCompile: *getAttribute ...\node_modules\nwmatcher\src\nwmatcher-noqsa.js:321:13
In the getAttribute function of nwmatcher-noqsa.js, attributes are handled differently if they match the ATTR_URIDATA list. This seems to be the turning point. However, I can’t trace the whole parsing chain down to URLStateMachine, because I don’t understand jsdom’s complexity and the connections between its components well enough.
Is there any reason why a mere string match test should trigger this additional, slow chain of parsing calls?
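To illustrate why the distinction matters, here is a minimal sketch of the difference between reading the raw attribute and reading a resolved href property. The FakeAnchor class is a hypothetical stand-in, not jsdom's or nwmatcher's code. It assumes only that the property resolves the attribute against a base URL on every read, using Node's global URL class where jsdom uses whatwg-url.

```javascript
'use strict';

// Hypothetical stand-in for an <a> element; not jsdom's implementation.
class FakeAnchor {
  constructor(rawHref, baseURL) {
    this.rawHref = rawHref;
    this.baseURL = baseURL;
  }
  // Returns the attribute value exactly as written in the markup.
  getAttribute(name) {
    return name === 'href' ? this.rawHref : null;
  }
  // Resolves the attribute against the base URL on every read (no caching).
  get href() {
    FakeAnchor.parseCount++;
    return new URL(this.rawHref, this.baseURL).href;
  }
}
FakeAnchor.parseCount = 0;

const anchors = [];
for (let i = 0; i < 1000; i++) {
  anchors.push(new FakeAnchor(`${i}.html`, 'file:///tmp/test.html'));
}

// Substring test on the raw attribute: no URL parsing at all.
const viaAttribute = anchors.filter(a => a.getAttribute('href').includes('.html')).length;
const parsesAfterAttribute = FakeAnchor.parseCount;

// Same test through the property: one URL parse per element.
const viaProperty = anchors.filter(a => a.href.includes('.html')).length;
const parsesAfterProperty = FakeAnchor.parseCount;

console.log(viaAttribute, parsesAfterAttribute); // 1000 0
console.log(viaProperty, parsesAfterProperty);   // 1000 1000
```

Both paths match the same elements, but only the property path pays for a URL parse per element. That is consistent with the setTheURL and URLStateMachine frames in the profile, though the sketch does not prove that this is what nwmatcher does.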
About this issue
- Original URL
- State: closed
- Created 8 years ago
- Comments: 22 (21 by maintainers)
Yeah, that sounds good to me 😃. Thanks very much for testing!!
The issue should have been fixed by this commit:
https://github.com/dperini/nwmatcher/commit/1e30d83
please confirm and close.
I don’t think there’s any spec-compliance issue here.
Currently we do URL parsing lazily, without caching, for things like the href property. But I’m not sure why that should impact our querySelector performance, since querySelector only cares about attribute values, not about resolved hrefs. Maybe nwmatcher is incorrectly using element.href?
The profile above seems to show nwmatcher’s getAttribute calling jsdom’s internal setTheURL, which doesn’t make any sense to me at all. nwmatcher shouldn’t have any access to jsdom internals.
Huh, then we’re probably not caching the URL resolve results correctly. Need to check if we have to roll our own there or it’s a bug in the implementation if we’re not following the spec correctly.
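As a rough illustration of what caching the resolve results could look like, the sketch below memoizes resolved URLs keyed on both the raw attribute value and the base URL. The createCachingResolver function is hypothetical and is not jsdom's or whatwg-url's API. It also uses Node's global URL class in place of jsdom's internal resolver.

```javascript
'use strict';

// Hypothetical memoizing resolver; not jsdom's or whatwg-url's API.
function createCachingResolver() {
  const cache = new Map();
  let parses = 0;
  function resolve(rawValue, baseURL) {
    // Key on both inputs: the result changes if either one changes.
    const key = `${baseURL}\n${rawValue}`;
    let resolved = cache.get(key);
    if (resolved === undefined) {
      parses++;
      resolved = new URL(rawValue, baseURL).href;
      cache.set(key, resolved);
    }
    return resolved;
  }
  return { resolve, parseCount: () => parses };
}

const resolver = createCachingResolver();
const base = 'file:///tmp/test.html';
const first = resolver.resolve('1.html', base);
const second = resolver.resolve('1.html', base); // served from the cache
const other = resolver.resolve('1.html', 'file:///other/test.html'); // new base, new parse
console.log(first, second, other, resolver.parseCount());
```

In the test page from this issue every href value is distinct, so a cache like this would only help on repeated reads, not on the first pass over 100000 elements. Avoiding the property read in the selector engine addresses that first pass directly.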