performance: readFile in promises very slow

Version

v21.7.1

Platform

Darwin evgeniis.local 23.4.0 Darwin Kernel Version 23.4.0: Wed Feb 21 21:44:43 PST 2024; root:xnu-10063.101.15~2/RELEASE_ARM64_T6000 arm64

Subsystem

No response

What steps will reproduce the bug?

Slow (fs/promises):

const fs = require('fs/promises');

const start = Date.now();
let count = 0;
for (let i = 0; i < 10000; i++) {
	fs.readFile("./text.txt", { encoding: 'utf-8' })
		.then((data) => {
			if (data !== "Hello, world") throw 1;
			count++
			if (count === 10000) {
				console.log('time: ', Date.now() - start);
			}
		})
		.catch((err) => {
			throw 1
		})
}

Fast (promisified callback fs.readFile):

const fs = require('fs');
const util = require('util');
const readFile = util.promisify(fs.readFile);

const start = Date.now();
let count = 0;
for (let i = 0; i < 10000; i++) {
	readFile("./text.txt", { encoding: 'utf-8' })
		.then((data) => {
			if (data !== "Hello, world") throw 1;
			count++
			if (count === 10000) {
				console.log('time: ', Date.now() - start);
			}
		})
		.catch((err) => {
			throw 1
		})
}
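
Both snippets assume a ./text.txt whose contents exactly match the compared string; a minimal way to create it (note there must be no trailing newline):

const fs = require('fs');

// Create the fixture both benchmarks read; the contents must match the
// comparison string exactly, so no trailing newline.
fs.writeFileSync('./text.txt', 'Hello, world');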

How often does it reproduce? Is there a required condition?

No response

What is the expected behavior? Why is that the expected behavior?

No response

What do you see instead?

The fs/promises version is about twice as slow: my tests showed roughly 200 ms vs. 100 ms.

Additional information

No response

About this issue

  • Original URL
  • State: open
  • Created 4 months ago
  • Reactions: 2
  • Comments: 25 (19 by maintainers)

Most upvoted comments

You can basically recover the performance by calling readFileSync within an async function

That’s going to depend on the size of your file and on whether your program has other things to do – like @meyfa said, using readFileSync might get you the contents of the file slightly faster, but the tradeoff is that your program cannot do anything else in the meantime.

I think we are getting a bit off topic here… I would recommend that all mentions of readFileSync be hidden as off topic.

You can basically recover the performance by calling readFileSync within an async function…

const fs = require('fs');

// Wrapping the synchronous read in an async function makes it return a promise,
// but the read itself still happens synchronously inside the call.
async function f(name, options) {
	return fs.readFileSync(name, options);
}

const start = Date.now();
let count = 0;
for (let i = 0; i < 10000; i++) {
	f("./text.txt", { encoding: 'utf-8' })
		.then((data) => {
			count++
			if (count === 10000) {
				console.log('time: ', Date.now() - start);
			}
		})
		.catch((err) => {
			throw 1
		})
}

I suggest transferring this issue to the performance team repo… what do you think, @anonrig?

@Linkgoron @aduh95 you are right, this optimization was already done.

The problem remains because there are a lot of C++/JS transitions for large files, and the UTF-8 decoding still allocates double the memory.
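
For illustration, a minimal sketch of where that doubling can come from (an assumption about the shape of the problem, not Node's actual internals): when an encoding is requested, the raw bytes and the decoded string are both alive at once.

const fsp = require('fs/promises');

// Hypothetical sketch, not Node's code: the file bytes land in a Buffer
// first, then get decoded into a JS string, so two copies of the content
// exist simultaneously.
async function readUtf8(path) {
	const buf = await fsp.readFile(path); // allocation 1: the raw bytes
	return buf.toString('utf-8');         // allocation 2: the decoded string
}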

Maybe that’s a wrong assumption on my side, but I think it’s not possible to know the size of a file ahead of time. So my point was that the current buffer is already a “big chunk to fit the file”, but that’s only true for files smaller than 64 KiB; the 64 KiB figure is arbitrary, but unless we can know the size of the file, we will inevitably need some concatenation for a large enough file. Anyway, optimising the UTF-8 case certainly makes sense.

Maybe I’m misunderstanding what you’re saying, but today the code uses fstat in order to allocate the returned buffer. If that fails, it allocates a 64 KiB chunk per read and then concatenates them at the end (note that even when fstat succeeds, Node still reads into the larger buffer in 512 KiB chunks).
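
A rough, simplified sketch of that strategy (synchronous for brevity, not Node's actual implementation, and with fd opening and error handling omitted):

const fs = require('fs');

// Sketch of the allocation strategy described above: try fstat to size a
// single destination buffer up front; if the size is unknown (reported as 0),
// fall back to accumulating 64 KiB chunks and concatenating at the end.
function readWholeFile(fd) {
	const size = fs.fstatSync(fd).size;
	if (size > 0) {
		// Known size: one allocation, filled with reads of up to 512 KiB.
		const buf = Buffer.allocUnsafe(size);
		let offset = 0;
		while (offset < size) {
			const n = fs.readSync(fd, buf, offset, Math.min(512 * 1024, size - offset), null);
			if (n === 0) break; // EOF earlier than fstat reported
			offset += n;
		}
		return buf.subarray(0, offset);
	}
	// Unknown size (e.g. procfs-style files): chunk, then concat.
	const chunks = [];
	for (;;) {
		const chunk = Buffer.allocUnsafe(64 * 1024);
		const bytesRead = fs.readSync(fd, chunk, 0, 64 * 1024, null);
		if (bytesRead === 0) break;
		chunks.push(chunk.subarray(0, bytesRead));
	}
	return Buffer.concat(chunks);
}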

Anyhow, I think it’s more important to optimize the UTF-8 case.

readFileSync has to return the file contents synchronously, so it must read the whole file before returning; and since JS isn’t multi-threaded, this blocks execution. fs.promises.readFile, by contrast, lets other work run while it fetches the file asynchronously, fulfilling the returned promise when done. The presence of await, or the fact that the readFileSync call happens inside an async function, does not change its internal behavior.

While readFileSync may speed up the benchmark, I think it is unsuitable for applications that could be doing other work while waiting for I/O to complete.
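
A small experiment that makes the blocking visible (assuming the text.txt from the repro; exact numbers will vary): await only yields to the microtask queue, so the 10 ms interval below never gets a chance to fire while the loop runs.

const fs = require('fs');

// The async wrapper looks non-blocking, but the sync read still runs on the
// main thread; awaiting it yields only to the microtask queue, never back to
// the event loop's timer phase, so the interval cannot tick during the loop.
async function fakeAsyncRead(path) {
	return fs.readFileSync(path, 'utf-8');
}

let ticks = 0;
const timer = setInterval(() => { ticks++; }, 10);

const start = Date.now();
(async () => {
	for (let i = 0; i < 10000; i++) {
		await fakeAsyncRead('./text.txt');
	}
	// ticks is still 0 here: the timer never fired during the sync reads.
	console.log(Date.now() - start, 'ms elapsed,', ticks, 'timer ticks');
	clearInterval(timer);
})();

Swapping fakeAsyncRead for fs.promises.readFile should let the timer tick throughout the run, which is exactly the tradeoff described above.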

You can basically recover the performance by calling readFileSync within an async function

That can’t be optimal, can it? Maybe in this synthetic example it is, but won’t readFileSync just block the whole program until the entire file has been read? The await in this case only gives the illusion of being async, since a microtick is introduced after the entire file has already been read synchronously…
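
To see that microtick directly, a tiny ordering check (same text.txt assumption): by the time the call returns, the file has already been read in full; only delivery of the result via .then is deferred.

const fs = require('fs');

// f() reads the whole file synchronously during the call itself; the async
// wrapper merely defers delivery of the result by one microtask.
async function f(path) {
	return fs.readFileSync(path, 'utf-8');
}

f('./text.txt').then(() => console.log('2: then callback (microtask)'));
console.log('1: call returned, file already fully read');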