deno: Text decoding performance abysmally slow.

This issue was originally about file reading performance, but it was brought to my attention that the bottleneck is actually the text decoding; the file read itself does not take very long. The issue still directly impacts reading large files as strings, though.

The following paragraph is from the original issue and still holds true to an extent. I have updated the rest of the issue to focus on the text decoding.

The file reading performance of Deno is, well, abysmal. On my machine it takes 9.25 seconds to read a 25 MB file using deno_std’s fs readFileStrSync function, and while it reads the file it pins CPU usage at 100%+. For comparison, Perl 5 reads the same file in 0.025 seconds.
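For reference, the original file-read measurement looked roughly like the sketch below. The std import path and the ./big.txt filename are from memory of deno_std around v0.21 and are assumptions for illustration only.

// Sketch of the original file-read repro (assumptions: import path and
// file name; readFileStrSync defaults to decoding the file as UTF-8).
import { readFileStrSync } from 'https://deno.land/std/fs/read_file_str.ts';

const start = performance.now();
const text = readFileStrSync('./big.txt'); // ~25 MB text file
console.log('Read ' + text.length + ' chars in ' +
  ((performance.now() - start) / 1000).toFixed(3) + 's');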

I have updated the Repro code below to be a test of the text decoder speed. When you run the code, note the 100%+ CPU usage.

Repro code (Deno):

'use strict';

const decoder = new TextDecoder('utf-8');
const data = new Uint8Array(new Array(25e6).fill(65)); // 25 MB of ASCII capital letter 'A'
console.log('Starting decode...');
const start = performance.now();
decoder.decode(data);
console.log('Decode finished in ' + ((performance.now() - start)/1000).toFixed(3) + 's');

Example output: Decode finished in 6.134s
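For a rough baseline, appending the following to the repro converts the same buffer to a string with a hand-rolled, ASCII-only loop. This is only an illustrative comparison that relies on the synthetic data being pure ASCII, not a general replacement for TextDecoder.

// ASCII-only baseline for comparison; `data` is the 25 MB buffer from the
// repro above. Converting in fixed-size chunks keeps the number of
// arguments passed to apply() bounded.
const CHUNK = 0x8000;
function asciiDecode(bytes) {
  const parts = [];
  for (let i = 0; i < bytes.length; i += CHUNK) {
    parts.push(String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK)));
  }
  return parts.join('');
}

const t0 = performance.now();
asciiDecode(data);
console.log('ASCII baseline finished in ' +
  ((performance.now() - t0) / 1000).toFixed(3) + 's');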

There must be something wrong somewhere with the text decoding for it to take so long (and to pin the CPU usage at 100%+ while it’s at it). Until this is fixed I can’t really continue working on my project in Deno, which involves saving and loading significant amounts of data, because as it is, with ~30 MB of data between two files it takes 10+ seconds to load.

% deno -v
deno: 0.21.0
v8: 7.9.304
typescript: 3.6.3

macOS Catalina 10.15 Beta

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 19 (19 by maintainers)

Most upvoted comments

@ry feels like we should also invert it to… keep track of TextEncoder performance as well.

If we could get a benchmark on the benchmarks page, I will happily invest the time to make it faster / use less memory.

It’s not at all obvious that doing it in Rust would be faster. Boundary crossing is a significant cost, and V8 optimizes code very well.

700 MB of memory to decode 25 MB suggests there’s a problem in our decoder implementation.

Indeed. Maybe it’s just a small fix to the JS.
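To make the memory observation concrete, a decode loop of the following shape (a purely hypothetical illustration, not Deno’s actual implementation) builds a 25-million-element array, plus growth copies along the way, just to produce a 25 MB string, which is the kind of transient overhead that could explain a several-hundred-MB peak:

// Hypothetical worst-case shape, for illustration only -- not Deno's code.
// One array slot per input byte, joined at the end: the intermediate array
// and its repeated reallocations dwarf the 25 MB result.
function naiveAsciiDecode(bytes) {
  const chars = [];
  for (let i = 0; i < bytes.length; i++) {
    chars.push(String.fromCharCode(bytes[i]));
  }
  return chars.join('');
}

Batching the conversion into chunks, as in the ASCII baseline further up, sidesteps both the per-byte array and most of the GC pressure.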