tfjs: running some models using webgl is 10x slower than using nodejs

The model executes in ~100ms in NodeJS using the tensorflow backend, but takes over 1sec in the browser using the WebGL backend. That is a 10x difference in performance between NodeJS and WebGL.

I’ve tried to figure out why the model executes an order of magnitude slower than expected on the WebGL backend,
but the profiler output for it makes no sense:

const t0 = performance.now();
const res = await tf.profile(() => model.executeAsync(input));
const t1 = performance.now();
const wallTime = t1 - t0;
const kernelTime = res.kernels.reduce((a, b) => a + b.kernelTimeMs, 0);

wallTime is 900-1200ms
kernelTime is ~20ms
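Since the summed kernel time accounts for only ~20ms of the ~1s wall time, it can still help to list the slowest kernels from the profile result to confirm nothing dominates. A minimal sketch, assuming each entry in `res.kernels` has `name` and `kernelTimeMs` as used in the reduce() above (the `mock` data here is illustrative, not from the real model):

```javascript
// Summarize a tf.profile() result: top-N kernels by self time.
function topKernels(kernels, n = 5) {
  return [...kernels]
    .sort((a, b) => b.kernelTimeMs - a.kernelTimeMs)
    .slice(0, n)
    .map((k) => `${k.name}: ${k.kernelTimeMs.toFixed(2)} ms`);
}

// Example with mock data (real data would come from res.kernels):
const mock = [
  { name: 'Conv2D', kernelTimeMs: 8.5 },
  { name: 'FusedBatchNorm', kernelTimeMs: 1.2 },
  { name: 'Relu', kernelTimeMs: 0.3 },
];
console.log(topKernels(mock, 2)); // → ['Conv2D: 8.50 ms', 'FusedBatchNorm: 1.20 ms']
```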

I’m re-running inference on the same input twice and looking at the second run only, to allow for the WebGL backend warmup time (shader compilation, etc.)
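The warmup-then-measure pattern can be factored into a small helper. A sketch, where `runFn` is a hypothetical stand-in for `() => model.executeAsync(input)`:

```javascript
// Warm up, then average wall time over several timed runs.
// `runFn` stands in for () => model.executeAsync(input).
async function timeAverage(runFn, { warmupRuns = 1, timedRuns = 5 } = {}) {
  for (let i = 0; i < warmupRuns; i++) await runFn(); // shader compile/upload
  const t0 = performance.now();
  for (let i = 0; i < timedRuns; i++) await runFn();
  return (performance.now() - t0) / timedRuns; // mean ms per run
}
```

`performance.now()` is available globally in browsers and in NodeJS 16+.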

I even tried tf.enableDebugMode() and I still don’t see anything that comes even close to the overall wall time.

So I have no idea where the time is spent.

Model in question: https://github.com/vladmandic/nanodet

Environment: TFJS 3.3.0 on Ubuntu 20.10 and Chrome 89

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

measuring inference time is simple:

in browser:

const t0 = performance.now();
const res = await model.executeAsync(input);
const t1 = performance.now();
const inferenceTime = Math.round(t1 - t0); // elapsed time in ms

and in nodejs:

const t0 = process.hrtime.bigint();
const res = await model.executeAsync(input); // also can be model.predict(input);
const t1 = process.hrtime.bigint();
const inferenceTime = Math.round(Number(t1 - t0) / 1000 / 1000); // convert ns to ms
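one caveat with `process.hrtime.bigint()`: the difference is a BigInt in nanoseconds, and BigInt can’t be mixed with Number arithmetic (e.g. `Math.round`) directly, so it has to be converted explicitly. a small helper, as a sketch:

```javascript
// Convert a BigInt nanosecond delta (from process.hrtime.bigint())
// to milliseconds as a regular Number.
function nsToMs(deltaNs) {
  return Number(deltaNs) / 1e6;
}

const t0 = process.hrtime.bigint();
// ... work being timed ...
const t1 = process.hrtime.bigint();
console.log(`${nsToMs(t1 - t0).toFixed(2)} ms`);
```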

and I’m only measuring subsequent runs, skipping the first run so WebGL has time to warm up (compile & upload shaders)

for most models, WebGL is fast and it’s the best option (since node-gpu doesn’t work due to an obsolete CUDA dependency from TF1). however, for some models (I’ve provided examples), WebGL is 10x slower than NodeJS, and running profile() shows nothing useful.
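when wall time dwarfs kernel time on WebGL, one place the time can hide is outside the kernels themselves, e.g. in the GPU→CPU readback of the result. a hedged sketch that times graph execution and tensor download separately, assuming `model` and `input` are set up as in the snippets above and the model returns a single output tensor:

```javascript
// Separate graph execution from tensor download (GPU -> CPU readback).
// Assumes `model` and `input` are set up as above, single output tensor.
async function timeExecuteAndDownload(model, input) {
  const t0 = performance.now();
  const out = await model.executeAsync(input);
  const t1 = performance.now();
  await out.data(); // forces readback of the result to the CPU
  const t2 = performance.now();
  out.dispose();
  return { executeMs: t1 - t0, downloadMs: t2 - t1 };
}
```

if `downloadMs` dominates, the cost is in readback rather than in the kernels that profile() reports.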