tfjs: Mismatch in packed depthwise conv 2d results on Mali GPU
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow.js): Yes, test case shared below
- Mobile device: Pixel 6 Pro (reproduces on every Android device with a Mali GPU that I tried)
- TensorFlow.js installed from (npm or script link): 3.19.0
- Browser version: Chrome 103.0.5060.53
Describe the current behavior
Packed depthwise conv2d produces incorrect results on Mali GPUs when WEBGL_MAX_TEXTURE_SIZE is left at its default value (4096 on most modern Android devices). In one of our networks, we end up creating a 3672x1 texture for the weights, which produces incorrect outputs (presumably an error in sampling the texture, but that is just a guess). Setting the max texture size below 3672 fixes the issue.
I have attached sample code below to reproduce the issue (it uses the same filter dims as the layer that caused the inaccuracy in our original network). The code does the following:
- First, we run the packed depthwise conv with a max texture size of 4096 (the default value on all browsers I tried; the value is hardcoded here for consistent results).
- Next, we re-run the same node with a max size of 2048.
- Finally, we set the backend to cpu to get the reference output.
- With size 2048, the outputs match the reference, but with the default size they do not.
Note: based on my tests, the mismatch occurs only on Mali GPUs under Android. iOS, Chrome on macOS, and Android devices with Adreno GPUs all produce the correct result with the default texture size of 4096.
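For context (my addition, not part of the original report), the limit in play can be inspected on a given device: tf.env().getNumber reads the tfjs flag, while gl.MAX_TEXTURE_SIZE is the raw WebGL limit it is derived from, as I understand it.

// Quick check of the texture-size limits on the current device.
const gl = document.createElement('canvas').getContext('webgl2');
console.log('WebGL MAX_TEXTURE_SIZE:', gl.getParameter(gl.MAX_TEXTURE_SIZE));
console.log('tfjs WEBGL_MAX_TEXTURE_SIZE:', tf.env().getNumber('WEBGL_MAX_TEXTURE_SIZE'));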
Standalone code to reproduce the issue
// Force the packed depthwise conv path. (Note: tf.ENV was removed in
// tfjs 2.x; tf.env().set is the current API.)
tf.env().set('WEBGL_PACK_DEPTHWISECONV', true);

// Random weights and input matching the problematic layer's shapes.
let w = Array.from({length: 3 * 3 * 816}, () => Math.random());
let x = Array.from({length: 12 * 10 * 816}, () => Math.random());

let inputs = {
  filter: tf.tensor(w, [3, 3, 816, 1]),
  x: tf.tensor(x, [1, 12, 10, 816]),
  strides: 1,
  pad: [[0, 0], [1, 1], [1, 1], [0, 0]],
  dataFormat: 'channelsLast',
  dilations: 1,
  activation: 'relu'
};

// setBackend is async; run this in an async context or the DevTools console.
await tf.setBackend('webgl');
tf.env().set('WEBGL_MAX_TEXTURE_SIZE', 4096);
let out_4096 = tf.fused.depthwiseConv2d(inputs);

// Re-create the tensors so they are uploaded under the new texture size.
tf.env().set('WEBGL_MAX_TEXTURE_SIZE', 2048);
inputs.x = tf.tensor(x, [1, 12, 10, 816]);
inputs.filter = tf.tensor(w, [3, 3, 816, 1]);
let out_2048 = tf.fused.depthwiseConv2d(inputs);

// CPU backend gives the reference output.
await tf.setBackend('cpu');
inputs.x = tf.tensor(x, [1, 12, 10, 816]);
inputs.filter = tf.tensor(w, [3, 3, 816, 1]);
let out_reference = tf.fused.depthwiseConv2d(inputs);

// True if any element differs by more than the tolerance.
const doTensorsDiffer = function(t0, t1) {
  return tf.any(tf.greater(tf.abs(tf.sub(t0, t1)), tf.scalar(1e-2))).dataSync()[0];
};

console.log("Default and 2048 differ? " + doTensorsDiffer(out_4096, out_2048));
console.log("Reference and 2048 differ? " + doTensorsDiffer(out_reference, out_2048));
console.log("Reference and 4096 differ? " + doTensorsDiffer(out_reference, out_4096));
About this issue
- State: closed
- Created 2 years ago
- Comments: 27
Thanks for the fix @Linchenn!
@shanumantesc
Just merged the fix PR into our code base. You can try it by building locally, or you can wait for tfjs v3.21.0.
You could use either
tf.env().set('WEBGL_MAX_SIZE_FOR_NARROW_TEXTURE', 2048);
or
tf.env().set('WEBGL_AUTO_SQUARIFY_NARROW_TEXTURE_SHAPE', true);
before running code on a Mali GPU.

Apologies for my misunderstanding of how the vertices were set up. Great to hear that you have a feasible workaround!
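For anyone who needs to gate the workaround at runtime, here is a rough sketch. The isMaliGpu helper and its renderer-string check (via the standard WEBGL_debug_renderer_info extension) are my own illustration, not part of the fix; the two flags are the ones named above.

// Illustrative only: apply one of the workaround flags when a Mali GPU
// is detected. The detection heuristic is an assumption.
function isMaliGpu() {
  const gl = document.createElement('canvas').getContext('webgl2');
  if (!gl) return false;
  const info = gl.getExtension('WEBGL_debug_renderer_info');
  const renderer = info ? gl.getParameter(info.UNMASKED_RENDERER_WEBGL) : '';
  return /mali/i.test(renderer);
}

if (isMaliGpu()) {
  tf.env().set('WEBGL_MAX_SIZE_FOR_NARROW_TEXTURE', 2048);
  // or: tf.env().set('WEBGL_AUTO_SQUARIFY_NARROW_TEXTURE_SHAPE', true);
}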
Good catch, but I got the same correct results when using the following fragment shader: [shader source missing from this excerpt]

I set color as 3672, and the output is still
3672, 0.00027233114815317094, 0, 0

@Linchenn really interesting analysis, thanks for digging into this! An interesting followup: when I set the width to 2 and the height to 3672, the result is closer to what we would want. Although per my understanding we are always off by 1 with NN sampling, there is also one case where the diff is still wrong: 3666.3496503496503. As I increase the width beyond 2, I still see the pattern of 3672.77057793345 and 3669.557305336833 popping up, but it is better than width 1 🤔
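To make the NN-sampling/off-by-one concern concrete, here is a small sketch of my own (not from the thread) of the standard texel-center arithmetic along a 3672-texel axis. The idea that the driver evaluates the coordinate at reduced precision is an assumption for illustration, not a confirmed diagnosis.

// Nearest-neighbor texel addressing on a 1xN texture: texel i is sampled
// at normalized coordinate (i + 0.5) / N, and the sampler fetches
// floor(coord * N). At float32 precision this round-trips exactly.
const N = 3672;
function roundTrip(i, quantize = x => x) {
  const coord = quantize((i + 0.5) / N);
  return Math.floor(coord * N);
}

// Crude ~fp16 quantizer (11 significant bits), mimicking a driver that
// evaluates texture coordinates at reduced precision (assumed, not proven).
function toHalf(x) {
  if (x === 0) return 0;
  const scale = 2 ** (Math.floor(Math.log2(Math.abs(x))) - 10);
  return Math.round(x / scale) * scale;
}

let bad = 0;
for (let i = 0; i < N; i++) {
  if (roundTrip(i, toHalf) !== i) bad++;
}
console.log(`texels mis-addressed at ~fp16: ${bad} of ${N}`);
// At full precision every texel round-trips; at ~fp16 many high-index
// texels collide, since the fp16 spacing near 1.0 (2^-11) exceeds the
// texel pitch 1/3672 — consistent with a narrow 1-wide texture being
// more fragile than a 2-wide one.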