tinygrad: Some METAL CI tests failing

Track down why test_half_to_int8 and test_output_padded_conv_transpose2d fail in Metal CI.

Currently the whole dtype test is disabled in the CI config file, and test_output_padded_conv_transpose2d is skipped in test_ops.py. It would be great to re-enable them.

About this issue

  • State: closed
  • Created a year ago
  • Comments: 31 (13 by maintainers)

Most upvoted comments

FWIW on my 16GB M2 Mac Mini (details in log) these failures don’t seem to reproduce. I have an M1 MacBook that just needs a re-image, I’m happy to fire it up if it’s likely to shake loose the issue.

Log: https://gist.github.com/b7r6/c2a0d228499fdeb3b7f885d1abf48994

As of the current state of test_dtypes.py, test_int8_to_uint8_negative fails on an Intel (x86) Mac:

Tensor([-1,-2,-3,-4], dtype=dtypes.int8).cast(dtypes.uint8)

the relevant generated Metal kernel:

kernel void E_4(device unsigned char* data0, const device char* data1, uint3 gid [[threadgroup_position_in_grid]], uint3 lid [[thread_position_in_threadgroup]]) {
      float val0 = (float)(*(data1+0));
      float val1 = (float)(*(data1+1));
      float val2 = (float)(*(data1+2));
      float val3 = (float)(*(data1+3));
      *(data0+0) = (unsigned char)(val0);
      *(data0+1) = (unsigned char)(val1);
      *(data0+2) = (unsigned char)(val2);
      *(data0+3) = (unsigned char)(val3);
}

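the kernel casts from fp32 -> uchar directly. For negative inputs this is undefined behavior according to the C++ spec (page 86 of N4296): a floating-point to integer conversion is undefined when the truncated value cannot be represented in the destination type. The same code just happens to work on Apple silicon, but you can’t expect it to behave the same way across all machines.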
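
To make the rule concrete, here is a minimal sketch in plain host-side C++ (not MSL) of a range check that tells whether a direct float -> unsigned char cast is well defined. The helper name `uchar_cast_is_defined` is hypothetical and is not part of tinygrad or Metal.

```cpp
#include <limits>

// Hypothetical helper, for illustration only.
// A float -> unsigned char conversion truncates toward zero, and it is only
// defined when the truncated value fits in the destination type. That means
// v must lie strictly between -1 and max + 1 (256 for an 8-bit unsigned char).
// NaN fails both comparisons, so it is reported as not defined.
constexpr bool uchar_cast_is_defined(float v) {
    return v > -1.0f &&
           v < static_cast<float>(std::numeric_limits<unsigned char>::max()) + 1.0f;
}
```

Every input in the failing test (-1, -2, -3, -4) is rejected by this check, which is consistent with the result varying between machines.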

now if you look at the kernel torch generates, you’ll see it casts from fp32 -> long -> uchar:

auto tmp0 = in_ptr0[static_cast<long>(i0)];
auto tmp1 = static_cast<unsigned char>(tmp0);
out_ptr0[static_cast<long>(i0)] = tmp1;

The solution is to use the right type-casting logic: go through a signed integer type (fp32 -> long -> uchar) instead of casting fp32 -> uchar directly.

I tested the fp32 -> long -> uchar path in tinygrad on an Intel Mac, and the test passes.
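
The two-step path can be sketched in plain host-side C++ as follows. This is an illustration under stated assumptions, not the code tinygrad actually emits: the names `float_to_uchar_via_long` and `cast_int8_to_uint8` are hypothetical, and the loop only mirrors the shape of the E_4 kernel above.

```cpp
#include <cstddef>

// Hypothetical helper, for illustration only.
// Step 1: float -> long is defined whenever the truncated value fits in long,
//         which holds for every value that started life as an int8.
// Step 2: long -> unsigned char is a modular conversion, so -1 becomes 255.
constexpr unsigned char float_to_uchar_via_long(float v) {
    return static_cast<unsigned char>(static_cast<long>(v));
}

// Mirrors the E_4 kernel: load int8, widen to float, then store as uint8
// through the intermediate signed integer type.
void cast_int8_to_uint8(unsigned char* data0, const signed char* data1, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        float val = static_cast<float>(data1[i]);
        data0[i] = float_to_uchar_via_long(val);
    }
}

// Compile-time sanity check of the wraparound.
static_assert(float_to_uchar_via_long(-1.0f) == 255, "-1 should wrap to 255");
```

Note that the intermediate step only moves the problem: a float outside the range of long (or NaN) is still undefined in step 1. For values that come from an int8 tensor this cannot happen.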

Note: MSL is based on the C++14 specification.