tinygrad: Metal gives Error Domain=MTLLibraryErrorDomain Code=1 "MTLLibrary is not formatted as a MetalLib file"

Hi, I am going through the tinygrad documentation and trying to run the basic Tensor example from abstractions.py.

Environment:

  • M1 Mac, Sonoma 14.0
  • Python 3.10.13
  • tinygrad source install Nov 6th

try_tensor.py:

from tinygrad.tensor import Tensor
a = Tensor([2])
b = Tensor([3])
result = a + b
print(f"{a.numpy()} + {b.numpy()} = {result.numpy()}")

However I am getting an error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tinygrad/tinygrad/tensor.py", line 126, in numpy
    return self.detach().cast(dtypes.from_np(self.dtype.np)).contiguous().to('CPU').realize().lazydata.realized.toCPU().reshape(self.shape)
  File "/tinygrad/tinygrad/tensor.py", line 105, in realize
    run_schedule(self.lazydata.schedule())
  File "/tinygrad/tinygrad/realize.py", line 27, in run_schedule
    si.out.realized = Device[si.out.device].exec_ast(si.ast, output=si.out, inputs=si.inputs, var_vals=si.var_vals, **si.out._device_extra_args())
  File "/tinygrad/tinygrad/ops.py", line 292, in exec_ast
    if ast not in self.method_cache: self.method_cache[ast] = get_program()
  File "/tinygrad/tinygrad/ops.py", line 289, in get_program
    return self.to_program(k)
  File "/tinygrad/tinygrad/ops.py", line 239, in to_program
    display_name=k.display_name, runtime_args=runtime_args).build(self.compiler, self.runtime)
  File "/tinygrad/tinygrad/ops.py", line 193, in build
    self.lib = compiler.__wrapped__(self.prg) if getenv("DISABLE_COMPILER_CACHE") else compiler(self.prg)
  File "/tinygrad/tinygrad/helpers.py", line 206, in wrapper
    return diskcache_put(table, key, func(*args, **kwargs))
  File "/tinygrad/tinygrad/runtime/ops_metal.py", line 51, in compile_metal
    unwrap(library.serializeToURL_error_(Cocoa.NSURL.URLWithString_(f"file://{output_file.name}"), None))
  File "/tinygrad/tinygrad/runtime/ops_metal.py", line 38, in unwrap
    assert err is None, str(err)
AssertionError: Error Domain=MTLLibraryErrorDomain Code=1 "MTLLibrary is not formatted as a MetalLib file." UserInfo={NSLocalizedDescription=MTLLibrary is not formatted as a MetalLib file.}

This was the Metal program generated:

#include <metal_stdlib>
using namespace metal;
kernel void E_(device float* data0, 
               const device float* data1, 
               const device float* data2, 
               uint3 gid [[threadgroup_position_in_grid]], 
               uint3 lid [[thread_position_in_threadgroup]]) {
    float val0 = *(data1+0);
    float val1 = *(data2+0);
    *(data0+0) = (val0+val1);
}

Compile options:

<MTLCompileOptionsInternal: 0x10b7eea90>
    preprocessorMacros:  
    fastMathEnabled = 1 
    framebufferReadEnabled = 0 
    preserveInvariance = 0 
    optimizationLevel = MTLLibraryOptimizationLevelDefault 
    libraryType = MTLLibraryTypeExecutable 
    installName = <null> 
    compileSymbolVisibility =  0 
    allowReferencingUndefinedSymbols =  0 
    maxTotalThreadsPerThreadgroup =  0 
    languageVersion = default

About this issue

  • State: open
  • Created 8 months ago
  • Reactions: 7
  • Comments: 46 (10 by maintainers)

Most upvoted comments

For everyone facing this issue: just run the existing master with these two environment variables: “METAL_XCODE=1 DISABLE_COMPILER_CACHE=1”. With them, all the tests pass on an M1 running macOS Sonoma 14.x.x.

Essentially, there is some issue with using the cached Metal library, at least on an M1 Air running macOS 14.x.x. With DISABLE_COMPILER_CACHE=1, line 254 of device.py calls the wrapped attr…, so the compiler basically runs without the added diskcache functionality and recompiles every time, which makes it work.

I got the hint from #2372 and the latest discussion there. This way you don’t even need to change any code.
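
The cache bypass described above can be illustrated with a toy model. This is a minimal sketch, not tinygrad's code: `diskcache`, `fake_compile`, `build`, `stats`, and the dict standing in for the on-disk cache are hypothetical names. Only the `DISABLE_COMPILER_CACHE` variable and the `compiler.__wrapped__(...)` call are taken from the tracebacks in this thread.

```python
import functools
import os

_cache = {}               # hypothetical stand-in for the on-disk compiler cache
stats = {"compiles": 0}   # counts how often the real "compiler" actually runs

def diskcache(func):
    """Toy caching decorator; functools.wraps exposes the original as __wrapped__."""
    @functools.wraps(func)
    def wrapper(src):
        if src not in _cache:
            _cache[src] = func(src)
        return _cache[src]
    return wrapper

@diskcache
def fake_compile(src):
    """Hypothetical compiler: encodes the source and records the invocation."""
    stats["compiles"] += 1
    return src.encode()

def build(src):
    # Same shape as the line in the traceback:
    #   compiler.__wrapped__(prg) if getenv("DISABLE_COMPILER_CACHE") else compiler(prg)
    if int(os.environ.get("DISABLE_COMPILER_CACHE", "0")):
        return fake_compile.__wrapped__(src)  # skip the cache, recompile every time
    return fake_compile(src)
```

With the variable unset, calling `build("k")` twice runs the toy compiler once; with `DISABLE_COMPILER_CACHE=1` every call recompiles, so a bad cached entry is never read back.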

#2372 resolves this issue for me!

I have the same error on macOS 13.4.1 trying to run the mnist example.

Traceback (most recent call last):
  File "/Users/leif-hancox-li/tinygrad/examples/mnist_gan.py", line 98, in <module>
    loss_d += train_discriminator(optim_d, data_real, data_fake)
  File "/Users/leif-hancox-li/tinygrad/examples/mnist_gan.py", line 61, in train_discriminator
    optimizer.step()
  File "/Users/leif-hancox-li/tinygrad/tinygrad/nn/optim.py", line 52, in step
    self.t.assign(self.t + 1).realize()
  File "/Users/leif-hancox-li/tinygrad/tinygrad/tensor.py", line 105, in realize
    run_schedule(self.lazydata.schedule())
  File "/Users/leif-hancox-li/tinygrad/tinygrad/realize.py", line 27, in run_schedule
    si.out.realized = Device[si.out.device].exec_ast(si.ast, output=si.out, inputs=si.inputs, var_vals=si.var_vals, **si.out._device_extra_args())
  File "/Users/leif-hancox-li/tinygrad/tinygrad/ops.py", line 292, in exec_ast
    if ast not in self.method_cache: self.method_cache[ast] = get_program()
  File "/Users/leif-hancox-li/tinygrad/tinygrad/ops.py", line 289, in get_program
    return self.to_program(k)
  File "/Users/leif-hancox-li/tinygrad/tinygrad/ops.py", line 237, in to_program
    return ASTRunner(k.function_name, src, k.global_size, k.local_size,
  File "/Users/leif-hancox-li/tinygrad/tinygrad/ops.py", line 193, in build
    self.lib = compiler.__wrapped__(self.prg) if getenv("DISABLE_COMPILER_CACHE") else compiler(self.prg)
  File "/Users/leif-hancox-li/tinygrad/tinygrad/helpers.py", line 216, in wrapper
    return diskcache_put(table, key, func(*args, **kwargs))
  File "/Users/leif-hancox-li/tinygrad/tinygrad/runtime/ops_metal.py", line 51, in compile_metal
    unwrap(library.serializeToURL_error_(Cocoa.NSURL.URLWithString_(f"file://{output_file.name}"), None))
  File "/Users/leif-hancox-li/tinygrad/tinygrad/runtime/ops_metal.py", line 38, in unwrap
    assert err is None, str(err)
AssertionError: Error Domain=MTLLibraryErrorDomain Code=1 "MTLLibrary is not formatted as a MetalLib file." UserInfo={NSLocalizedDescription=MTLLibrary is not formatted as a MetalLib file.}

[Image: the same shader compiled with the script above on Python 3.12 (left) and Python 3.10 (right).] Both the beginning and the end of the compiled shader are wrong on Python < 3.12.
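
A quick way to compare two builds like the ones in the image is to look at the first and last bytes of the compiled library. The following is a rough sketch: it assumes a valid metallib starts with the magic bytes `MTLB`, and the helper names `looks_like_metallib` and `describe` are made up for illustration.

```python
METALLIB_MAGIC = b"MTLB"  # assumption: leading bytes of a valid compiled .metallib

def looks_like_metallib(data: bytes) -> bool:
    """Return True if the buffer starts with the assumed metallib magic bytes."""
    return len(data) >= 4 and data[:4] == METALLIB_MAGIC

def describe(data: bytes, n: int = 16) -> str:
    """Hex-dump the first and last n bytes so two builds can be compared by eye."""
    return f"head={data[:n].hex()} tail={data[-n:].hex()} size={len(data)}"
```

Running `describe` on the bytes produced under each Python version would show whether the header or the trailer differs, which is what the screenshot suggests.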

@dattienle2573

from tinygrad import Device
Device.DEFAULT = "GPU"

I got this error on my conda Python 3.12, so I can’t say it’s entirely working on 3.12.

Maybe I wasn’t clear enough: every conda Python version I’ve tried triggers this bug.

@Leikoe great job investigating this!

Piggybacking on @SamRaymond’s work.

I ran into this issue while trying to run the hello world matmul example in the README:

$ METAL_XCODE=1 DEBUG=3 python -c "from tinygrad import Tensor;N = 1024; a, b = Tensor.rand(N, N), Tensor.rand(N, N);c = (a.reshape(N, 1, N) * b.T.reshape(1, N, N)).sum(axis=2);print((c.numpy() - (a.numpy() @ b.numpy())).mean())"
*** METAL   rand  seed 1703711284 size 1048576         dtype dtypes.float
*** METAL   rand  seed 1703711285 size 1048576         dtype dtypes.float
  0 ━┳ STORE MemBuffer(idx=0, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1), strides=(1024, 1, 0), offset=0, mask=None, contiguous=True),)))
  1  ┗━┳ SUM (1024, 1024, 1)
  2    ┗━┳ MUL
  3      ┣━━ LOAD MemBuffer(idx=1, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1024), strides=(1024, 0, 1), offset=0, mask=None, contiguous=False),)))
  4      ┗━━ LOAD MemBuffer(idx=2, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1024), strides=(0, 1, 1024), offset=0, mask=None, contiguous=False),)))
TENSOR CORES [(1, 1024, 1)] [(0, 1024, 1024)] tensor_core<METAL, [8, 8, 8], dtypes.float, dtypes.float>
3 alias 1: idxs= [Variable('gidx0', 0, 31), NumNode(0), NumNode(0), Variable('lidx3', 0, 1), NumNode(0), ((Variable('lidx4', 0, 15)//2)%4), NumNode(0), Variable('ridx7', 0, 127), (((Variable('lidx4', 0, 15)%2)*2)+((Variable('lidx4', 0, 15)//8)*4)+Variable('None', 0, 1)), NumNode(0), Variable('None', 0, 3), NumNode(0)]
4 alias 2: idxs= [NumNode(0), Variable('gidx1', 0, 7), Variable('lidx2', 0, 3), NumNode(0), (Variable('lidx4', 0, 15)//8), NumNode(0), (Variable('lidx4', 0, 15)%2), Variable('ridx7', 0, 127), (((Variable('lidx4', 0, 15)//2)%4)+(Variable('lidx3', 0, 1)*4)), Variable('None', 0, 1), NumNode(0), Variable('None', 0, 3)]
Traceback (most recent call last):
  File "<string>", line 4, in <module>
  File "/Users/clay/git/forks/tinygrad/tinygrad/tensor.py", line 131, in numpy
    return self.cast(self.dtype.scalar()).contiguous().realize().lazydata.base.realized.toCPU().astype(self.dtype.np, copy=True).reshape(self.shape)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/clay/git/forks/tinygrad/tinygrad/tensor.py", line 101, in realize
    run_schedule(self.lazydata.schedule())
  File "/Users/clay/git/forks/tinygrad/tinygrad/realize.py", line 28, in run_schedule
    prg = lower_schedule_item(si)
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/clay/git/forks/tinygrad/tinygrad/realize.py", line 21, in lower_schedule_item
    return Device[si.out.device].get_runner(si.ast)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/clay/git/forks/tinygrad/tinygrad/device.py", line 314, in get_runner
    def get_runner(self, ast:LazyOp) -> CompiledASTRunner: return self.to_program(self.get_linearizer(ast))
                                                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/clay/git/forks/tinygrad/tinygrad/device.py", line 286, in to_program
    return CompiledASTRunner(k.ast, k.name, src, k.global_size, k.local_size, runtime_args).build(self.compiler, self.runtime)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/clay/git/forks/tinygrad/tinygrad/device.py", line 255, in build
    self.clprg = runtime(self.name, self.lib)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/clay/git/forks/tinygrad/tinygrad/runtime/ops_metal.py", line 30, in __init__
    self.library = unwrap2(self.device.device.newLibraryWithData_error_(data, None))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/clay/git/forks/tinygrad/tinygrad/helpers.py", line 46, in unwrap2
    assert err is None, str(err)
           ^^^^^^^^^^^
AssertionError: Error Domain=MTLLibraryErrorDomain Code=1 "Invalid library file" UserInfo={NSLocalizedDescription=Invalid library file}

With GPU=1:

GPU=1 METAL_XCODE=1 DEBUG=3 python -c "from tinygrad import Tensor;N = 1024; a, b = Tensor.rand(N, N), Tensor.rand(N, N);c = (a.reshape(N, 1, N) * b.T.reshape(1, N, N)).sum(axis=2);print((c.numpy() - (a.numpy() @ b.numpy())).mean())"
CLDevice: got 1 platforms and 1 devices
*** GPU   rand  seed 1703711410 size 1048576         dtype dtypes.float
*** GPU   rand  seed 1703711411 size 1048576         dtype dtypes.float
  0 ━┳ STORE MemBuffer(idx=0, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1), strides=(1024, 1, 0), offset=0, mask=None, contiguous=True),)))
  1  ┗━┳ SUM (1024, 1024, 1)
  2    ┗━┳ MUL
  3      ┣━━ LOAD MemBuffer(idx=1, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1024), strides=(1024, 0, 1), offset=0, mask=None, contiguous=False),)))
  4      ┗━━ LOAD MemBuffer(idx=2, dtype=dtypes.float, st=ShapeTracker(views=(View(shape=(1024, 1024, 1024), strides=(0, 1, 1024), offset=0, mask=None, contiguous=False),)))
*** GPU        1 r_32_16_8_16_256_4_4_4                arg   3 mem  0.01 GB tm   7579.38us/     7.58ms (  283.33 GFLOPS,    1.66 GB/s)
2.5320332e-09
avg:   283.33 GFLOPS     1.66 GB/s           total:     1 kernels     2.15 GOPS     0.01 GB     7.58 ms

On Sonoma 14.2.1, MacBook Air M1 2020

This took a while… My M2 Air works like a charm: Metal was in the site-packages. On my M1 Pro 16, Metal wasn’t in the packages. If I run METAL_XCODE=1 python docs/abstractions2.py with DEVICE = “GPU”, it works for me, and the same goes for the other files.

Running on M1 Macbook Pro 16 inch 2021 (14.2.1 (23C71))
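
To check whether the kernel from the original report compiles at all outside tinygrad, it can be built with Apple's command-line compiler. This is a sketch under assumptions: it needs a full Xcode install, the file names are arbitrary, and whether `METAL_XCODE=1` makes tinygrad take an equivalent path is a guess, not something confirmed in this thread.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# The kernel posted in the original report, lightly reflowed.
KERNEL_SRC = """#include <metal_stdlib>
using namespace metal;
kernel void E_(device float* data0, const device float* data1, const device float* data2,
               uint3 gid [[threadgroup_position_in_grid]],
               uint3 lid [[thread_position_in_threadgroup]]) {
  float val0 = *(data1+0);
  float val1 = *(data2+0);
  *(data0+0) = (val0+val1);
}
"""

def xcrun_commands(src: Path, air: Path, lib: Path):
    """Build the two xcrun invocations: source -> AIR, then AIR -> metallib."""
    return [
        ["xcrun", "-sdk", "macosx", "metal", "-c", str(src), "-o", str(air)],
        ["xcrun", "-sdk", "macosx", "metallib", str(air), "-o", str(lib)],
    ]

def compile_offline(source: str = KERNEL_SRC) -> bytes:
    """Compile the kernel with Xcode's tools and return the metallib bytes (macOS only)."""
    if shutil.which("xcrun") is None:
        raise RuntimeError("xcrun not found; this sketch needs Xcode installed")
    with tempfile.TemporaryDirectory() as tmp:
        src, air, lib = (Path(tmp) / name for name in ("E_.metal", "E_.air", "E_.metallib"))
        src.write_text(source)
        for cmd in xcrun_commands(src, air, lib):
            subprocess.run(cmd, check=True)  # raises CalledProcessError on compiler errors
        return lib.read_bytes()
```

If `compile_offline()` succeeds on an affected machine, the kernel source is fine and the problem lies in how the in-process compiled library is serialized or cached.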

MTLDynamicLibrary isn’t a viable solution because it isn’t supported on all devices. Macs with Intel or AMD GPUs will suffer.

I am still having this problem - M1 MacBook Air (2020) with macOS 14.1.1. Interestingly with #2369 I get a different problem:

  File "/Users/alexb/Projects/tinygrad/tinygrad/ops.py", line 314, in to_program
    display_name=k.display_name, runtime_args=runtime_args).build(self.compiler, self.runtime)
  File "/Users/alexb/Projects/tinygrad/tinygrad/ops.py", line 280, in build
    self.clprg = runtime(self.name, self.lib)
  File "/Users/alexb/Projects/tinygrad/tinygrad/runtime/ops_metal.py", line 61, in __init__
    self.library = unwrap(METAL.device.newLibraryWithData_error_(data, None))
  File "/Users/alexb/Projects/tinygrad/tinygrad/runtime/ops_metal.py", line 45, in unwrap
    assert err is None, str(err)
AssertionError: Error Domain=MTLLibraryErrorDomain Code=1 "Invalid library file" UserInfo={NSLocalizedDescription=Invalid library file}

where line 61 is the call to newLibraryWithData. It would seem something is wacky with exporting the compiled shader library on certain M-series Macs? This worked fine on the exact same machine with macOS 13.x.

I have macOS 13. Is 14 broken?