triton: TypeError: function takes exactly 16 arguments (13 given)

(This issue was first posted in torchdynamo, but I’m reposting it here since it appears to be an issue with triton instead.)

The following script consistently throws TypeError: function takes exactly 16 arguments (13 given) no matter what I do. I’ve reproduced it several times now.

import torch
from torch import tensor, device
import torch.fx as fx
from torchdynamo.testing import rand_strided
from math import inf
from torch.fx.experimental.proxy_tensor import make_fx

# torch version: 1.14.0.dev20221009
# torch cuda version: 11.7
# torch git version: 0dbefb2414417e80371ef3d8224404d4a522f86e


# CUDA Info:
# nvcc: NVIDIA (R) Cuda compiler driver
# Copyright (c) 2005-2022 NVIDIA Corporation
# Built on Wed_Jun__8_16:49:14_PDT_2022
# Cuda compilation tools, release 11.7, V11.7.99
# Build cuda_11.7.r11.7/compiler.31442593_0

# GPU Hardware Info:
# NVIDIA A100-SXM4-40GB : 1


from torch.nn import *
class Repro(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, arg0_1, new_zeros_1):
        # scatter arg0_1 into elements 0:2048 along dim 2 of new_zeros_1
        slice_scatter = torch.ops.aten.slice_scatter.default(new_zeros_1, arg0_1, 2, 0, 2048);  new_zeros_1 = arg0_1 = None
        return (slice_scatter,)

# (shape, stride, dtype, device) for each input tensor
args = [((16, 128, 2048), (262144, 2048, 1), torch.float32, 'cuda'), ((16, 128, 2112), (270336, 2112, 1), torch.float32, 'cuda')]
args = [rand_strided(sh, st, dt, dev) for (sh, st, dt, dev) in args]
mod = make_fx(Repro())(*args)

from torchinductor.compile_fx import compile_fx_inner
from torchdynamo.debug_utils import same_two_models

compiled = compile_fx_inner(mod, args)
compiled(*args)  # raises: TypeError: function takes exactly 16 arguments (13 given)

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 18 (2 by maintainers)

Most upvoted comments

You are correct! The cubin cache uses the version key via fn.cache_key https://github.com/openai/triton/blob/09cc2d454b442301e88d1df153214732bd8714d8/python/triton/compiler.py#L1239 but the entry-point cache doesn’t. This must be the bug.

Cool! I will close this issue then 😃 But feel free to re-open if you run into another issue.

@desertfire I think the issues are parallel. The problem you are trying to fix is related to the new arguments we added to the c_wrapper function, while the problem with the cache is that the version key has not been adopted in make_fn_cache_key.
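To make the cache bug concrete, here is a minimal sketch in plain Python (hypothetical function names, not Triton’s actual code): a key built only from the kernel source stays identical across Triton upgrades, so the stale entry point keeps being served; folding the version key in forces a rebuild.

import hashlib

# Sketch only: without the version key, the entry-point cache produces the
# same key before and after a Triton upgrade, so a launcher built against
# the old 13-argument signature keeps getting reused.
def make_fn_cache_key_buggy(src):
    return hashlib.md5(src.encode()).hexdigest()

# With the version key folded in, upgrading Triton changes the key and the
# entry point is rebuilt against the new 16-argument launcher.
def make_fn_cache_key_fixed(version_key, src):
    return hashlib.md5(f"{version_key}-{src}".encode()).hexdigest()

src = "def kernel(x_ptr): ..."
assert make_fn_cache_key_buggy(src) == make_fn_cache_key_buggy(src)
assert make_fn_cache_key_fixed("2.0", src) != make_fn_cache_key_fixed("2.1", src)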

FYI, the PR for fixing the caching is https://github.com/openai/triton/pull/765.

@jansel I believe cleaning up the cache dir (i.e., $HOME/.triton/cache) should work.
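In Python, that workaround looks something like this (a sketch; the path is just the default cache location mentioned above):

import os
import shutil

# Delete Triton's on-disk cache so stale entry points are recompiled on
# the next run.
cache_dir = os.path.expanduser("~/.triton/cache")
shutil.rmtree(cache_dir, ignore_errors=True)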

Aah, this all makes sense. We recently added some arguments to allow users to pass hooks to c_wrapper, e.g. for profiling. Doing this in the C code lets us have no overhead at all when the hooks aren’t set. There are three new arguments here: https://github.com/openai/triton/blob/master/python/triton/compiler.py#L1308 which I believe you can set to None if you don’t use the hooks.
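For anyone puzzling over the error message itself, here is a minimal sketch of the failure mode (hypothetical names, not Triton’s real launcher signature): the rebuilt C launcher expects 16 arguments including the three hook slots, while a stale cached entry point still calls it with the old 13.

def new_c_launcher(*args):
    # Stand-in for the compiled C wrapper, which now expects 16 arguments
    # (the 13 original ones plus 3 hook-related ones).
    if len(args) != 16:
        raise TypeError("function takes exactly 16 arguments (%d given)" % len(args))

stale_call = tuple(range(13))  # old 13-argument calling convention
try:
    new_c_launcher(*stale_call)
except TypeError as e:
    print(e)  # function takes exactly 16 arguments (13 given)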

T_T I’ll take a look this week. Probably still an issue with the frontend and constexprs…