triton: TypeError: function takes exactly 16 arguments (13 given)

(This issue was first posted in torchdynamo, but I’m reposting it here since it appears to be an issue with triton instead.)

The following script consistently throws TypeError: function takes exactly 16 arguments (13 given) no matter what I do. I’ve reproduced it several times now.

import torch
from torch import tensor, device
import torch.fx as fx
from torchdynamo.testing import rand_strided
from math import inf
from torch.fx.experimental.proxy_tensor import make_fx

# torch version: 1.14.0.dev20221009
# torch cuda version: 11.7
# torch git version: 0dbefb2414417e80371ef3d8224404d4a522f86e


# CUDA Info:
# nvcc: NVIDIA (R) Cuda compiler driver
# Copyright (c) 2005-2022 NVIDIA Corporation
# Built on Wed_Jun__8_16:49:14_PDT_2022
# Cuda compilation tools, release 11.7, V11.7.99
# Build cuda_11.7.r11.7/compiler.31442593_0

# GPU Hardware Info:
# NVIDIA A100-SXM4-40GB : 1


from torch.nn import *
class Repro(torch.nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, arg0_1, new_zeros_1):
        # scatter arg0_1 into elements 0:2048 along dim 2 of new_zeros_1
        slice_scatter = torch.ops.aten.slice_scatter.default(new_zeros_1, arg0_1, 2, 0, 2048);  new_zeros_1 = arg0_1 = None
        return (slice_scatter,)

# (shape, stride, dtype, device) for each input tensor
args = [((16, 128, 2048), (262144, 2048, 1), torch.float32, 'cuda'), ((16, 128, 2112), (270336, 2112, 1), torch.float32, 'cuda')]
args = [rand_strided(sh, st, dt, dev) for (sh, st, dt, dev) in args]
mod = make_fx(Repro())(*args)

from torchinductor.compile_fx import compile_fx_inner
from torchdynamo.debug_utils import same_two_models

compiled = compile_fx_inner(mod, args)
compiled(*args)  # raises: TypeError: function takes exactly 16 arguments (13 given)

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 18 (2 by maintainers)

Most upvoted comments

You are correct! The cubin cache uses the version key via fn.cache_key https://github.com/openai/triton/blob/09cc2d454b442301e88d1df153214732bd8714d8/python/triton/compiler.py#L1239 but the entry-point cache doesn’t. This must be the bug.

Cool! I will close this issue then 😃 But feel free to re-open if you run into another issue.

@desertfire I think the issues are parallel. The problem you are trying to fix is related to the new arguments we added to the c_wrapper function, while the problem with the cache is that the version key has not been adopted in make_fn_cache_key.
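To make the cache bug concrete, here is a minimal sketch in plain Python (hypothetical function names, not Triton’s actual code): a key built only from the kernel source stays identical across Triton upgrades, so the stale entry point keeps being served; folding the version key in forces a rebuild.

import hashlib

# Sketch only: without the version key, the entry-point cache produces the
# same key before and after a Triton upgrade, so a launcher built against
# the old 13-argument signature keeps getting reused.
def make_fn_cache_key_buggy(src):
    return hashlib.md5(src.encode()).hexdigest()

# With the version key folded in, upgrading Triton changes the key and the
# entry point is rebuilt against the new 16-argument launcher.
def make_fn_cache_key_fixed(version_key, src):
    return hashlib.md5(f"{version_key}-{src}".encode()).hexdigest()

src = "def kernel(x_ptr): ..."
assert make_fn_cache_key_buggy(src) == make_fn_cache_key_buggy(src)
assert make_fn_cache_key_fixed("2.0", src) != make_fn_cache_key_fixed("2.1", src)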

FYI, the PR for fixing the caching is https://github.com/openai/triton/pull/765.

@jansel I believe cleaning up the cache dir (i.e., $HOME/.triton/cache) should work.
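In Python, that workaround looks something like this (a sketch; the path is just the default cache location mentioned above):

import os
import shutil

# Delete Triton's on-disk cache so stale entry points are recompiled on
# the next run.
cache_dir = os.path.expanduser("~/.triton/cache")
shutil.rmtree(cache_dir, ignore_errors=True)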

Aah, this all makes sense. We recently added some arguments to allow users to pass hooks to c_wrapper, e.g. for profiling. Doing this in the C code lets us have no overhead at all when the hooks aren’t set. There are three new arguments here: https://github.com/openai/triton/blob/master/python/triton/compiler.py#L1308 which I believe you can set to None if you don’t use the hooks.
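For anyone puzzling over the error message itself, here is a minimal sketch of the failure mode (hypothetical names, not Triton’s real launcher signature): the rebuilt C launcher expects 16 arguments including the three hook slots, while a stale cached entry point still calls it with the old 13.

def new_c_launcher(*args):
    # Stand-in for the compiled C wrapper, which now expects 16 arguments
    # (the 13 original ones plus 3 hook-related ones).
    if len(args) != 16:
        raise TypeError("function takes exactly 16 arguments (%d given)" % len(args))

stale_call = tuple(range(13))  # old 13-argument calling convention
try:
    new_c_launcher(*stale_call)
except TypeError as e:
    print(e)  # function takes exactly 16 arguments (13 given)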

T_T I’ll take a look this week. Probably still an issue with the frontend and constexprs…