AdderNet: RuntimeError: _cdist_backward requires X2 to be contiguous

Hi, I am trying to train your AdderNet, but it raises a runtime error, which I suspect comes from the .contiguous() call or some other uncommon operation used in your adder.py.

Could you help to solve this issue?
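For context, this error typically appears when the second argument passed to torch.cdist is a non-contiguous view (e.g. the result of a transpose). A minimal standalone sketch of the symptom and the usual fix (this is illustrative, not AdderNet's actual code):

```python
import torch

# A transposed tensor is a non-contiguous view over the original storage.
x1 = torch.randn(8, 4, requires_grad=True)
x2 = torch.randn(4, 6).t()  # shape (6, 4), non-contiguous

assert not x2.is_contiguous()

# Materializing a contiguous copy with .contiguous() satisfies the
# requirement of cdist's backward pass on its second argument.
d = torch.cdist(x1, x2.contiguous())
d.sum().backward()  # completes without the RuntimeError
```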

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 18

Most upvoted comments

@ranery I saw there are two cdist() implementations online (code 1, code 2):

import torch


def fast_cdist(x1, x2):
    adjustment = x1.mean(-2, keepdim=True)
    x1 = x1 - adjustment
    x2 = x2 - adjustment  # x1 and x2 should be identical in all dims except -2 at this point

    # Compute squared distance matrix using quadratic expansion
    # But be clever and do it with a single matmul call
    x1_norm = x1.pow(2).sum(dim=-1, keepdim=True)
    x1_pad = torch.ones_like(x1_norm)
    x2_norm = x2.pow(2).sum(dim=-1, keepdim=True)
    x2_pad = torch.ones_like(x2_norm)
    x1_ = torch.cat([-2. * x1, x1_norm, x1_pad], dim=-1)
    x2_ = torch.cat([x2, x2_pad, x2_norm], dim=-1)
    res = x1_.matmul(x2_.transpose(-2, -1))

    # Clamp tiny/negative values before sqrt to avoid NaNs in the backward pass
    res.clamp_min_(1e-30).sqrt_()
    return res
import time
import torch


@torch.jit.script
def my_cdist(x1, x2):
    x1_norm = x1.pow(2).sum(dim=-1, keepdim=True)
    x2_norm = x2.pow(2).sum(dim=-1, keepdim=True)
    res = torch.addmm(x2_norm.transpose(-2, -1), x1, x2.transpose(-2, -1), alpha=-2).add_(x1_norm)
    res = res.clamp_min_(1e-30).sqrt_()
    return res


a = torch.randn(10000, 9).cuda()
b = torch.randn(30000, 9).cuda()

for i in range(5):
    start_time = time.time()
    res = torch.cdist(a, b)
    torch.cuda.synchronize()
    print(f'torch cdist time {i}: {time.time() - start_time:.2f}s')

for i in range(5):
    start_time = time.time()
    res2 = my_cdist(a, b)
    torch.cuda.synchronize()
    print(f'my cdist time {i}: {time.time() - start_time:.2f}s')
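As a quick sanity check (my addition, not part of the original benchmark), the custom kernel can be verified against torch.cdist on CPU; the quadratic-expansion trick loses a little precision versus the direct computation, so a loose tolerance is used:

```python
import torch


@torch.jit.script
def my_cdist(x1, x2):
    # Quadratic expansion of the pairwise Euclidean distance:
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, computed with one addmm.
    x1_norm = x1.pow(2).sum(dim=-1, keepdim=True)
    x2_norm = x2.pow(2).sum(dim=-1, keepdim=True)
    res = torch.addmm(x2_norm.transpose(-2, -1), x1,
                      x2.transpose(-2, -1), alpha=-2).add_(x1_norm)
    return res.clamp_min_(1e-30).sqrt_()


a = torch.randn(100, 9)
b = torch.randn(200, 9)

# Float32 cancellation in the expansion limits the achievable accuracy.
assert torch.allclose(my_cdist(a, b), torch.cdist(a, b), atol=1e-3)
```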

@ranery Could you advise more about #16 (comment), in which conv2d() rather than cdist() is the root cause?

This is because, when calculating the gradient and the error, the current PyTorch-based solution uses unfolding to collect and subtract the corresponding blocks of both the feature map and the weight filter, which leads to unnecessary memory consumption. I suppose the implementation used in their paper is the CUDA version, which convolves directly over the feature map, so its memory consumption is normal (similar to that of multiplication-based convolutional networks).
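To illustrate the memory issue, here is a hedged sketch of an unfold-based L1 "adder convolution" (my own illustration, not AdderNet's actual adder.py): unfold materializes one k*k*C_in column per output position, and the broadcasted difference tensor below is roughly C_out times larger still, which is where the memory blows up.

```python
import torch
import torch.nn.functional as F


def adder_conv2d_unfold(x, weight, stride=1, padding=1):
    """L1-distance 'convolution' via unfold (illustrative sketch).

    x:      (N, C_in, H, W)
    weight: (C_out, C_in, k, k)
    """
    n, c_in, h, w = x.shape
    c_out, _, k, _ = weight.shape
    # Unfold expands x to (N, C_in*k*k, L), L = number of output positions:
    # already about k*k times the memory of the input feature map.
    cols = F.unfold(x, kernel_size=k, stride=stride, padding=padding)
    w_flat = weight.view(c_out, -1)
    # Broadcasted subtraction creates a (N, C_out, C_in*k*k, L) intermediate,
    # the main memory hotspot of this approach.
    out = -(w_flat[None, :, :, None] - cols[:, None, :, :]).abs().sum(dim=2)
    h_out = (h + 2 * padding - k) // stride + 1
    w_out = (w + 2 * padding - k) // stride + 1
    return out.view(n, c_out, h_out, w_out)


x = torch.randn(2, 3, 8, 8)
w = torch.randn(4, 3, 3, 3)
y = adder_conv2d_unfold(x, w)
```

A dedicated CUDA kernel can instead walk the feature map in place and accumulate |x - w| directly, never materializing these intermediates.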