keops: Memory leak in PyKeOps 2.2 not present in 2.1.2

I have a convolutional layer that implements a convolution over a point cloud using PyKeOps. With v2.1.2 this all works fine; with v2.2, the same code causes a memory leak that eventually crashes the training.
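
To make the growth visible, one can wrap the layer in a loop along these lines (a hypothetical monitoring sketch, not part of the original report; `watch_memory` is an illustrative name, and the `ConvLayer` it drives is defined just below):

import torch

# Hypothetical sketch: run forward/backward repeatedly on fixed inputs and
# print the allocated CUDA memory. Under v2.2 the number climbs every
# iteration; under v2.1.2 it stays flat.
def watch_memory(layer, points, nuv, features, steps=50):
    for step in range(steps):
        out = layer(points, nuv, features)
        out.sum().backward()
        layer.zero_grad()
        if step % 10 == 0:
            print(step, torch.cuda.memory_allocated() // 2**20, "MiB")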

from math import sqrt

from torch import Tensor, nn
from pykeops.torch import LazyTensor

class ConvLayer(nn.Module):
    def __init__(
        self,
        in_channels: int,
        hidden_units: int,
        out_channels: int,
        radius: float,
    ):
        """
        Creates the KeOps convolution layer.

        Args:
            in_channels: dimension of input features
            hidden_units (int, optional): number of hidden uniots per point.
                Defaults to out_channels.
            out_channels: dimension of output features per point.
            radius : deviation of the Gaussian window on the
                quasi-geodesic distance `d_ij`. Defaults to 1..

        """

        super().__init__()

        self.radius = radius
        
        # 3D convolution filters, encoded as an MLP:
        self.conv = nn.Sequential(
            nn.Linear(3, hidden_units),
            nn.ReLU(), 
            nn.Linear(hidden_units, out_channels),
        )

    def forward(
        self, points: Tensor, nuv: Tensor, features: Tensor, ranges=None, batch=None
    ):
        """
        points, local basis, in features  ->  out features
        (N, 3),   (N, 3, 3),    (N, I)    ->    (N, O)

        Args:
            points (Tensor): (N,3) point coordinates `x_i`.
            nuv (Tensor): (N,3,3) local coordinate systems `[n_i,u_i,v_i]`.
            features (Tensor): (N,I) input feature vectors `f_i`.

        Returns:
            (Tensor): (N,O) output feature vectors `f'_i`.
        """

        # Normalize the kernel radius:
        points = points / (sqrt(2.0) * self.radius)  # (N, 3)

        # Vertices:
        x_i = LazyTensor(points[:, None, :].contiguous())  # (N, 1, 3)
        x_j = LazyTensor(points[None, :, :].contiguous())  # (1, N, 3)

        # Detached unit normals, used on the j side of the reduction:
        normals = nuv[:, 0, :].contiguous().detach()  # (N, 3)

        # Local bases:
        nuv_i = LazyTensor(nuv.view(-1, 1, 9))  # (N, 1, 9)
        
        # Normals:
        n_i = nuv_i[:3]  # (N, 1, 3)
        n_j = LazyTensor(normals[None, :, :].contiguous())  # (1, N, 3)

        # Pseudo-geodesic squared distance:
        d2_ij = ((x_j - x_i) ** 2).sum(-1) * ((2 - (n_i | n_j)) ** 2)  # (N, N, 1)
        # Gaussian window:
        window_ij = (-d2_ij).exp()  # (N, N, 1)

        # Local coordinates:
        X_ij = nuv_i.matvecmult(x_j - x_i)  # (N, N, 3)

        # Extract the MLP weights so the filters can be evaluated symbolically
        # inside the KeOps reduction (H = hidden_units, O = out_channels):
        A_1, B_1 = self.conv[0].weight, self.conv[0].bias
        A_2, B_2 = self.conv[2].weight, self.conv[2].bias
        a_1 = LazyTensor(A_1.view(1, 1, -1))  # (1, 1, H*3)
        b_1 = LazyTensor(B_1.view(1, 1, -1))  # (1, 1, H)
        a_2 = LazyTensor(A_2.view(1, 1, -1))  # (1, 1, O*H)
        b_2 = LazyTensor(B_2.view(1, 1, -1))  # (1, 1, O)
        # MLP applied to the local coordinates:
        X_1 = a_1.matvecmult(X_ij) + b_1  # (N, N, H)
        X_2 = X_1.relu()  # (N, N, H)
        X_3 = a_2.matvecmult(X_2) + b_2  # (N, N, O)
        X_4 = X_3.relu()  # (N, N, O)

        # Window and weight the input features, then reduce over j:
        f_j = LazyTensor(features[None, :, :].contiguous())  # (1, N, I)
        F_ij = window_ij * X_4 * f_j  # (N, N, O); element-wise, so I must equal O (or 1)
        conv_features = F_ij.sum(dim=1)  # (N, O)
        return conv_features
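
For completeness, here is a sketch of how the layer can be driven (illustrative shapes and values, not from the original report; `torch.linalg.qr` is only a convenient way to get random orthonormal local bases, and `out_channels` is set equal to `in_channels` so the element-wise product with `f_j` is well-defined):

import torch

N, C = 2048, 8
layer = ConvLayer(in_channels=C, hidden_units=32, out_channels=C, radius=0.5).cuda()

points = torch.randn(N, 3).cuda()
nuv = torch.linalg.qr(torch.randn(N, 3, 3).cuda()).Q  # random orthonormal bases [n, u, v]
features = torch.randn(N, C).cuda()

out = layer(points, nuv, features)  # (N, C)
watch_memory(layer, points, nuv, features)  # allocated memory climbs under v2.2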

Version 2.2: [attached screenshot]

Version 2.1.2: [attached screenshot]

About this issue

  • State: closed
  • Created 5 months ago
  • Comments: 18 (11 by maintainers)

Most upvoted comments

Version v2.2.1 should fix the issue. Thank you everyone for your effort.

After running git-bisect, I found commit b1b304d8 to be the turning point where the leak appears, and more precisely c2d5a372.