scipy: RBFInterpolator gives Segmentation fault for large images

scipy.interpolate.RBFInterpolator gives Segmentation fault for large images/2D arrays. Moreover, for some dimensions (example - 250 x 250) the algorithm is a bit slow. I am not sure whether the issue is with Pythran or with the implementation RBFInterpolator or is it just something specific to my system.

Reproducing code example:

Reference - https://stackoverflow.com/questions/21690608/numpy-inpaint-nans-interpolate-and-extrapolate/21697401#21697401

import numpy as np
import pythran
import scipy
from scipy.interpolate import RBFInterpolator

print("Pythran Version: ", pythran.__version__)
print("Scipy Version: ", scipy.__version__)
print("NumPy Version: ", np.__version__)

num_points = [10, 100, 500]
for num in num_points:
    x = np.linspace(0, 1, num)
    y = x[:, None]
    image = x + y
    print(image.shape)

    # Destroy some values
    mask = np.random.random(image.shape) > 0.7
    image[mask] = np.nan

    valid_mask = ~np.isnan(image)
    coords = np.array(np.nonzero(valid_mask)).T
    values = image[valid_mask]

    it = RBFInterpolator(coords, values)

    filled = it(list(np.ndindex(image.shape))).reshape(image.shape)

    del x, y, image, it, filled

    print("Completed Successfully for %d x %d image."%(num, num))

Error message:

Pythran Version:  0.9.12.dev0
Scipy Version:  1.8.0.dev0+1285.ac3b219
NumPy Version:  1.20.3
(10, 10)
Completed Successfully for 10 x 10 image.
(100, 100)
Completed Successfully for 100 x 100 image.
(500, 500)
Segmentation fault

Scipy/Numpy/Python version information:

1.8.0.dev0+1285.ac3b219 1.20.3 sys.version_info(major=3, minor=9, micro=4, releaselevel='final', serial=0)

Comments

Related PR - https://github.com/scipy/scipy/pull/13595

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 16 (16 by maintainers)

Most upvoted comments

I noticed that, openmp directives aren’t there in _rbfinterp_pythran.py, so OpenMP isn’t used I believe. Should we add them?

No we shouldn’t add any directives here. OpenMP isn’t portable, and hence not allowed in SciPy binaries (see https://github.com/scipy/scipy/issues/10239#issuecomment-795030817). Let me comment on your other question in gh-13308 in more detail.

How does the value of neighbor argument affects the quality of results? Is there any way with which we can select best of these values (according to the maximum available memory on a system) for a given input automatically?

If you are trying to interpolate a smooth function, then the error in the interpolant will generally decrease with a larger value for neighbors. In your example, you are interpolating a plane, which can be reproduced exactly regardless of the value for neighbors (although neighbors does need to be large enough so that the neighborhood of data points is not collinear).

It is hard to come up with a good way to automatically pick a value for neighbors. It depends on how many data points you have, and there is also a trade-off between speed/memory and accuracy. If you have fewer than about 5000 data points, then the fastest and most accurate RBF interpolation would probably be with neighbors=None. Setting neighbors to None uses all the data points but it avoids to cost of finding the nearest neighbors. When interpolating images (> 5000 data points), there is not too much to gain from having a large value for neighbors unless the image values are smooth over a large neighborhood. So I like to use neighbors=20 for image inpainting, and I also set kernel="linear", which is faster and also prevents the interpolant from overshooting the data.

In the case of num=500 you are interpolating about 175000 data points, which means that RBFInterpolator will need to make a 175000 x 175000 matrix. So it makes sense that you are running out of memory. If you add the argument neighbors=50 when you create RBFInterpolator, then it will break the interpolation problem into one small interpolation problem for each evaluation point, using only the 50 closest data points. That takes about 25 seconds on my machine. If you lower neighbors to 20, then it takes about 11 seconds. Evaluation can definitely be sped up with parallelization.

Here is a link which claims to have an O(N.logN) implementation. A paper which has O(N.lgN) implementation of RBF interpolator.

Thanks for pointing these out. I have used compact RBFs and iterative solvers in the past with some success. Adding that functionality to RBFInterpolator may be useful but it would be a large undertaking.

Thanks for the report @czgdp1807! I can reproduce this.

This looks like a bug in Pythran, guessing it should raise a MemoryError rather than segfault. Cc @serge-sans-paille

(500, 500) points doesn’t seem like an unusually large amount of points to evaluate at once either, that’s a regular image size. Any thoughts on this @treverhines?