scipy: RBFInterpolator gives Segmentation fault for large images
scipy.interpolate.RBFInterpolator gives Segmentation fault for large images/2D arrays. Moreover, for some dimensions (example - 250 x 250) the algorithm is a bit slow. I am not sure whether the issue is with Pythran or with the implementation RBFInterpolator or is it just something specific to my system.
Reproducing code example:
import numpy as np
import pythran
import scipy
from scipy.interpolate import RBFInterpolator
print("Pythran Version: ", pythran.__version__)
print("Scipy Version: ", scipy.__version__)
print("NumPy Version: ", np.__version__)
num_points = [10, 100, 500]
for num in num_points:
x = np.linspace(0, 1, num)
y = x[:, None]
image = x + y
print(image.shape)
# Destroy some values
mask = np.random.random(image.shape) > 0.7
image[mask] = np.nan
valid_mask = ~np.isnan(image)
coords = np.array(np.nonzero(valid_mask)).T
values = image[valid_mask]
it = RBFInterpolator(coords, values)
filled = it(list(np.ndindex(image.shape))).reshape(image.shape)
del x, y, image, it, filled
print("Completed Successfully for %d x %d image."%(num, num))
Error message:
Pythran Version: 0.9.12.dev0
Scipy Version: 1.8.0.dev0+1285.ac3b219
NumPy Version: 1.20.3
(10, 10)
Completed Successfully for 10 x 10 image.
(100, 100)
Completed Successfully for 100 x 100 image.
(500, 500)
Segmentation fault
Scipy/Numpy/Python version information:
1.8.0.dev0+1285.ac3b219 1.20.3 sys.version_info(major=3, minor=9, micro=4, releaselevel='final', serial=0)
Comments
Related PR - https://github.com/scipy/scipy/pull/13595
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 16 (16 by maintainers)
No we shouldn’t add any directives here. OpenMP isn’t portable, and hence not allowed in SciPy binaries (see https://github.com/scipy/scipy/issues/10239#issuecomment-795030817). Let me comment on your other question in gh-13308 in more detail.
If you are trying to interpolate a smooth function, then the error in the interpolant will generally decrease with a larger value for
neighbors. In your example, you are interpolating a plane, which can be reproduced exactly regardless of the value forneighbors(althoughneighborsdoes need to be large enough so that the neighborhood of data points is not collinear).It is hard to come up with a good way to automatically pick a value for
neighbors. It depends on how many data points you have, and there is also a trade-off between speed/memory and accuracy. If you have fewer than about 5000 data points, then the fastest and most accurate RBF interpolation would probably be withneighbors=None. SettingneighborstoNoneuses all the data points but it avoids to cost of finding the nearest neighbors. When interpolating images (> 5000 data points), there is not too much to gain from having a large value forneighborsunless the image values are smooth over a large neighborhood. So I like to useneighbors=20for image inpainting, and I also setkernel="linear", which is faster and also prevents the interpolant from overshooting the data.In the case of
num=500you are interpolating about 175000 data points, which means thatRBFInterpolatorwill need to make a 175000 x 175000 matrix. So it makes sense that you are running out of memory. If you add the argumentneighbors=50when you createRBFInterpolator, then it will break the interpolation problem into one small interpolation problem for each evaluation point, using only the 50 closest data points. That takes about 25 seconds on my machine. If you lowerneighborsto 20, then it takes about 11 seconds. Evaluation can definitely be sped up with parallelization.Thanks for pointing these out. I have used compact RBFs and iterative solvers in the past with some success. Adding that functionality to
RBFInterpolatormay be useful but it would be a large undertaking.Thanks for the report @czgdp1807! I can reproduce this.
This looks like a bug in Pythran, guessing it should raise a
MemoryErrorrather than segfault. Cc @serge-sans-paille(500, 500)points doesn’t seem like an unusually large amount of points to evaluate at once either, that’s a regular image size. Any thoughts on this @treverhines?