scipy: RBFInterpolator gives Segmentation fault for large images
scipy.interpolate.RBFInterpolator
gives Segmentation fault
for large images/2D arrays. Moreover, for some dimensions (example - 250 x 250) the algorithm is a bit slow. I am not sure whether the issue is with Pythran or with the implementation RBFInterpolator
or is it just something specific to my system.
Reproducing code example:
import numpy as np
import pythran
import scipy
from scipy.interpolate import RBFInterpolator
print("Pythran Version: ", pythran.__version__)
print("Scipy Version: ", scipy.__version__)
print("NumPy Version: ", np.__version__)
num_points = [10, 100, 500]
for num in num_points:
x = np.linspace(0, 1, num)
y = x[:, None]
image = x + y
print(image.shape)
# Destroy some values
mask = np.random.random(image.shape) > 0.7
image[mask] = np.nan
valid_mask = ~np.isnan(image)
coords = np.array(np.nonzero(valid_mask)).T
values = image[valid_mask]
it = RBFInterpolator(coords, values)
filled = it(list(np.ndindex(image.shape))).reshape(image.shape)
del x, y, image, it, filled
print("Completed Successfully for %d x %d image."%(num, num))
Error message:
Pythran Version: 0.9.12.dev0
Scipy Version: 1.8.0.dev0+1285.ac3b219
NumPy Version: 1.20.3
(10, 10)
Completed Successfully for 10 x 10 image.
(100, 100)
Completed Successfully for 100 x 100 image.
(500, 500)
Segmentation fault
Scipy/Numpy/Python version information:
1.8.0.dev0+1285.ac3b219 1.20.3 sys.version_info(major=3, minor=9, micro=4, releaselevel='final', serial=0)
Comments
Related PR - https://github.com/scipy/scipy/pull/13595
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 16 (16 by maintainers)
No we shouldn’t add any directives here. OpenMP isn’t portable, and hence not allowed in SciPy binaries (see https://github.com/scipy/scipy/issues/10239#issuecomment-795030817). Let me comment on your other question in gh-13308 in more detail.
If you are trying to interpolate a smooth function, then the error in the interpolant will generally decrease with a larger value for
neighbors
. In your example, you are interpolating a plane, which can be reproduced exactly regardless of the value forneighbors
(althoughneighbors
does need to be large enough so that the neighborhood of data points is not collinear).It is hard to come up with a good way to automatically pick a value for
neighbors
. It depends on how many data points you have, and there is also a trade-off between speed/memory and accuracy. If you have fewer than about 5000 data points, then the fastest and most accurate RBF interpolation would probably be withneighbors=None
. Settingneighbors
toNone
uses all the data points but it avoids to cost of finding the nearest neighbors. When interpolating images (> 5000 data points), there is not too much to gain from having a large value forneighbors
unless the image values are smooth over a large neighborhood. So I like to useneighbors=20
for image inpainting, and I also setkernel="linear"
, which is faster and also prevents the interpolant from overshooting the data.In the case of
num=500
you are interpolating about 175000 data points, which means thatRBFInterpolator
will need to make a 175000 x 175000 matrix. So it makes sense that you are running out of memory. If you add the argumentneighbors=50
when you createRBFInterpolator
, then it will break the interpolation problem into one small interpolation problem for each evaluation point, using only the 50 closest data points. That takes about 25 seconds on my machine. If you lowerneighbors
to 20, then it takes about 11 seconds. Evaluation can definitely be sped up with parallelization.Thanks for pointing these out. I have used compact RBFs and iterative solvers in the past with some success. Adding that functionality to
RBFInterpolator
may be useful but it would be a large undertaking.Thanks for the report @czgdp1807! I can reproduce this.
This looks like a bug in Pythran, guessing it should raise a
MemoryError
rather than segfault. Cc @serge-sans-paille(500, 500)
points doesn’t seem like an unusually large amount of points to evaluate at once either, that’s a regular image size. Any thoughts on this @treverhines?