scikit-learn: ndarray is not C-contiguous error, when using KNeighborsRegressor

Describe the bug

I came across this error while building a K-nearest neighbors model for a project I am working on. I checked the flags of the NumPy array that I pass to the predict method of KNeighborsRegressor, and they look fine (see below), but predict still raises this error.

  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  WRITEBACKIFCOPY : False
  UPDATEIFCOPY : False
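These flags alone may not reveal the problem. A single-row array of shape (1, n) is trivially both C- and F-contiguous, and the flags describe only the query passed to predict, not the training data stored inside the model. The short NumPy sketch below illustrates this. The first values come from the reproduction below, and the two-row array is only illustrative.

```python
import numpy as np

# A single row shaped (1, n), as in the reproduction, is both C- and F-contiguous.
query = np.array([175.6, 12.2, 97.8, 4.9, 92.6]).reshape(1, -1)
print(query.flags["C_CONTIGUOUS"], query.flags["F_CONTIGUOUS"])  # True True

# With two or more rows, a C-ordered array is no longer F-contiguous.
two_rows = np.arange(10.0).reshape(2, 5)
print(two_rows.flags["C_CONTIGUOUS"], two_rows.flags["F_CONTIGUOUS"])  # True False

# The same data in Fortran order is F-contiguous but not C-contiguous.
f_ordered = np.asfortranarray(two_rows)
print(f_ordered.flags["C_CONTIGUOUS"], f_ordered.flags["F_CONTIGUOUS"])  # False True
```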

Steps/Code to Reproduce

Below are the KNN model and the input data for which this error occurs.

import numpy as np
from joblib import load  # assumption: the model was saved with joblib.dump
from sklearn.neighbors import KNeighborsRegressor

knn_model: KNeighborsRegressor = load(file_path)  # file_path: path to the saved model
knn_model.predict(np.array([175.6, 12.2, 97.8, 4.9, 92.6, -0.4999999999999998, 0.8660254037844387, -1.0, 1.2246467991473532e-16, 65.0, 83.0, 29.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]).reshape(1, -1))
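Because the query's flags look fine, it can help to inspect the training data the model actually stores. The minimal diagnostic sketch below assumes that the fitted estimator keeps that data in the private attribute `_fit_X`, which is private API and may change between releases. `fitted_data_flags` is a hypothetical helper, and the small F-ordered training set is only a stand-in for the reporter's data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor


def fitted_data_flags(model):
    """Hypothetical helper: report the contiguity of the training data stored
    by a fitted neighbors estimator (relies on the private `_fit_X`)."""
    fit_x = model._fit_X
    return {
        "C_CONTIGUOUS": bool(fit_x.flags["C_CONTIGUOUS"]),
        "F_CONTIGUOUS": bool(fit_x.flags["F_CONTIGUOUS"]),
    }


# Stand-in model fitted on F-ordered data (random values, for illustration only).
rng = np.random.default_rng(0)
X_train = np.asfortranarray(rng.normal(size=(20, 3)))
y_train = rng.normal(size=20)
model = KNeighborsRegressor(n_neighbors=2).fit(X_train, y_train)

print(fitted_data_flags(model))
```

On the loaded model, `fitted_data_flags(knn_model)` would show whether the stored training data is C-contiguous. For the stand-in, the printed flags may depend on the installed scikit-learn version, since a release may or may not convert the data at fit time.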

Expected Results

The KNN model should return a prediction. Instead, predict raises the error shown under Actual Results.

Actual Results

File "C:\my_project_dir\src\BOLERO_MI\agents\knn.py", line 90, in predict
    t_steps.append(self.model.predict(np.array(exo.iloc[i, :].values.reshape(1, -1))))
  File "C:\my_project_dir\venv\lib\site-packages\sklearn\neighbors\_regression.py", line 222, in predict
    neigh_ind = self.kneighbors(X, return_distance=False)
  File "C:\my_project_dir\venv\lib\site-packages\sklearn\neighbors\_base.py", line 763, in kneighbors
    results = PairwiseDistancesArgKmin.compute(
  File "sklearn\metrics\_pairwise_distances_reduction.pyx", line 672, in sklearn.metrics._pairwise_distances_reduction.PairwiseDistancesArgKmin.compute
  File "sklearn\metrics\_pairwise_distances_reduction.pyx", line 1055, in sklearn.metrics._pairwise_distances_reduction.FastEuclideanPairwiseDistancesArgKmin.__init__
  File "sklearn\metrics\_dist_metrics.pyx", line 1300, in sklearn.metrics._dist_metrics.DatasetsPair.get_for
  File "sklearn\metrics\_dist_metrics.pyx", line 1349, in sklearn.metrics._dist_metrics.DenseDenseDatasetsPair.__init__
  File "stringsource", line 658, in View.MemoryView.memoryview_cwrapper
  File "stringsource", line 349, in View.MemoryView.memoryview.__cinit__
ValueError: ndarray is not C-contiguous

My suggestion for solving this problem: in sklearn/neighbors/_base.py, at line 763 (inside kneighbors), change this code:

results = PairwiseDistancesArgKmin.compute(
    X=X,
    Y=self._fit_X,
    k=n_neighbors,
    metric=self.effective_metric_,
    metric_kwargs=self.effective_metric_params_,
    strategy="auto",
    return_distance=return_distance,
)

and use this instead:

results = PairwiseDistancesArgKmin.compute(
    X=X,
    Y=np.ascontiguousarray(self._fit_X),
    k=n_neighbors,
    metric=self.effective_metric_,
    metric_kwargs=self.effective_metric_params_,
    strategy="auto",
    return_distance=return_distance,
)
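The proposed change relies on `np.ascontiguousarray`, which returns a C-ordered copy when its input is not C-contiguous and avoids copying when it already is. The small NumPy sketch below shows that behavior. The arrays are illustrative stand-ins for `self._fit_X`.

```python
import numpy as np

# Stand-in for an F-ordered self._fit_X (values are illustrative).
fit_x = np.asfortranarray(np.arange(12.0).reshape(4, 3))
print(fit_x.flags["C_CONTIGUOUS"])  # False

# F-ordered input: ascontiguousarray returns a C-ordered copy with identical values.
c_fit_x = np.ascontiguousarray(fit_x)
print(c_fit_x.flags["C_CONTIGUOUS"])     # True
print(np.array_equal(c_fit_x, fit_x))    # True
print(np.shares_memory(c_fit_x, fit_x))  # False, a copy was made

# C-ordered input: no copy is needed, so the result shares memory with the input.
already_c = np.arange(12.0).reshape(4, 3)
print(np.shares_memory(np.ascontiguousarray(already_c), already_c))  # True
```

This approach has a cost. For models whose training data is already C-ordered, the call is essentially free. For an F-ordered `_fit_X`, however, placing the call inside kneighbors would copy the training data on every prediction, so converting once at fit time may be cheaper.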

Please let me know what you think of this problem. I can provide you with a pickle file of the model so that you can reproduce this issue.

Versions

System:
    python: 3.9.10 (tags/v3.9.10:f2f3f53, Jan 17 2022, 15:14:21) [MSC v.1929 64 bit (AMD64)]
executable: C:\my_project_dir\venv\Scripts\python.exe
   machine: Windows-10-10.0.22000-SP0
Python dependencies:
      sklearn: 1.1.1
          pip: 21.3.1
   setuptools: 60.2.0
        numpy: 1.19.5
        scipy: 1.8.1
       Cython: 0.29.30
       pandas: 1.4.2
   matplotlib: 3.4.2
       joblib: 1.0.1
threadpoolctl: 3.1.0
Built with OpenMP: True
threadpoolctl info:
       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: C:\my_project_dir\venv\Lib\site-packages\numpy\.libs\libopenblas.WCDJNK7YVMPZQ2ME2ZZHJJRJ3JIKNDB7.gfortran-win_amd64.dll
        version: 0.3.13
threading_layer: pthreads
   architecture: Haswell
    num_threads: 8
       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: C:\my_project_dir\venv\Lib\site-packages\scipy\.libs\libopenblas.XWYDX2IKJW2NMTWSFYNGFUWKQU3LYTCZ.gfortran-win_amd64.dll
        version: 0.3.17
threading_layer: pthreads
   architecture: Haswell
    num_threads: 8
       user_api: openmp
   internal_api: openmp
         prefix: vcomp
       filepath: C:\my_project_dir\venv\Lib\site-packages\sklearn\.libs\vcomp140.dll
        version: None
    num_threads: 8
       user_api: openmp
   internal_api: openmp
         prefix: libiomp
       filepath: C:\my_project_dir\venv\Lib\site-packages\torch\lib\libiomp5md.dll
        version: None
    num_threads: 4
       user_api: openmp
   internal_api: openmp
         prefix: libiomp
       filepath: C:\my_project_dir\venv\Lib\site-packages\torch\lib\libiompstubs5md.dll
        version: None
    num_threads: 1

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 15 (9 by maintainers)

Most upvoted comments

Thanks @VukMNE. I still can't reproduce the error, even with these parameters. Was the model saved with an older version of scikit-learn and loaded with a more recent one?

@jjerphan, the issue does not come from the input being a vector: you get the same error with a 2D array that is only C-contiguous. I think the issue comes from how the model was fitted, but we don't have access to that.

@VukMNE can you tell us how you build, fit and save the knn model ?

Thanks for the report. This is a known issue and a fix has been submitted here: https://github.com/scikit-learn/scikit-learn/pull/23990. It should be merged soon.

model.fit(train_x.to_numpy(), train_y.to_numpy())

This suggests that the data comes from a DataFrame, in which case train_x.to_numpy() is typically F-contiguous. My guess is that the model stores _fit_X as F-contiguous and then fails in predict when computing the pairwise distances reduction. However, I can't reproduce the behavior. Maybe it only happens with a specific combination of parameters. @VukMNE, do you have the parameters of the KNN model?
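If the hypothesis above is right, one possible user-side workaround, until a fixed release is installed, is to copy the training features into C order before calling fit, so that the array the model stores is C-contiguous. This is a minimal sketch under that assumption. The random data, the F-ordered stand-in for `train_x.to_numpy()`, and `n_neighbors=3` are placeholders, not the reporter's setup.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Stand-in for train_x.to_numpy(): an F-ordered 2D array, as a DataFrame may produce.
train_x = np.asfortranarray(rng.normal(size=(50, 4)))
train_y = rng.normal(size=50)

# Workaround: copy the features into C order once, before fitting,
# so the training data stored by the model is C-contiguous.
train_x_c = np.ascontiguousarray(train_x)

model = KNeighborsRegressor(n_neighbors=3)
model.fit(train_x_c, train_y)

# Single-row query, shaped (1, n_features) as in the report.
query = rng.normal(size=4).reshape(1, -1)
pred = model.predict(query)
print(pred.shape)  # (1,)
```

In the reporter's code, this would amount to `model.fit(np.ascontiguousarray(train_x.to_numpy()), train_y.to_numpy())`. A model that has already been saved would likely need to be refitted for the change to take effect.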