scikit-learn: All kernel regression methods should accept precomputed Gram matrices

SVR has an option

    kernel='precomputed'

If this is chosen, then the X array passed to the fit method is the Gram matrix of the training examples. This option should also be available for GPR and KRR.

Here is some motivation. Many machine learning projects begin by defining a feature set, and many algorithms, e.g. linear regression, intrinsically require a vector of real numbers for each sample.

Kernel methods take a different approach: the modeller supplies a function, the “kernel”, which measures the similarity between two samples. This need not be expressed in terms of features. For example, it is easy to say what it means for two DNA sequences to be similar, but hard to reduce a DNA sequence to a vector of features. So SVR, correctly, is willing to build a model with:

    model = svm.SVR(kernel='precomputed')
    model.fit(kernel(training.molecule), training.y.values)

(My current project is a cheminformatic one.)
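For readers who want something runnable, here is a minimal sketch of the same pattern with a toy kernel on DNA strings. The spectrum kernel (a dot product of k-mer counts), the helper names `kmer_counts`, `spectrum_kernel` and `gram_matrix`, and the sequences and target values are all made up for illustration; they are not from my project. Only `SVR(kernel="precomputed")` and its `fit`/`predict` calls are scikit-learn API.

```python
from collections import Counter

import numpy as np
from sklearn.svm import SVR


def kmer_counts(seq, k=2):
    """Count the overlapping k-mers in a sequence (hypothetical helper)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=2):
    """Similarity of two sequences: dot product of their k-mer counts."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return float(sum(n * cb[kmer] for kmer, n in ca.items()))


def gram_matrix(rows, cols, k=2):
    """Kernel values between every sequence in rows and every one in cols."""
    return np.array([[spectrum_kernel(r, c, k) for c in cols] for r in rows])


# Made-up data: the samples are strings, never converted to float vectors.
train_seqs = ["ACGTAC", "ACGTTT", "GGGTAC", "TTTTAC", "ACACAC"]
train_y = np.array([1.0, 0.8, 0.3, 0.1, 0.9])
test_seqs = ["ACGTAA", "GGGTTT"]

K_train = gram_matrix(train_seqs, train_seqs)  # shape (n_train, n_train)
K_test = gram_matrix(test_seqs, train_seqs)    # shape (n_test, n_train)

model = SVR(kernel="precomputed")
model.fit(K_train, train_y)
predictions = model.predict(K_test)
```

Note that at prediction time the matrix has one row per new sample and one column per training sample, so the training samples have to be kept around to evaluate the kernel against them.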

It would be great if the other kernel methods supported this. At present, they require a kernel function, to which they pass the X values after checking that X is an array of floats. So some of the value of kernel methods is lost.

About this issue

State: open
Created 7 years ago
Comments: 25 (12 by maintainers)

Commits related to this issue

document "precomputed" kernel for KernelRidge See #8445. — committed to wesbarnett/scikit-learn by wesbarnett 6 years ago
DOC document "precomputed" kernel for KernelRidge (#11134) * document "precomputed" kernel for KernelRidge See #8445. * fix doc for KernelRidge use n_samples_fitted in score — committed to scikit-learn/scikit-learn by wesbarnett 6 years ago

Most upvoted comments

See my comment of 24 May 2018. The attachment provides a subclass of GPR that accepts kernel='precomputed'.

chrishmorris on Mar 2, 2020

Just a heads-up. The script by @chrishmorris has a bug in line 67. If self.kernel == 'precomputed', then self.kernel_ is None, so self.kernel_.n_dims cannot be evaluated. I assume this if statement should be skipped when self.kernel_ is None, which is an easy enough fix. In addition, line 8 should be uncommented so that from operator import itemgetter takes effect. I’ve attached a script that’s up-to-date with the current version of scikit-learn: GPR.py.txt

Anyway, about this issue in general, I agree with @chrishmorris and do not understand why a solution like kernel='precomputed' (as is currently implemented in KRR and SVR) would not be desirable. It seems like having a precomputed kernel class would be more complicated, not less.
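To illustrate the KRR case, here is a small sketch of `KernelRidge(kernel="precomputed")`, checked against the closed-form dual solution. The data, the `rbf_gram` helper and the choice of an RBF kernel are assumptions made only so the example is self-contained; any valid Gram matrix could be passed instead.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge


def rbf_gram(A, B, gamma=0.5):
    """RBF kernel values between rows of A and rows of B (illustrative helper)."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dist)


# Made-up one-dimensional data.
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_train = np.sin(X_train).ravel()
X_test = np.array([[0.5], [2.5]])

K_train = rbf_gram(X_train, X_train)  # shape (n_train, n_train)
K_test = rbf_gram(X_test, X_train)    # shape (n_test, n_train)

alpha = 0.1
krr = KernelRidge(kernel="precomputed", alpha=alpha)
krr.fit(K_train, y_train)
y_pred = krr.predict(K_test)

# Closed form for comparison: solve (K + alpha * I) c = y, then predict K_test @ c.
dual_coef = np.linalg.solve(K_train + alpha * np.eye(len(y_train)), y_train)
closed_form = K_test @ dual_coef
```

The estimator never sees X here, only kernel values, which is the behaviour being asked for in GPR.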

Andrew-S-Rosen on Mar 31, 2020

Examples are published by adding a file under the examples directory in this repository. See our contributing guide.

jnothman on Jan 25, 2020

To do GPR with a precomputed matrix, I use the attached subclass.

Please feel free to use the code if you like it. It is directly determined by the nature of the problem to be solved, the conventions of the Python community, and the sklearn API, so I consider that I have no copyright in it.

GPR.py.txt
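The attachment is not reproduced here. As a rough idea of what GPR prediction from precomputed Gram matrices involves, here is a NumPy-only sketch of the standard zero-mean GP posterior. It is not the attached subclass and not scikit-learn code: the function name `gp_predict_precomputed`, its parameters and the toy data are hypothetical, and it assumes a fixed kernel with no hyperparameter optimisation.

```python
import numpy as np


def gp_predict_precomputed(K_train, y_train, K_test_train, k_test_diag, alpha=1e-10):
    """Posterior mean and standard deviation of a zero-mean GP (hypothetical helper).

    K_train:      (n_train, n_train) Gram matrix of the training samples
    K_test_train: (n_test, n_train) kernel values between test and training samples
    k_test_diag:  (n_test,) kernel value of each test sample with itself
    alpha:        noise/jitter added to the diagonal of K_train
    """
    n = K_train.shape[0]
    L = np.linalg.cholesky(K_train + alpha * np.eye(n))
    # Solve (K + alpha * I) w = y using the Cholesky factor.
    w = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_test_train @ w
    # Predictive variance: k(x*, x*) - k*^T (K + alpha * I)^-1 k*
    v = np.linalg.solve(L, K_test_train.T)
    var = k_test_diag - np.sum(v * v, axis=0)
    return mean, np.sqrt(np.clip(var, 0.0, None))


def rbf_gram(A, B):
    """Unit length-scale RBF kernel, used only to build toy Gram matrices."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * sq_dist)


# Made-up data for the usage example.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.sin(X).ravel()
X_new = np.array([[0.5], [1.5]])

K = rbf_gram(X, X)
mean_new, std_new = gp_predict_precomputed(
    K, y, rbf_gram(X_new, X), np.ones(len(X_new))  # RBF gives k(x, x) = 1
)
# Sanity check input: predicting at the training points themselves.
mean_train, std_train = gp_predict_precomputed(K, y, K, np.diag(K))
```

The point of the sketch is that the predictive equations only ever need kernel values, so nothing in them requires X to be an array of floats.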

chrishmorris on May 24, 2018