scikit-learn: All kernel regression methods should accept precomputed Gram matrices
SVR has an option kernel='precomputed'. If this is chosen, then the X array passed to the fit method is the Gram matrix of the training examples. This option should also be available for GPR and KRR.
Here is some motivation. Many machine learning projects begin by defining a feature set, and many algorithms, e.g. linear regression, intrinsically require a vector of real numbers for each sample.
Kernel methods take a different approach: the modeller supplies a function, the “kernel”, which measures the similarity between two samples. This need not be expressed in terms of features. For example, it is easy to say what it means for two DNA sequences to be similar, but hard to reduce a DNA sequence to a vector of features. So SVR, quite rightly, will build a model with:
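from sklearn import svm
model = svm.SVR(kernel='precomputed')
# kernel(...) is the user's own function; it returns the Gram matrix of the training samples
model.fit(kernel(training.molecule), training.y.values)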
(My current project is a cheminformatic one.)
It would be great if the other kernel methods supported this. At present, they require a kernel function, to which they pass the X values after checking that X is an array of floats. So some of the value of kernel methods is lost.
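The precomputed workflow described above can be sketched end to end for SVR. This is only an illustration: the toy sequences, the target values, and the helper names kmer_counts, spectrum_kernel and gram are all made up for the example, and the k-mer "spectrum" kernel is just one simple choice of similarity between strings. The point is that SVR never sees the sequences, only the Gram matrices.

```python
import numpy as np
from sklearn.svm import SVR

def kmer_counts(seq, k=2):
    """Count every length-k substring of seq (hypothetical helper)."""
    counts = {}
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        counts[kmer] = counts.get(kmer, 0) + 1
    return counts

def spectrum_kernel(a, b, k=2):
    """Similarity of two strings: inner product of their k-mer counts."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return float(sum(n * cb.get(kmer, 0) for kmer, n in ca.items()))

def gram(rows, cols, k=2):
    """Matrix of kernel values, one row per item in rows, one column per item in cols."""
    return np.array([[spectrum_kernel(r, c, k) for c in cols] for r in rows])

# Toy data, invented for illustration only.
train = ["ACGTAC", "ACGTTT", "GGGCCC", "GGCCGG", "ACGGAC"]
y_train = np.array([1.0, 0.9, -1.0, -0.8, 0.5])
test = ["ACGTAA", "GGGCCG"]

K_train = gram(train, train)   # shape (n_train, n_train)
K_test = gram(test, train)     # shape (n_test, n_train)

model = SVR(kernel="precomputed", C=1.0)
model.fit(K_train, y_train)
pred = model.predict(K_test)   # one prediction per test sequence
```

Note that at prediction time the matrix passed in has one row per test sample and one column per training sample, in the same order as the training Gram matrix.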
About this issue
- State: open
- Created 7 years ago
- Comments: 25 (12 by maintainers)
Commits related to this issue
- document "precomputed" kernel for KernelRidge See #8445. — committed to wesbarnett/scikit-learn by wesbarnett 6 years ago
- DOC document "precomputed" kernel for KernelRidge (#11134) * document "precomputed" kernel for KernelRidge See #8445. * fix doc for KernelRidge use n_samples_fitted in score — committed to scikit-learn/scikit-learn by wesbarnett 6 years ago
See my comment of 24 May 2018. The attachment provides a subclass of GPR that accepts kernel='precomputed'.
Just a heads-up. The script by @chrishmorris has a bug in line 67. If self.kernel == 'precomputed', then self.kernel_ = None, and then self.kernel_.n_dims cannot be evaluated. I assume this if statement should be skipped if self.kernel_ is None, which is an easy enough fix. In addition, line 8 should be uncommented to allow for from operator import itemgetter. I’ve attached a script that’s up to date with the current version of scikit-learn: GPR.py.txt
Anyway, about this issue in general, I agree with @chrishmorris and do not understand why a solution like kernel='precomputed' (just as is currently implemented in KRR and SVR) would not be desirable. It seems like having a precomputed kernel class would be more complicated, not less.
Examples are published by adding a file under the examples directory in this repository. See our contributing guide.
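For reference, here is a small sketch of the KernelRidge usage that the comments and the documentation commit above refer to. The data are random and purely illustrative. An RBF kernel is used only so that the result can be compared with the built-in kernel='rbf' path; with a precomputed matrix, any valid kernel could be substituted.

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics.pairwise import rbf_kernel

# Synthetic data, invented for illustration only.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 3))
y_train = np.sin(X_train[:, 0])
X_test = rng.normal(size=(5, 3))

# Gram matrices computed outside the estimator.
K_train = rbf_kernel(X_train, X_train, gamma=0.5)  # (n_train, n_train)
K_test = rbf_kernel(X_test, X_train, gamma=0.5)    # (n_test, n_train)

# Fit on the precomputed Gram matrix ...
krr_pre = KernelRidge(kernel="precomputed", alpha=1.0).fit(K_train, y_train)
# ... and, for comparison, let the estimator compute the same kernel itself.
krr_rbf = KernelRidge(kernel="rbf", gamma=0.5, alpha=1.0).fit(X_train, y_train)

pred_pre = krr_pre.predict(K_test)
pred_rbf = krr_rbf.predict(X_test)
same = np.allclose(pred_pre, pred_rbf)  # the two routes should agree
```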
To do GPR with a precomputed matrix, I use the attached subclass.
Please feel free to use the code if you like it. It is directly determined by the nature of the problem to be solved, the conventions of the Python community, and the sklearn API, so I consider that I have no copyright in it.
GPR.py.txt
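The attached subclass is not reproduced here. As a rough illustration of what GPR on a precomputed Gram matrix involves, the following sketch computes the standard Gaussian process posterior mean and standard deviation directly with NumPy. The function name gp_predict_precomputed and its parameters are hypothetical, and the sketch assumes fixed hyperparameters and a zero prior mean; it does no hyperparameter optimisation. The comparison with GaussianProcessRegressor (with optimizer=None so its kernel stays fixed) is only a sanity check.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def gp_predict_precomputed(K_train, y_train, K_test_train, k_test_diag, noise=1e-6):
    """GP posterior from precomputed kernel values (hypothetical helper).

    K_train:      (n_train, n_train) Gram matrix of the training samples
    K_test_train: (n_test, n_train) kernel values between test and training samples
    k_test_diag:  (n_test,) kernel value of each test sample with itself
    noise:        value added to the diagonal of K_train
    """
    n = K_train.shape[0]
    L = np.linalg.cholesky(K_train + noise * np.eye(n))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_test_train @ alpha
    v = np.linalg.solve(L, K_test_train.T)
    var = k_test_diag - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))  # clip tiny negative variances

# Toy 1-D data, invented for illustration only.
X_train = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y_train = np.sin(X_train[:, 0])
X_test = np.array([[0.5], [2.5], [6.0]])

kernel = RBF(length_scale=1.0)
mean, std = gp_predict_precomputed(
    kernel(X_train), y_train, kernel(X_test, X_train), kernel.diag(X_test)
)

# Sanity check against scikit-learn with the same fixed kernel and noise level.
gpr = GaussianProcessRegressor(kernel=kernel, alpha=1e-6, optimizer=None)
gpr.fit(X_train, y_train)
ref_mean, ref_std = gpr.predict(X_test, return_std=True)
agrees = np.allclose(mean, ref_mean, atol=1e-6) and np.allclose(std, ref_std, atol=1e-6)
```

In real use the three kernel arrays would come from the modeller's own similarity function rather than from RBF, which is the case the subclass is meant to cover.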