GPflow: Cholesky decomposition was not successful. The input might not be valid. [Recurring problem]

The problem is that Cholesky is failing for clearly positive definite kernels. This problem seems to come back whenever TF's Cholesky is involved. In numpy-based GP code the problem doesn't occur, and when it does, it can be solved by conditioning the matrix or by using pseudo-inverses/determinants.

This is an unsolved issue similar to the closed issue #78. In that case the user was able to constrain his domain to a sub-domain where the problem went away. The MVCE below shows that this doesn't work here.

MVCE

https://gist.github.com/Joshuaalbert/e34f71c65b28263161ae09f5b9a6f3be

The data provide an easy-to-solve problem, yet TF's Cholesky fails. I know that this time series has a lengthscale of 47, which is where it is initialized. I also added an untrainable white kernel with very high variance, and it still fails. It also fails with and without a mean function, and with and without the logistic transform.

Back to the problem

There is a serious problem here, guys and gals. Because of this same issue in my own code, for the exact same problem actually, I've asked that TF provide dynamic exception catching (https://github.com/tensorflow/tensorflow/issues/10332).

UPDATE: The new Eager Execution mode of TensorFlow is really cool and allows pythonic control flow. I believe it is worth looking into.
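For illustration, here is a minimal sketch of what that buys you, assuming a TF version where tf.enable_eager_execution() is available (the fallback branch is only indicated):

import tensorflow as tf
tf.enable_eager_execution()  # TF 1.7+; earlier releases used tf.contrib.eager

K = tf.constant([[1.0, 1.0], [1.0, 1.0]])  # singular on purpose, Cholesky will fail
try:
    L = tf.cholesky(K)
except tf.errors.InvalidArgumentError:
    # fall back to an SVD-based pseudo-inverse/determinant here
    s, u, v = tf.svd(K)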

In a pythonic solution one would do:

try:
    # method using Cholesky
except:
    # method using SVD and pseudo-inverse/determinant
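A minimal sketch along those lines in numpy/scipy (the function name and the 1e-14 tolerance are illustrative, not from the gist):

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def solve_and_logdet(K, y):
    # Solve K x = y and return (x, log|K|), falling back to an SVD-based
    # pseudo-inverse/pseudo-determinant when Cholesky fails.
    try:
        L, lower = cho_factor(K, lower=True)
        x = cho_solve((L, lower), y)
        logdet = 2.0 * np.sum(np.log(np.diag(L)))
    except np.linalg.LinAlgError:
        U, s, Vt = np.linalg.svd(K)
        s_inv = np.where(s > 1e-14, 1.0 / s, 0.0)   # pseudo-inverse of the spectrum
        x = Vt.T @ (s_inv * (U.T @ y))
        logdet = np.sum(np.log(s[s > 1e-14]))       # pseudo log-determinant
    return x, logdet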

This is how I solved the problem in my own numpy GP code. The problem is that TF requires doing something like the following, which is tedious to code in an elegant way.

def _no_cho_version(...):
    # fallback using SVD
def _cho_version(...):
    # preferred path using Cholesky
result = tf.cond(did_cholesky_fail, _no_cho_version, _cho_version)

try:
    feed_dict[did_cholesky_fail] = False
    sess.run(..., feed_dict=feed_dict)
except tf.errors.InvalidArgumentError:
    feed_dict[did_cholesky_fail] = True
    sess.run(..., feed_dict=feed_dict)
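For concreteness, a self-contained sketch of that pattern in TF 1.x graph mode (the log-determinant is just a stand-in for whatever quantity you actually need; the placeholder names and the 1e-14 floor are illustrative):

import numpy as np
import tensorflow as tf

N = 50
K_ph = tf.placeholder(tf.float64, [N, N])
did_cholesky_fail = tf.placeholder(tf.bool, [])

def _cho_version():
    # preferred path: Cholesky-based log-determinant
    L = tf.cholesky(K_ph)
    return 2.0 * tf.reduce_sum(tf.log(tf.matrix_diag_part(L)))

def _no_cho_version():
    # fallback path: SVD-based pseudo log-determinant
    s = tf.svd(K_ph, compute_uv=False)
    s = tf.where(s > 1e-14, s, 1e-14 * tf.ones_like(s))
    return tf.reduce_sum(tf.log(s))

logdet = tf.cond(did_cholesky_fail, _no_cho_version, _cho_version)

K = np.eye(N)  # stand-in for a kernel matrix
with tf.Session() as sess:
    feed_dict = {K_ph: K, did_cholesky_fail: False}
    try:
        print(sess.run(logdet, feed_dict=feed_dict))
    except tf.errors.InvalidArgumentError:
        feed_dict[did_cholesky_fail] = True
        print(sess.run(logdet, feed_dict=feed_dict))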

Matrix conditioning might be a solution

UPDATE: apparently this fails when there are degenerate eigenvalues (the gradient of the eigendecomposition is only uniquely defined for distinct eigenvalues). Adding a small random jitter first works.

For example, the code below takes a non-positive-definite matrix and makes it positive definite.

# symmetrize (batch of matrices)
Kf = (Kf + tf.transpose(Kf, perm=[0, 2, 1])) / 2.
# columns of v are the eigenvectors, so Kf = v @ diag(e) @ v^T
e, v = tf.self_adjoint_eig(Kf)
# clamp the spectrum away from zero
e = tf.where(e > 1e-14, e, 1e-14 * tf.ones_like(e))
Kf_pos_def = tf.matmul(tf.matmul(v, tf.matrix_diag(e)), v, transpose_b=True)

Note that a Cholesky factorization exists in principle for positive semi-definite matrices with some zero eigenvalues (you can test this yourself). So in theory I should be able to replace the line e = tf.where(e > 1e-14, e, 1e-14 * tf.ones_like(e)) with e = tf.where(e > 0, e, tf.zeros_like(e)).

Conditioning the matrix also fails sometimes, because TF's eigendecomposition code is buggy. It only fails when you ask for the gradient, though; the forward calculation works.
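If you do go the conditioning route, one way to apply the jitter trick mentioned above is to add a small random symmetric perturbation before the eigendecomposition (the 1e-10 scale is an assumption, tune it for your data):

# break exact eigenvalue degeneracies so the eig gradient is well defined
noise = 1e-10 * tf.random_normal(tf.shape(Kf), dtype=Kf.dtype)
Kf_jittered = Kf + (noise + tf.transpose(noise, perm=[0, 2, 1])) / 2.
e, v = tf.self_adjoint_eig(Kf_jittered)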

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 2
  • Comments: 19 (7 by maintainers)

Most upvoted comments

The problem is completely solved by standardizing (centering and scaling) the data. I added these lines to the script:

X = (X - X.mean()) / X.std()
Y = (Y - Y.mean()) / Y.std()

I also removed the constraints on the lengthscales, and got this:

[Figure: figure_1-1]

Adding fast=False worked for me; it avoids the Cholesky-based normal-equations solve in favor of the more robust complete orthogonal decomposition:

z = tf.matrix_solve_ls(x, screen_points, fast=False)