tensorflow: Segfaults/NaNs in SVD

I’m getting failures when running SVD on a particular matrix. The result is either all NaNs in the u matrix, or a segfault like the one below.

To reproduce, run this script in Python3: https://github.com/yaroslavvb/stuff/blob/master/svd_test.py

I can’t see anything special about this matrix besides the fact that it’s badly conditioned. For example, I can perform SVD on this matrix in Mathematica without problems. @rmlarsen

#0  0x00007fffe320e121 in Eigen::BDCSVD<Eigen::Matrix<float, -1, -1, 1, -1, -1> >::perturbCol0(Eigen::Ref<Eigen::Array<float, -1, 1, 0, -1, 1>, 0, Eigen::InnerStride<1> > const&, Eigen::Ref<Eigen::Array<float, -1, 1, 0, -1, 1>, 0, Eigen::InnerStride<1> > const&, Eigen::Ref<Eigen::Array<long, 1, -1, 1, 1, -1>, 0, Eigen::InnerStride<1> > const&, Eigen::Matrix<float, -1, 1, 0, -1, 1> const&, Eigen::Ref<Eigen::Array<float, -1, 1, 0, -1, 1>, 0, Eigen::InnerStride<1> > const&, Eigen::Ref<Eigen::Array<float, -1, 1, 0, -1, 1>, 0, Eigen::InnerStride<1> > const&, Eigen::Ref<Eigen::Array<float, -1, 1, 0, -1, 1>, 0, Eigen::InnerStride<1> >) ()
    from /home/yaroslav/.conda/envs/whitening/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#1  0x00007fffe320fa81 in Eigen::BDCSVD<Eigen::Matrix<float, -1, -1, 1, -1, -1> >::computeSVDofM(long, long, Eigen::Matrix<float, -1, -1, 0, -1, -1>&, Eigen::Matrix<float, -1, 1, 0, -1, 1>&, Eigen::Matrix<float, -1, -1, 0, -1, -1>&) ()
    from /home/yaroslav/.conda/envs/whitening/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#2  0x00007fffe321e21c in Eigen::BDCSVD<Eigen::Matrix<float, -1, -1, 1, -1, -1> >::divide(long, long, long, long, long) ()
    from /home/yaroslav/.conda/envs/whitening/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#3  0x00007fffe321dbb8 in Eigen::BDCSVD<Eigen::Matrix<float, -1, -1, 1, -1, -1> >::divide(long, long, long, long, long) ()
    from /home/yaroslav/.conda/envs/whitening/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#4  0x00007fffe32220bd in Eigen::BDCSVD<Eigen::Matrix<float, -1, -1, 1, -1, -1> >::compute(Eigen::Matrix<float, -1, -1, 1, -1, -1> const&, unsigned int) ()
    from /home/yaroslav/.conda/envs/whitening/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#5  0x00007fffe32227a1 in tensorflow::SvdOp<float>::ComputeMatrix(tensorflow::OpKernelContext*, tensorflow::gtl::InlinedVector<Eigen::Map<Eigen::Matrix<float, -1, -1, 1, -1, -1> const, 0, Eigen::Stride<0, 0> >, 4> const&, tensorflow::gtl::InlinedVector<Eigen::Map<Eigen::Matrix<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >, 4>*) ()
    from /home/yaroslav/.conda/envs/whitening/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
#6  0x00007fffe3228c75 in tensorflow::LinearAlgebraOp<float>::ComputeTensorSlice(tensorflow::OpKernelContext*, long long, tensorflow::gtl::InlinedVector<tensorflow::Tensor const*, 4> const&, tensorflow::gtl::InlinedVector<tensorflow::TensorShape, 4> const&, tensorflow::gtl::InlinedVector<tensorflow::Tensor*, 4> const&, tensorflow::gtl::InlinedVector<tensorflow::TensorShape, 4> const&) ()
    from /home/yaroslav/.conda/envs/whitening/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
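
One way to narrow down a report like this is to measure how badly conditioned the matrix is and to check whether a float32 SVD from a different library (LAPACK through numpy) also produces NaNs. The sketch below is not taken from the linked script: `svd_diagnostics` is a hypothetical helper, and the Hilbert matrix is only a stand-in for the actual matrix, which is not reproduced here.

```python
import numpy as np


def svd_diagnostics(a):
    """Compare float32 and float64 SVDs of `a` (hypothetical helper)."""
    a64 = np.asarray(a, dtype=np.float64)
    a32 = a64.astype(np.float32)

    # Reference singular values in double precision.
    s64 = np.linalg.svd(a64, compute_uv=False)
    # Single-precision SVD, the dtype used in the backtrace above.
    u32, s32, vt32 = np.linalg.svd(a32, full_matrices=False)

    tiny = np.finfo(np.float64).tiny  # guards against division by zero
    has_nan = any(np.isnan(x).any() for x in (u32, s32, vt32))
    return {
        "cond": float(s64[0] / max(s64[-1], tiny)),
        "float32_has_nan": bool(has_nan),
        "max_rel_sv_diff": float(np.max(np.abs(s32 - s64)) / max(s64[0], tiny)),
    }


# Stand-in for the problematic matrix: an 8x8 Hilbert matrix (ill-conditioned).
n = 8
idx = np.arange(n)
hilbert = 1.0 / (idx[:, None] + idx[None, :] + 1.0)

report = svd_diagnostics(hilbert)
print(report)
```

If numpy handles the float32 case cleanly while TensorFlow does not, that points at the Eigen BDCSVD code path in the backtrace rather than at the matrix itself.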

About this issue

State: closed
Created 7 years ago
Comments: 16 (12 by maintainers)

Most upvoted comments

@strubell I’ve had good luck with SVD through numpy. Because SVD is an O(n^3) operation, the O(n^2) overhead of copying data between scipy and TF becomes negligible for matrices of 1000x1000 or larger. The scipy/MKL version of SVD is considerably faster than the TF version due to multi-threading, and the few crashes I’ve seen with scipy can be worked around by setting MKL_NUM_THREADS to a lower number (e.g., 15).

You can use something like SvdWrapper to easily switch between the TensorFlow and scipy versions of SVD; example usage is in kfac_example.

yaroslavvb on Sep 15, 2017
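
A minimal sketch of the backend-switching idea described in the comment above. This is not the SvdWrapper from the linked repository: the function name `svd`, the `backend` argument, and the `(u, s, v)` return convention are assumptions made for illustration, and the TensorFlow branch assumes TF 2.x eager execution. The thread count of 15 is the example value from the comment.

```python
import os

# Set the MKL thread limit before numpy/scipy are imported, so the value is
# already in place when MKL initializes. Has no effect on non-MKL builds.
os.environ.setdefault("MKL_NUM_THREADS", "15")

import numpy as np
import scipy.linalg


def svd(a, backend="scipy"):
    """Return numpy arrays (u, s, v) such that a ~= u @ diag(s) @ v.T."""
    if backend == "scipy":
        u, s, vt = scipy.linalg.svd(np.asarray(a), full_matrices=False)
        return u, s, vt.T
    if backend == "tensorflow":
        # Imported lazily so the scipy path works without TensorFlow installed.
        import tensorflow as tf  # assumption: TF 2.x with eager execution

        # tf.linalg.svd returns (s, u, v), with v not transposed.
        s, u, v = tf.linalg.svd(tf.convert_to_tensor(a), full_matrices=False)
        return u.numpy(), s.numpy(), v.numpy()
    raise ValueError("unknown backend: %r" % (backend,))


a = np.random.RandomState(0).randn(6, 4)
u, s, v = svd(a, backend="scipy")
print(np.allclose(u @ np.diag(s) @ v.T, a))
```

Normalizing both backends to the same return order keeps calling code unchanged when switching, which matters because scipy returns `(u, s, vt)` while TensorFlow returns `(s, u, v)`.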

@strubell @yaroslavvb FYI, if you use py_func on a CPU tensor there’s no overhead from copying it back and forth between TF and numpy in most cases (we now only copy if the tensor is not in column-order), and since numpy doesn’t hold the GIL during SVD you shouldn’t see too much Python overhead.

alextp on Oct 10, 2017
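
The py_func route mentioned above might look roughly like the sketch below. It uses `tf.numpy_function`, the TF 2.x counterpart of `tf.py_func`; the helper names `numpy_svd` and `svd_via_numpy` are hypothetical, and the wrapper assumes a float32 CPU tensor as input.

```python
import numpy as np


def numpy_svd(a):
    """numpy SVD returning (s, u, v), the same order tf.linalg.svd uses."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return s, u, np.ascontiguousarray(vt.T)


def svd_via_numpy(x):
    """Run numpy_svd as a TensorFlow op (assumes `x` is a float32 CPU tensor)."""
    # Imported lazily so numpy_svd can be used and tested without TensorFlow.
    import tensorflow as tf

    # Tout must match the dtypes numpy_svd returns for a float32 input.
    s, u, v = tf.numpy_function(
        numpy_svd, [x], [tf.float32, tf.float32, tf.float32]
    )
    return s, u, v


# Check the numpy half on its own.
a = np.random.RandomState(0).randn(5, 3).astype(np.float32)
s, u, v = numpy_svd(a)
print(np.allclose(u @ np.diag(s) @ v.T, a, atol=1e-4))
```

Tensors returned by `tf.numpy_function` have no static shape information, so downstream graph code may need to set shapes explicitly.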

@rmlarsen maybe the long-term solution for speed/correctness is to make gesdd available in TensorFlow, either as a new implementation or via the MKL version of gesdd? There’s also gesvd, which may be more robust but also takes 3x longer.

yaroslavvb on Sep 15, 2017
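
Outside TensorFlow, both LAPACK drivers named above are reachable through the `lapack_driver` argument of `scipy.linalg.svd`. The sketch below tries the divide-and-conquer driver (gesdd) first and falls back to gesvd if it fails to converge or returns non-finite values. The helper name `robust_svd` and the fallback policy are assumptions for illustration, not something proposed in this thread.

```python
import numpy as np
import scipy.linalg


def robust_svd(a):
    """Try gesdd first; fall back to gesvd (hypothetical helper).

    Returns (u, s, vt, driver_used).
    """
    try:
        u, s, vt = scipy.linalg.svd(
            a, full_matrices=False, lapack_driver="gesdd"
        )
        if all(np.isfinite(x).all() for x in (u, s, vt)):
            return u, s, vt, "gesdd"
    except np.linalg.LinAlgError:
        pass  # gesdd did not converge; try the slower driver below

    u, s, vt = scipy.linalg.svd(a, full_matrices=False, lapack_driver="gesvd")
    return u, s, vt, "gesvd"


a = np.random.RandomState(0).randn(5, 3)
u, s, vt, driver = robust_svd(a)
print(driver, np.allclose(u @ np.diag(s) @ vt, a))
```

Trying the faster driver first means the extra cost of gesvd is paid only on matrices where gesdd actually misbehaves.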