tensorflow: Segfaults/NaN's in SVD
I’m getting failures trying to run SVD on a particular matrix. The result is either all NaN’s for u matrix, or it’s segfaults like below.
To reproduce, run this script in Python3: https://github.com/yaroslavvb/stuff/blob/master/svd_test.py
I can’t see anything special about this matrix beside the fact that it’s badly conditioned. IE, I can perform SVD on this matrix in Mathematica fine @rmlarsen
#0 0x00007fffe320e121 in Eigen::BDCSVD<Eigen::Matrix<float, -1, -1, 1, -1, -1> >::perturbCol0(Eigen::Ref<Eigen::Array<float, -1, 1, 0, -1, 1>, 0, Eigen::InnerStride<1> > const&, Eigen::Ref<Eigen::Array<float, -1, 1, 0, -1, 1>, 0, Eigen::InnerStride<1> > const&, Eigen::Ref<Eigen::Array<long, 1, -1, 1, 1, -1>, 0, Eigen::InnerStride<1> > const&, Eigen::Matrix<float, -1, 1, 0, -1, 1> const&, Eigen::Ref<Eigen::Array<float, -1, 1, 0, -1, 1>, 0, Eigen::InnerStride<1> > const&, Eigen::Ref<Eigen::Array<float, -1, 1, 0, -1, 1>, 0, Eigen::InnerStride<1> > const&, Eigen::Ref<Eigen::Array<float, -1, 1, 0, -1, 1>, 0, Eigen::InnerStride<1> >) ()
# from /home/yaroslav/.conda/envs/whitening/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
# #1 0x00007fffe320fa81 in Eigen::BDCSVD<Eigen::Matrix<float, -1, -1, 1, -1, -1> >::computeSVDofM(long, long, Eigen::Matrix<float, -1, -1, 0, -1, -1>&, Eigen::Matrix<float, -1, 1, 0, -1, 1>&, Eigen::Matrix<float, -1, -1, 0, -1, -1>&) ()
# from /home/yaroslav/.conda/envs/whitening/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
# #2 0x00007fffe321e21c in Eigen::BDCSVD<Eigen::Matrix<float, -1, -1, 1, -1, -1> >::divide(long, long, long, long, long) ()
# from /home/yaroslav/.conda/envs/whitening/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
# #3 0x00007fffe321dbb8 in Eigen::BDCSVD<Eigen::Matrix<float, -1, -1, 1, -1, -1> >::divide(long, long, long, long, long) ()
# from /home/yaroslav/.conda/envs/whitening/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
# #4 0x00007fffe32220bd in Eigen::BDCSVD<Eigen::Matrix<float, -1, -1, 1, -1, -1> >::compute(Eigen::Matrix<float, -1, -1, 1, -1, -1> const&, unsigned int) ()
# from /home/yaroslav/.conda/envs/whitening/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
# #5 0x00007fffe32227a1 in tensorflow::SvdOp<float>::ComputeMatrix(tensorflow::OpKernelContext*, tensorflow::gtl::InlinedVector<Eigen::Map<Eigen::Matrix<float, -1, -1, 1, -1, -1> const, 0, Eigen::Stride<0, 0> >, 4> const&, tensorflow::gtl::InlinedVector<Eigen::Map<Eigen::Matrix<float, -1, -1, 1, -1, -1>, 0, Eigen::Stride<0, 0> >, 4>*) () from /home/yaroslav/.conda/envs/whitening/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so #6 0x00007fffe3228c75 in tensorflow::LinearAlgebraOp<float>::ComputeTensorSlice(tensorflow::OpKernelContext*, long long, tensorflow::gtl::InlinedVector<tensorflow::Tensor const*, 4> const&, tensorflow::gtl::InlinedVector<tensorflow::TensorShape, 4> const&, tensorflow::gtl::InlinedVector<tensorflow::Tensor*, 4> const&, tensorflow::gtl::InlinedVector<tensorflow::TensorShape, 4> const&) ()
# from /home/yaroslav/.conda/envs/whitening/lib/python3.5/site-packages/tensorflow/python/_pywrap_tensorflow_internal.so
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Comments: 16 (12 by maintainers)
@strubell I’ve had good luck with SVD through numpy. Because SVD is O(n^3) operation, the O(n^2) overhead of copying things between scipy/TF becomes negligible for 1000x1000 mats or larger. Scipy/mkl version of SVD is considerably faster than TF version due to multi-threading, and the few crashes I’ve seen with scipy can be worked around by setting MKL_NUM_THREADS to lower number (ie, 15)
You can do something like SvdWrapper to easily switch between TensorFlow and scipy versions of SVD , example usage is in kfac_example
@strubell @yaroslavvb FYI if you use py_func on a CPU tensor there’s no overhead to copy it back and forth between tf and numpy in most cases (we only copy now if the tensor is not in column-order), and since numpy doesn’t hold the gil during SVD you shouldn’t see too much python overhead.
@rmlarsen maybe long term solution to speed/correctness is to make gesdd available in TensorFlow? Either new implementation, or MKL version – gesdd . There’s also gesvd which may be more robust, but also takes 3x longer