math: Wrong derivatives for cholesky and symmetric eigen decomposition

Description

Currently, the derivatives of cholesky_decompose and eigenvalues_sym with respect to the input matrix come back as a lower triangular matrix. Finite differences agree with this because the implementations of those functions read only the lower triangle of the input, which is why the test system does not catch the bug. I think the correct derivative matrix should be symmetric, not lower triangular. The error also propagates to every other function involved in the computation, so the autodiff derivative of the whole expression is wrong as well.
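Concretely, if $\bar{A}_{\mathrm{low}}$ denotes the lower-triangular adjoint that is currently produced, my understanding is that the correct symmetric adjoint would be recovered by mirroring it across the diagonal (this is my reading of the behavior, not a confirmed specification):

$$
\bar{A}_{\mathrm{sym}} = \bar{A}_{\mathrm{low}} + \bar{A}_{\mathrm{low}}^{\top} - \operatorname{diag}(\bar{A}_{\mathrm{low}}),
$$

where $\operatorname{diag}(\cdot)$ keeps only the diagonal entries, so they are not counted twice.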

Example

Consider computing log det A for an invertible symmetric matrix A. This can be done with log_determinant, or via cholesky_decompose or eigenvalues_sym. The derivative of log det A with respect to A is the inverse of A. The code below demonstrates that a.adj() is only the lower triangular part of the inverse of A, while the full symmetric matrix is expected.
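For reference, the expected gradient follows from the standard matrix-calculus identity (treating all entries of A as independent):

$$
d \log\det A = \operatorname{tr}\!\left(A^{-1}\, dA\right)
\quad\Longrightarrow\quad
\frac{\partial \log\det A}{\partial A} = A^{-\top} = A^{-1} \quad \text{for symmetric } A.
$$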

#include <stan/math.hpp>
#include <cmath>
#include <iostream>

int main() {
    using stan::math::eigenvalues_sym;
    using stan::math::cholesky_decompose;
    using stan::math::determinant;
    using stan::math::log_determinant;
    using stan::math::diagonal;
    using stan::math::log;
    using stan::math::sum;

    stan::math::matrix_d a(4, 4);
    // Hard-coded symmetric positive-definite test matrix
    a << 1.8904,  0.7204, -0.1599,  1.2028,
         0.7204,  7.3394,  2.0895, -0.6151,
        -0.1599,  2.0895,  2.4601, -1.7219,
         1.2028, -0.6151, -1.7219,  4.5260;

    // Four independent autodiff copies of a, one per method below
    stan::math::matrix_v a1(a);
    stan::math::matrix_v a2(a);
    stan::math::matrix_v a3(a);
    stan::math::matrix_v a4(a);

    std::cout << "Here is the input matrix a:\n" << a << std::endl;

    // Way 1: A = L * L^T, so log det A = 2 * sum(log(diag(L)))
    stan::math::matrix_v chol_mat = cholesky_decompose(a1);
    auto logdet1 = 2 * sum(log(diagonal(chol_mat)));

    // Way 2: log det A = sum(log(eigenvalues of A))
    auto w = eigenvalues_sym(a2);
    auto logdet2 = sum(log(w));

    // Way 3: log of the determinant computed explicitly
    auto det = determinant(a3);
    auto logdet3 = log(det);

    // Way 4: log_determinant(A) in a single call
    auto logdet4 = log_determinant(a4);

    std::cout << "logdet1 = " << logdet1 << std::endl;
    std::cout << "logdet2 = " << logdet2 << std::endl;
    std::cout << "logdet3 = " << logdet3 << std::endl;
    std::cout << "logdet4 = " << logdet4 << std::endl;
    std::cout << "Are these all log(det(A))? " << ( (logdet1 - logdet2 < 1e-8) && (logdet2 - logdet3 < 1e-8) ? "True" : "False") << std::endl;

    stan::math::matrix_d a_inv = stan::math::inverse(a);
    std::cout << "Here is the matrix a_inv:\n" << a_inv << std::endl;

    stan::math::set_zero_all_adjoints();
    logdet1.grad();
    std::cout << "Here is the matrix a1_adj:\n" << a1.adj() << std::endl;
    bool way_1 = a_inv.isApprox(a1.adj());

    stan::math::set_zero_all_adjoints();
    logdet2.grad();
    std::cout << "Here is the matrix a2_adj:\n" << a2.adj() << std::endl;
    bool way_2 = a_inv.isApprox(a2.adj());

    stan::math::set_zero_all_adjoints();
    logdet3.grad();
    std::cout << "Here is the matrix a3_adj:\n" << a3.adj() << std::endl;
    bool way_3 = a_inv.isApprox(a3.adj());

    stan::math::set_zero_all_adjoints();
    logdet4.grad();
    std::cout << "Here is the matrix a4_adj:\n" << a4.adj() << std::endl;
    bool way_4 = a_inv.isApprox(a4.adj());

    std::cout << "Is way 1 correct? " << (way_1 ? "True" : "False") << std::endl;
    std::cout << "Is way 2 correct? " << (way_2 ? "True" : "False") << std::endl;
    std::cout << "Is way 3 correct? " << (way_3 ? "True" : "False") << std::endl;
    std::cout << "Is way 4 correct? " << (way_4 ? "True" : "False") << std::endl;
}
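If the adjoint is indeed just the lower-triangular half of the correct symmetric derivative, a caller can recover the full gradient by mirroring it with Eigen. The snippet below is only a sketch of such a workaround under that assumption, not a fix for the library; it would go just before the closing brace of main() above.

    // Workaround sketch: assumes a1.adj() holds the lower triangle of the
    // true symmetric derivative, as the output above suggests. Mirror the
    // lower triangle into a full symmetric matrix.
    Eigen::MatrixXd adj_low = a1.adj();
    Eigen::MatrixXd adj_sym = adj_low.selfadjointView<Eigen::Lower>();
    std::cout << "Symmetrized a1 adjoint:\n" << adj_sym << std::endl;
    std::cout << "Does it match inverse(a)? "
              << (a_inv.isApprox(adj_sym) ? "True" : "False") << std::endl;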

Current Version:

v3.1.0

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 3
  • Comments: 17 (17 by maintainers)

Most upvoted comments

Thanks much for the careful report. This indeed looks like a problem that we should fix. I recall going over all of this when we were first building our autodiff and thinking that we didn’t want to double the derivatives by copying them into the upper triangle. So I think this may require some clever design to get it to work in all contexts.