scipy: scipy.optimize has broken my trust.
This submodule has broken my trust.
I am genuinely in awe of how it was included in scipy 1.0. I am tagging two heavy hitters to attempt to bring discussion to this issue since, in general, scipy.optimize issues seem to go without discussion or fixes. @pv @teoliphant
As I understand it, an effort was made in the last few versions of scipy, leading up to and including 1.0, to unify the interface of the different optimization routines and eventually deprecate fmin_xxxx.
A fairly large issue with this is that many of the inputs and outputs in the scipy documentation for these functions are simply wrong.
Take, for example, the BFGS method.
The documentation for disp is incorrect: convergence messages are not printed, only a blurb at the end. return_all appears in the call signature at the top but is undocumented. I looked to fmin_bfgs, hoping its more specific documentation would be better; it is even more misleading. I had hoped that the constituents of allvecs being documented as OptimizationResult meant they would hold both cost function values and input vectors. They are simply input vectors. The documentation here is also misleading.
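For concreteness, here is a minimal sketch of what I mean, using the built-in rosen function as a stand-in for a real cost function (the exact printout depends on the scipy version):

```python
import numpy as np
from scipy.optimize import minimize, rosen

# rosen is only a stand-in here; disp prints a short summary at the end,
# not per-iteration convergence messages
result = minimize(rosen, np.zeros(4), method='BFGS',
                  options={'disp': True, 'return_all': True})

# allvecs is a list of plain ndarrays (the parameter vector at each iteration);
# it holds no OptimizeResult objects and no cost values, so the cost must be
# recomputed if you want it per iteration
print(type(result.allvecs[0]))
costs = [rosen(x) for x in result.allvecs]
```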
I also understand that many of the algorithms wrap code in another language, e.g. L-BFGS-B is written in FORTRAN77, and this may make them difficult to modify. Earlier today I opened an issue, #8372, reporting that L-BFGS-B does not respect the maxiter input. There are other issues as well, e.g. the user's input dtype is not respected because of hard-coded doubles in the Fortran code.
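A quick, hedged way to observe the maxiter behavior for yourself (rosen again as a stand-in problem; the point is only to compare the requested cap with what the result reports):

```python
import numpy as np
from scipy.optimize import minimize, rosen

# ask for at most 5 iterations and compare with the reported counts
res = minimize(rosen, np.zeros(6), method='L-BFGS-B',
               options={'maxiter': 5, 'disp': True})
print(res.nit, res.nfev, res.message)
```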
However, the docs being misleading is a great travesty for the users of this module, which I believe is among the more important and widely used parts of scipy. I would work to improve them myself, but I do not feel that my knowledge of the module's inner workings is great enough to do so without making mistakes of my own.
Returning to L-BFGS-B, the wrapped Fortran code appears to contain a variety of errors or behaviors of varying severity. For example, here is a sample output:
RUNNING THE L-BFGS-B CODE
* * *
Machine precision = 2.220D-16
N = 4 M = 10
At X0 0 variables are exactly at the bounds
At iterate 0 f= 2.75370D+01 |proj g|= 1.06893D+02
At iterate 1 f= 2.27149D+01 |proj g|= 1.76623D+02
At iterate 2 f= 1.20196D+01 |proj g|= 1.20733D+02
At iterate 3 f= 8.05081D+00 |proj g|= 8.74848D+01
At iterate 4 f= 5.00344D+00 |proj g|= 8.42253D+01
At iterate 5 f= 3.70940D+00 |proj g|= 1.01991D+02
At iterate 6 f= 1.57818D+00 |proj g|= 7.10948D+01
At iterate 7 f= 4.81956D-01 |proj g|= 3.76945D+01
At iterate 8 f= 6.81301D-02 |proj g|= 2.20645D+01
At iterate 9 f= 1.34639D-02 |proj g|= 8.79078D+00
* * *
Tit = total number of iterations
Tnf = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip = number of BFGS updates skipped
Nact = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F = final function value
* * *
N Tit Tnf Tnint Skip Nact Projg F
4 9 12 1 0 0 8.791D+00 1.346D-02
F = 1.346394518877836E-002
CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
Cauchy time 0.000E+00 seconds.
Subspace minimization time 0.000E+00 seconds.
Line search time 0.000E+00 seconds.
Total User time 0.000E+00 seconds.
The code writes “iterate” instead of “iteration” (a typo). This is an annoyance, as it is not very professional, but it does not break anything. The separate timings for Cauchy, subspace minimization, and line search are also always zero, as is the user time. It would be nice for these not to be filler values, since that is misleading, but again this is a creature-comfort problem, not an accuracy problem.
My application is in a topic known as wavefront sensing, or phase retrieval. We put in some number of parameters, and the cost-function landscape evolves substantially as the optimizer traverses the space. As a result, when designing a cost function or debugging code, it is desirable to know the state of the optimizer as it walks the space, even without knowing its internal mechanisms. To do this, I do the following:
- Save this output text from L-BFGS-B, and scrape the f= values from it.
- Use a callback that stores parameter vectors.
I expect these two lists to always have the same length, and I depend on cost[n] being associated with all_x[n] without registration errors between the two blocks of data.
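A minimal sketch of that bookkeeping, with a toy quadratic standing in for the real cost function (which is far too large to include here):

```python
import numpy as np
from scipy.optimize import minimize

def cost(x):
    # toy stand-in for the real wavefront-sensing cost function
    return float(np.sum((x - np.array([0.0, 0.12, 0.0, 0.0])) ** 2))

all_x = []  # one parameter vector per callback invocation
result = minimize(cost, np.zeros(4), method='L-BFGS-B',
                  options={'disp': True},
                  callback=lambda xk: all_x.append(np.copy(xk)))

# the f= values come from the saved text of the disp output, scraped with
# something like (illustrative only; the Fortran code prints D exponents):
# costs = [float(m.replace('D', 'E'))
#          for m in re.findall(r'f=\s*(\S+)', saved_output)]
# the expectation is len(costs) == len(all_x), with costs[n] <-> all_x[n]
```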
In my particular application (I can share code on request, but the routine is 1000-2000 lines split across a few modules; I have not found an easy “def foo()” analog that reproduces this), I am having trouble understanding why the cost function, based on the output of L-BFGS-B, does not appear to be correlated with how close the current iteration's input is to the truth.
This has cost me dozens of hours of debugging time, only to learn that something is wrong with L-BFGS-B that causes the output of options={'disp': True} not to be the cost function value at a given parameter vector.
Because I capture stdout, I am not able to use print. I have embedded a line in my cost function that appends the current parameter vector and cost to a .txt file, and resorted to matching values in that file against the L-BFGS-B print output to see what is wrong. The text from that file is below, with numbered lines:
00. params = [ 0. 0. 0. 0.] f(x) = 27.5369873872
01. params = [ 1.00000000e-08 0.00000000e+00 0.00000000e+00 0.00000000e+00] f(x) = 27.5369884562
02. params = [ 0.00000000e+00 1.00000000e-08 0.00000000e+00 0.00000000e+00] f(x) = 27.5369867324
03. params = [ 0.00000000e+00 0.00000000e+00 1.00000000e-08 0.00000000e+00] f(x) = 27.5369870096
04. params = [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e-08] f(x) = 27.5369875808
05. params = [-0.80769268 0.49476373 0.28537275 -0.14630029] f(x) = 86.4418490547
06. params = [-0.80769267 0.49476373 0.28537275 -0.14630029] f(x) = 86.4418490135
07. params = [-0.80769268 0.49476374 0.28537275 -0.14630029] f(x) = 86.441849366
08. params = [-0.80769268 0.49476373 0.28537276 -0.14630029] f(x) = 86.4418499781
09. params = [-0.80769268 0.49476373 0.28537275 -0.14630028] f(x) = 86.4418482602
10. params = [-0.16406864 0.1005026 0.05796848 -0.02971834] f(x) = 22.714940612
11. params = [-0.16406863 0.1005026 0.05796848 -0.02971834] f(x) = 22.7149388457
12. params = [-0.16406864 0.10050261 0.05796848 -0.02971834] f(x) = 22.7149405671
13. params = [-0.16406864 0.1005026 0.05796849 -0.02971834] f(x) = 22.7149404753
14. params = [-0.16406864 0.1005026 0.05796848 -0.02971833] f(x) = 22.7149421094
15. params = [-0.10087717 0.09158391 0.05575493 -0.06632038] f(x) = 12.0195875306
16. params = [-0.10087716 0.09158391 0.05575493 -0.06632038] f(x) = 12.0195863232
17. params = [-0.10087717 0.09158392 0.05575493 -0.06632038] f(x) = 12.0195878653
18. params = [-0.10087717 0.09158391 0.05575494 -0.06632038] f(x) = 12.0195883001
19. params = [-0.10087717 0.09158391 0.05575493 -0.06632037] f(x) = 12.0195867159
20. params = [ 0.15842029 -0.01548538 -0.01632799 -0.07159999] f(x) = 48.7174210299
21. params = [ 0.1584203 -0.01548538 -0.01632799 -0.07159999] f(x) = 48.7174233933
22. params = [ 0.15842029 -0.01548537 -0.01632799 -0.07159999] f(x) = 48.717421743
23. params = [ 0.15842029 -0.01548538 -0.01632798 -0.07159999] f(x) = 48.7174193993
24. params = [ 0.15842029 -0.01548538 -0.01632799 -0.07159998] f(x) = 48.7174245544
25. params = [-0.05512618 0.07269238 0.04303647 -0.06725192] f(x) = 8.05081234605
26. params = [-0.05512617 0.07269238 0.04303647 -0.06725192] f(x) = 8.05081192026
27. params = [-0.05512618 0.07269239 0.04303647 -0.06725192] f(x) = 8.0508114712
28. params = [-0.05512618 0.07269238 0.04303648 -0.06725192] f(x) = 8.05081281076
29. params = [-0.05512618 0.07269238 0.04303647 -0.06725191] f(x) = 8.05081206742
30. params = [-0.02149653 0.09172596 0.0267816 -0.06652566] f(x) = 5.0034401551
31. params = [-0.02149652 0.09172596 0.0267816 -0.06652566] f(x) = 5.00344002938
32. params = [-0.02149653 0.09172597 0.0267816 -0.06652566] f(x) = 5.00343999403
33. params = [-0.02149653 0.09172596 0.02678161 -0.06652566] f(x) = 5.00344079846
34. params = [-0.02149653 0.09172596 0.0267816 -0.06652565] f(x) = 5.00343931285
35. params = [ 0.03012534 0.12108955 -0.01347029 -0.04719621] f(x) = 3.70939759272
36. params = [ 0.03012535 0.12108955 -0.01347029 -0.04719621] f(x) = 3.70939819132
37. params = [ 0.03012534 0.12108956 -0.01347029 -0.04719621] f(x) = 3.70939842652
38. params = [ 0.03012534 0.12108955 -0.01347028 -0.04719621] f(x) = 3.70939740919
39. params = [ 0.03012534 0.12108955 -0.01347029 -0.0471962 ] f(x) = 3.70939657281
40. params = [ 0.00300894 0.11308784 0.00337076 -0.03858502] f(x) = 1.57818376805
41. params = [ 0.00300895 0.11308784 0.00337076 -0.03858502] f(x) = 1.578184007
42. params = [ 0.00300894 0.11308785 0.00337076 -0.03858502] f(x) = 1.57818379941
43. params = [ 0.00300894 0.11308784 0.00337077 -0.03858502] f(x) = 1.57818414645
44. params = [ 0.00300894 0.11308784 0.00337076 -0.03858501] f(x) = 1.5781830571
45. params = [-0.01561316 0.12143126 0.00100096 -0.01330919] f(x) = 0.481955650202
46. params = [-0.01561315 0.12143126 0.00100096 -0.01330919] f(x) = 0.481955273258
47. params = [-0.01561316 0.12143127 0.00100096 -0.01330919] f(x) = 0.481955840221
48. params = [-0.01561316 0.12143126 0.00100097 -0.01330919] f(x) = 0.481955839908
49. params = [-0.01561316 0.12143126 0.00100096 -0.01330918] f(x) = 0.481955373719
50. params = [-0.00041042 0.12401147 -0.00485423 0.00207093] f(x) = 0.0681301442419
51. params = [-0.00041041 0.12401147 -0.00485423 0.00207093] f(x) = 0.0681301690496
52. params = [-0.00041042 0.12401148 -0.00485423 0.00207093] f(x) = 0.0681301089522
53. params = [-0.00041042 0.12401147 -0.00485422 0.00207093] f(x) = 0.0681299235971
54. params = [-0.00041042 0.12401147 -0.00485423 0.00207094] f(x) = 0.0681302701249
55. params = [-0.00144057 0.12278377 -0.00124594 -0.00264166] f(x) = 0.0134639451888
56. params = [-0.00144056 0.12278377 -0.00124594 -0.00264166] f(x) = 0.0134639411113
57. params = [-0.00144057 0.12278378 -0.00124594 -0.00264166] f(x) = 0.0134638572809
58. params = [-0.00144057 0.12278377 -0.00124593 -0.00264166] f(x) = 0.0134639506651
59. params = [-0.00144057 0.12278377 -0.00124594 -0.00264165] f(x) = 0.0134639172132
The callback creates the following list:
[array([ 0., 0., 0., 0.]),
array([-0.00144057, 0.12278377, -0.00124594, -0.00264166]),
array([-0.00144057, 0.12278377, -0.00124594, -0.00264166]),
array([-0.00144057, 0.12278377, -0.00124594, -0.00264166]),
array([-0.00144057, 0.12278377, -0.00124594, -0.00264166]),
array([-0.00144057, 0.12278377, -0.00124594, -0.00264166]),
array([-0.00144057, 0.12278377, -0.00124594, -0.00264166]),
array([-0.00144057, 0.12278377, -0.00124594, -0.00264166]),
array([-0.00144057, 0.12278377, -0.00124594, -0.00264166]),
array([-0.00144057, 0.12278377, -0.00124594, -0.00264166])]
I have reproduced this for completeness, but the important line is 00 (the initial guess), which matches iterate 0 in the output text from the optimizer. Iteration 1 matches any of lines 10-14; the parameter vectors there are similar to:
[-0.16406864 0.10050261 0.05796848 -0.02971834]
The callback contains this input array at that iteration:
[-0.00144057, 0.12278377, -0.00124594, -0.00264166]
This would be associated with a cost of about 0.013, not 22 and change. As a result, I am unable to trust the very tools I have for diagnosing the problem. The output of the L-BFGS-B print simply is not the value of the cost function at the parameter vector for that iteration. I believe the problem is something like the following:
- The callback is called at the end of a line search before beginning a new gradient hunt
- The f= value is written at the start of a line search
But I cannot trust the documentation of scipy to be correct that the callback is called at the end of an iteration (~= end of line search in BFGS), and I cannot trust lbfgsb.f to be error free in this way.
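One way to test this hypothesis without trusting either the documentation or lbfgsb.f is to instrument things directly: count cost-function calls and re-evaluate the cost inside the callback. A minimal sketch with a toy cost function:

```python
import numpy as np
from scipy.optimize import minimize

ncalls = 0

def cost(x):
    global ncalls
    ncalls += 1
    return float(np.sum((x - np.array([0.0, 0.12, 0.0, 0.0])) ** 2))

events = []  # (callback index, cost calls so far, x, f(x) at callback time)

def callback(xk):
    # note: the cost(xk) below adds one call of its own to ncalls
    events.append((len(events), ncalls, np.copy(xk), cost(xk)))

minimize(cost, np.zeros(4), method='L-BFGS-B',
         options={'disp': True}, callback=callback)

# comparing these rows against the "At iterate N ... f=" lines shows which
# parameter vector each printed cost actually belongs to
for row in events:
    print(row)
```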
Are there plans to improve the optimization module on a “months, not years” timescale? Is anyone willing to rewrite lbfgsb.f, or even port it to C/C++ or pure Python, to solve these problems? In commercial applications it is often desirable for a user to keep a finger on the pulse by monitoring convergence, potentially stopping the optimization, hand-tuning the inputs, and then continuing the work. The architecture of optimize.minimize makes this prohibitively difficult. It would be nice to see a context-manager-based implementation that allows interruption of the optimizer without losing the current state; at the moment this sort of behavior is only possible through a complicated callback function.
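For what it is worth, the closest workaround I know of today is exactly that “complicated callback” route: raise an exception from the callback, keep the last iterate yourself, and resume from it. A hedged sketch (StopOptimization and the stopping condition are made up for illustration; it relies on the exception propagating out of the solver's Python iteration loop, which is the case for L-BFGS-B):

```python
import numpy as np
from scipy.optimize import minimize, rosen

class StopOptimization(Exception):
    """Raised from the callback to abort minimize() early."""

state = {'x': None, 'nit': 0}

def callback(xk):
    state['x'] = np.copy(xk)
    state['nit'] += 1
    if state['nit'] >= 5:          # stand-in for "the user pressed stop"
        raise StopOptimization

try:
    minimize(rosen, np.zeros(6), method='L-BFGS-B', callback=callback)
except StopOptimization:
    pass

# state['x'] holds the last iterate seen by the callback; after inspecting or
# hand-tuning it, pass it back in as x0 to resume (here without the stop hook)
resumed = minimize(rosen, state['x'], method='L-BFGS-B')
```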
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 43 (29 by maintainers)
I think this is not a healthy way to have this discussion. The SciPy license clearly addresses the very problem you are raising: the software is provided as-is, with no warranty.
We are sorry for your experience, but we are not obliged to take immediate action to remedy this; an open-source library does not operate on the terms you might expect from commercial software.
I would really recommend not discussing this further unless there is a code contribution or an isolated issue.
Hi all, can I ask that we stop the discussion at this point? It has become unproductive, and has already been reported to the CoC committee at the scipy-conduct email address. The tone of both the initial post and multiple replies to it, by multiple people, has been unhelpful/inappropriate. The code of conduct committee will follow up with the involved people separately.

@xoviat wrote:

> This issue isn't a bug report. It's an attempt to claim the time of others to work on issues that they themselves have decided not to prioritize.
Please be assured that this is not the opinion of all (or even most, I suspect) SciPy developers. The library is written for the world to use. If it were written only for its “contributors”, I’d stop contributing.
Moreover, by filing this issue, Brandon is contributing. It took a lot of time to create that initial report; he could have simply stopped using SciPy. Feedback–good or bad–is essential, and should not be ignored. Some of the comments may lean more towards “rant” than “bug report”, but try to put yourself in his shoes when he says “This has cost me dozens of hours of debugging time”. You’d probably be ranting too. Please try to filter out the emotional aspects of the report (“great travesty”, etc.) and focus on the defects–big and small–that are being reported.
It might be a major problem for you, but every part of SciPy has its own problems, big or small, and they are not fixed on demand or out of frustration. I have worked on pull requests that took more than a year, and I am still waiting for action on some items. Of course these things have to be addressed, but definitely not by making demands on, and complaining about, a lot of people's voluntary time and patience.
The big names you tagged are not the cavalry you are hoping will arrive. You seem to have a point of view that doesn't agree with the reality of how SciPy works.
There are many issues to unpack here.
Double precision is needed internally when calculating gradients, because otherwise numerical inaccuracies are introduced into the gradient.
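A small illustration of that point, independent of scipy: a forward-difference gradient computed in float32 carries noticeably fewer correct digits than the same computation in float64.

```python
import numpy as np

def f(x):
    return (x ** 2).sum()   # exact gradient is 2*x

def forward_diff_grad(x, eps):
    g = np.zeros_like(x)
    fx = f(x)
    for i in range(x.size):
        xp = x.copy()
        xp[i] += eps
        g[i] = (f(xp) - fx) / eps
    return g

x64 = np.ones(4, dtype=np.float64)
x32 = x64.astype(np.float32)

# step sizes roughly sqrt(machine epsilon) for each dtype
print(forward_diff_grad(x64, 1.5e-8))              # very close to [2, 2, 2, 2]
print(forward_diff_grad(x32, np.float32(3.5e-4)))  # noticeably less accurate
```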
I can assure you that the whole scipy community works very hard to develop this software, in their own free time, doing their best. To me your post comes across as unprofessional. For future reference, “iterate” is in common use in this context. There is an issue with maxiter, reported in #7854, which does need to be corrected; it will eventually be tackled, along with the ~1000 other issues that are destined to be addressed.
Scraping the output for this purpose is not a robust way to do this. Ideally the callback function should be supplied an OptimizeResult, which contains the best x so far and the function evaluation. This would be a great new feature, but it has to take backwards compatibility into account. In the meantime, you should be able to calculate f(x) in the callback function, or even memoize f(x) for a given x. I'm sorry that you lost dozens of hours on this; scraping output is simply not a robust way of doing it.
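A minimal sketch of the memoization idea (the wrapper class and names here are made up, not scipy API):

```python
import numpy as np
from scipy.optimize import minimize

class MemoizedCost:
    """Cache the most recently evaluated (x, f(x)) pair so a callback can
    report the cost without always paying for an extra evaluation."""

    def __init__(self, func):
        self.func = func
        self.last_x = None
        self.last_f = None

    def __call__(self, x):
        self.last_x = np.copy(x)
        self.last_f = self.func(x)
        return self.last_f

def expensive_cost(x):
    return float(np.sum((x - 0.5) ** 2))  # stand-in for an expensive model

fun = MemoizedCost(expensive_cost)
history = []  # (x, f(x)) per iteration

def callback(xk):
    # reuse the cached value when xk was the last point evaluated; otherwise
    # fall back to one explicit evaluation (e.g. after finite-difference probes)
    if fun.last_x is not None and np.array_equal(xk, fun.last_x):
        history.append((np.copy(xk), fun.last_f))
    else:
        history.append((np.copy(xk), fun(xk)))

minimize(fun, np.zeros(4), method='L-BFGS-B', callback=callback)
```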
We continually look for improvements, but this is mostly done in people's free time. If you are willing to rewrite the codebase, we would welcome a Pull Request.
I have plans to implement this for one solver, but gradual evolution is required; doing it for all the solvers would require a lot of work, and one would want a fairly uniform approach across them all.
I think this toxicity will not lead to anything fruitful. I am not happy watching a frequent contributor being continuously bashed and banned, with prose about the “ultimate owner” not having a “place on the table” of the issue, etc. And I have to repeat: I don't even agree with what xoviat said.
Several times you had the chance to steer this toward a neutral tone, yet you chose to impose a nonexistent authority. Until you work out this hostile behavior, this is on the verge of online bullying; you have also threatened to report a user to GitHub.
That, for me, crosses the line. I don't want to participate in any of this any more, and will leave the discussion to the board and other willing SciPy members.
@brandondube I understand your frustration, both with the library’s functionality and Xoviat’s comments. At the same time, it helps to keep in mind that everyone here (including yourself, thank you!) is giving their time freely to debug and improve the library. When an issue is titled “scipy.optimize has broken my trust”, it does not engender a very positive feeling from the developers working on it. If we steer towards a more technical description such as “scipy.optimize updates parameters in-place” it takes the sting out, and allows us to focus on the issues at hand.
We generally do not prevent others in the community from participating in our comment threads, even when we disagree with their perspectives (perhaps especially then); it's useful to hear those opinions, because one of the strengths of open source is finding a balance between opposing views to build code that is robust for many different use cases. I think, in this case, you saw the need to protect yourself from Xoviat's comments, which felt personal, and I am sorry for that. I cannot make you unblock Xoviat, but I do think it would be better if we could all sit around the table and discuss the issues at hand, with an eye towards improving scipy.optimize.
Thank you for giving your time to benchmark and to evaluate the code, and for helping us make a useful submodule even more so. I hope we can address each of the issues you raised here to your satisfaction and in a timely manner.