scipy: scipy.optimize has broken my trust.
This submodule has broken my trust.
I am genuinely in awe of how it was included in scipy 1.0. I am tagging two heavy hitters to attempt to bring discussion to this issue since, in general, scipy.optimize issues seem to go without discussion or fixes. @pv @teoliphant
As I understand it, an effort was made in the last few versions of scipy, leading up to and including 1.0, to unify the interface of the different optimization routines and eventually deprecate fmin_xxxx.
A fairly large issue with this is that many of the inputs and outputs in the scipy documentation for these functions are simply wrong.
Take, for example, the BFGS method.
The documentation for disp is incorrect: convergence messages are not printed, only a blurb at the end. return_all appears in the call signature at the top but is undocumented. I looked to fmin_bfgs, hoping its more specific documentation would be better; it is even more misleading. I had hoped that the constituents of allvecs being documented as OptimizationResult meant they would hold both cost function values and input vectors. They are simply input vectors. The documentation here is also misleading.
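For concreteness, here is a minimal sketch of what I mean, using the built-in rosen function as a stand-in for a real cost function (the exact printout depends on the scipy version):

```python
import numpy as np
from scipy.optimize import minimize, rosen

# rosen is only a stand-in here; disp prints a short summary at the end,
# not per-iteration convergence messages
result = minimize(rosen, np.zeros(4), method='BFGS',
                  options={'disp': True, 'return_all': True})

# allvecs is a list of plain ndarrays (the parameter vector at each iteration);
# it holds no OptimizeResult objects and no cost values, so the cost must be
# recomputed if you want it per iteration
print(type(result.allvecs[0]))
costs = [rosen(x) for x in result.allvecs]
```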
I also understand that many of the algorithms wrap code in another language, e.g. L-BFGS-B is written in FORTRAN77, and this may make them difficult to modify. Earlier today I opened an issue, #8372, reporting that L-BFGS-B does not respect the maxiter input. There are other issues as well, e.g. the user's input dtype is not respected because of hard-coded doubles in the Fortran code.
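A quick, hedged way to observe the maxiter behavior for yourself (rosen again as a stand-in problem; the point is only to compare the requested cap with what the result reports):

```python
import numpy as np
from scipy.optimize import minimize, rosen

# ask for at most 5 iterations and compare with the reported counts
res = minimize(rosen, np.zeros(6), method='L-BFGS-B',
               options={'maxiter': 5, 'disp': True})
print(res.nit, res.nfev, res.message)
```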
However, the docs being misleading is a great travesty for the users of this module, which I believe is among the more important and widely used parts of scipy. I would work to improve them myself, but I do not feel that my knowledge of the module's inner workings is great enough to do so without making mistakes of my own.
Returning to L-BFGS-B, the wrapped Fortran code appears to contain a variety of errors or behaviors of varying severity. For example, here is a sample output:
RUNNING THE L-BFGS-B CODE
* * *
Machine precision = 2.220D-16
N = 4 M = 10
At X0 0 variables are exactly at the bounds
At iterate 0 f= 2.75370D+01 |proj g|= 1.06893D+02
At iterate 1 f= 2.27149D+01 |proj g|= 1.76623D+02
At iterate 2 f= 1.20196D+01 |proj g|= 1.20733D+02
At iterate 3 f= 8.05081D+00 |proj g|= 8.74848D+01
At iterate 4 f= 5.00344D+00 |proj g|= 8.42253D+01
At iterate 5 f= 3.70940D+00 |proj g|= 1.01991D+02
At iterate 6 f= 1.57818D+00 |proj g|= 7.10948D+01
At iterate 7 f= 4.81956D-01 |proj g|= 3.76945D+01
At iterate 8 f= 6.81301D-02 |proj g|= 2.20645D+01
At iterate 9 f= 1.34639D-02 |proj g|= 8.79078D+00
* * *
Tit = total number of iterations
Tnf = total number of function evaluations
Tnint = total number of segments explored during Cauchy searches
Skip = number of BFGS updates skipped
Nact = number of active bounds at final generalized Cauchy point
Projg = norm of the final projected gradient
F = final function value
* * *
N Tit Tnf Tnint Skip Nact Projg F
4 9 12 1 0 0 8.791D+00 1.346D-02
F = 1.346394518877836E-002
CONVERGENCE: REL_REDUCTION_OF_F_<=_FACTR*EPSMCH
Cauchy time 0.000E+00 seconds.
Subspace minimization time 0.000E+00 seconds.
Line search time 0.000E+00 seconds.
Total User time 0.000E+00 seconds.
The code writes “iterate” instead of “iteration” (a typo). This is an annoyance, as it is not very professional, but it does not break anything. The separate timings for Cauchy, subspace minimization, and line search are also always zero, as is the user time. It would be nice for these not to be filler values, since that is misleading, but again this is a creature-comfort problem, not an accuracy problem.
My application is in a topic known as wavefront sensing, or phase retrieval. We put in some number of parameters, and the cost-function landscape evolves substantially as the optimizer traverses the space. As a result, when designing a cost function or debugging code, it is desirable to know the state of the optimizer as it walks the space, even without knowing its internal mechanisms. To do this, I do the following:
- Save this output text from L-BFGS-B, and scrape the f= values from it.
- Use a callback that stores parameter vectors.
I expect these two lists to always have the same length, and I depend on cost[n] being associated with all_x[n] without registration errors between the two blocks of data.
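A minimal sketch of that bookkeeping, with a toy quadratic standing in for the real cost function (which is far too large to include here):

```python
import numpy as np
from scipy.optimize import minimize

def cost(x):
    # toy stand-in for the real wavefront-sensing cost function
    return float(np.sum((x - np.array([0.0, 0.12, 0.0, 0.0])) ** 2))

all_x = []  # one parameter vector per callback invocation
result = minimize(cost, np.zeros(4), method='L-BFGS-B',
                  options={'disp': True},
                  callback=lambda xk: all_x.append(np.copy(xk)))

# the f= values come from the saved text of the disp output, scraped with
# something like (illustrative only; the Fortran code prints D exponents):
# costs = [float(m.replace('D', 'E'))
#          for m in re.findall(r'f=\s*(\S+)', saved_output)]
# the expectation is len(costs) == len(all_x), with costs[n] <-> all_x[n]
```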
In my particular application (I can share code on request, but the routine is 1000-2000 lines split across a few modules; I have not found an easy “def foo()” analog that reproduces this), I am having trouble understanding why the cost function, based on the output of L-BFGS-B, does not appear to be correlated with how close the current iteration's input is to the truth.
This has cost me dozens of hours of debugging time, only to learn that something is wrong with L-BFGS-B that causes the output of options={'disp': True} not to be the cost function value at a given parameter vector.
Because I capture stdout, I am not able to use print. I have embedded a line in my cost function that appends the current parameter vector and cost to a .txt file, and resorted to matching values in that file against the L-BFGS-B print output to see what is wrong. The text from that file is below, with numbered lines:
00. params = [ 0. 0. 0. 0.] f(x) = 27.5369873872
01. params = [ 1.00000000e-08 0.00000000e+00 0.00000000e+00 0.00000000e+00] f(x) = 27.5369884562
02. params = [ 0.00000000e+00 1.00000000e-08 0.00000000e+00 0.00000000e+00] f(x) = 27.5369867324
03. params = [ 0.00000000e+00 0.00000000e+00 1.00000000e-08 0.00000000e+00] f(x) = 27.5369870096
04. params = [ 0.00000000e+00 0.00000000e+00 0.00000000e+00 1.00000000e-08] f(x) = 27.5369875808
05. params = [-0.80769268 0.49476373 0.28537275 -0.14630029] f(x) = 86.4418490547
06. params = [-0.80769267 0.49476373 0.28537275 -0.14630029] f(x) = 86.4418490135
07. params = [-0.80769268 0.49476374 0.28537275 -0.14630029] f(x) = 86.441849366
08. params = [-0.80769268 0.49476373 0.28537276 -0.14630029] f(x) = 86.4418499781
09. params = [-0.80769268 0.49476373 0.28537275 -0.14630028] f(x) = 86.4418482602
10. params = [-0.16406864 0.1005026 0.05796848 -0.02971834] f(x) = 22.714940612
11. params = [-0.16406863 0.1005026 0.05796848 -0.02971834] f(x) = 22.7149388457
12. params = [-0.16406864 0.10050261 0.05796848 -0.02971834] f(x) = 22.7149405671
13. params = [-0.16406864 0.1005026 0.05796849 -0.02971834] f(x) = 22.7149404753
14. params = [-0.16406864 0.1005026 0.05796848 -0.02971833] f(x) = 22.7149421094
15. params = [-0.10087717 0.09158391 0.05575493 -0.06632038] f(x) = 12.0195875306
16. params = [-0.10087716 0.09158391 0.05575493 -0.06632038] f(x) = 12.0195863232
17. params = [-0.10087717 0.09158392 0.05575493 -0.06632038] f(x) = 12.0195878653
18. params = [-0.10087717 0.09158391 0.05575494 -0.06632038] f(x) = 12.0195883001
19. params = [-0.10087717 0.09158391 0.05575493 -0.06632037] f(x) = 12.0195867159
20. params = [ 0.15842029 -0.01548538 -0.01632799 -0.07159999] f(x) = 48.7174210299
21. params = [ 0.1584203 -0.01548538 -0.01632799 -0.07159999] f(x) = 48.7174233933
22. params = [ 0.15842029 -0.01548537 -0.01632799 -0.07159999] f(x) = 48.717421743
23. params = [ 0.15842029 -0.01548538 -0.01632798 -0.07159999] f(x) = 48.7174193993
24. params = [ 0.15842029 -0.01548538 -0.01632799 -0.07159998] f(x) = 48.7174245544
25. params = [-0.05512618 0.07269238 0.04303647 -0.06725192] f(x) = 8.05081234605
26. params = [-0.05512617 0.07269238 0.04303647 -0.06725192] f(x) = 8.05081192026
27. params = [-0.05512618 0.07269239 0.04303647 -0.06725192] f(x) = 8.0508114712
28. params = [-0.05512618 0.07269238 0.04303648 -0.06725192] f(x) = 8.05081281076
29. params = [-0.05512618 0.07269238 0.04303647 -0.06725191] f(x) = 8.05081206742
30. params = [-0.02149653 0.09172596 0.0267816 -0.06652566] f(x) = 5.0034401551
31. params = [-0.02149652 0.09172596 0.0267816 -0.06652566] f(x) = 5.00344002938
32. params = [-0.02149653 0.09172597 0.0267816 -0.06652566] f(x) = 5.00343999403
33. params = [-0.02149653 0.09172596 0.02678161 -0.06652566] f(x) = 5.00344079846
34. params = [-0.02149653 0.09172596 0.0267816 -0.06652565] f(x) = 5.00343931285
35. params = [ 0.03012534 0.12108955 -0.01347029 -0.04719621] f(x) = 3.70939759272
36. params = [ 0.03012535 0.12108955 -0.01347029 -0.04719621] f(x) = 3.70939819132
37. params = [ 0.03012534 0.12108956 -0.01347029 -0.04719621] f(x) = 3.70939842652
38. params = [ 0.03012534 0.12108955 -0.01347028 -0.04719621] f(x) = 3.70939740919
39. params = [ 0.03012534 0.12108955 -0.01347029 -0.0471962 ] f(x) = 3.70939657281
40. params = [ 0.00300894 0.11308784 0.00337076 -0.03858502] f(x) = 1.57818376805
41. params = [ 0.00300895 0.11308784 0.00337076 -0.03858502] f(x) = 1.578184007
42. params = [ 0.00300894 0.11308785 0.00337076 -0.03858502] f(x) = 1.57818379941
43. params = [ 0.00300894 0.11308784 0.00337077 -0.03858502] f(x) = 1.57818414645
44. params = [ 0.00300894 0.11308784 0.00337076 -0.03858501] f(x) = 1.5781830571
45. params = [-0.01561316 0.12143126 0.00100096 -0.01330919] f(x) = 0.481955650202
46. params = [-0.01561315 0.12143126 0.00100096 -0.01330919] f(x) = 0.481955273258
47. params = [-0.01561316 0.12143127 0.00100096 -0.01330919] f(x) = 0.481955840221
48. params = [-0.01561316 0.12143126 0.00100097 -0.01330919] f(x) = 0.481955839908
49. params = [-0.01561316 0.12143126 0.00100096 -0.01330918] f(x) = 0.481955373719
50. params = [-0.00041042 0.12401147 -0.00485423 0.00207093] f(x) = 0.0681301442419
51. params = [-0.00041041 0.12401147 -0.00485423 0.00207093] f(x) = 0.0681301690496
52. params = [-0.00041042 0.12401148 -0.00485423 0.00207093] f(x) = 0.0681301089522
53. params = [-0.00041042 0.12401147 -0.00485422 0.00207093] f(x) = 0.0681299235971
54. params = [-0.00041042 0.12401147 -0.00485423 0.00207094] f(x) = 0.0681302701249
55. params = [-0.00144057 0.12278377 -0.00124594 -0.00264166] f(x) = 0.0134639451888
56. params = [-0.00144056 0.12278377 -0.00124594 -0.00264166] f(x) = 0.0134639411113
57. params = [-0.00144057 0.12278378 -0.00124594 -0.00264166] f(x) = 0.0134638572809
58. params = [-0.00144057 0.12278377 -0.00124593 -0.00264166] f(x) = 0.0134639506651
59. params = [-0.00144057 0.12278377 -0.00124594 -0.00264165] f(x) = 0.0134639172132
The callback creates the following list:
[array([ 0., 0., 0., 0.]),
array([-0.00144057, 0.12278377, -0.00124594, -0.00264166]),
array([-0.00144057, 0.12278377, -0.00124594, -0.00264166]),
array([-0.00144057, 0.12278377, -0.00124594, -0.00264166]),
array([-0.00144057, 0.12278377, -0.00124594, -0.00264166]),
array([-0.00144057, 0.12278377, -0.00124594, -0.00264166]),
array([-0.00144057, 0.12278377, -0.00124594, -0.00264166]),
array([-0.00144057, 0.12278377, -0.00124594, -0.00264166]),
array([-0.00144057, 0.12278377, -0.00124594, -0.00264166]),
array([-0.00144057, 0.12278377, -0.00124594, -0.00264166])]
I have reproduced this for completeness, but the important line is 00 (the initial guess), which matches iterate 0 in the output text from the optimizer. Iteration 1 matches any of lines 10-14; the parameter vectors there are similar to:
[-0.16406864 0.10050261 0.05796848 -0.02971834]
The callback contains this input array at that iteration:
[-0.00144057, 0.12278377, -0.00124594, -0.00264166]
This would be associated with a cost of about 0.013, not 22 and change. As a result, I am unable to trust the very tools I have for diagnosing the problem. The output of the L-BFGS-B print simply is not the value of the cost function at the parameter vector for that iteration. I believe the problem is something like the following:
- The callback is called at the end of a line search before beginning a new gradient hunt
- The f= value is written at the start of a line search
But I cannot trust the documentation of scipy to be correct that the callback is called at the end of an iteration (~= end of line search in BFGS), and I cannot trust lbfgsb.f to be error free in this way.
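One way to test this hypothesis without trusting either the documentation or lbfgsb.f is to instrument things directly: count cost-function calls and re-evaluate the cost inside the callback. A minimal sketch with a toy cost function:

```python
import numpy as np
from scipy.optimize import minimize

ncalls = 0

def cost(x):
    global ncalls
    ncalls += 1
    return float(np.sum((x - np.array([0.0, 0.12, 0.0, 0.0])) ** 2))

events = []  # (callback index, cost calls so far, x, f(x) at callback time)

def callback(xk):
    # note: the cost(xk) below adds one call of its own to ncalls
    events.append((len(events), ncalls, np.copy(xk), cost(xk)))

minimize(cost, np.zeros(4), method='L-BFGS-B',
         options={'disp': True}, callback=callback)

# comparing these rows against the "At iterate N ... f=" lines shows which
# parameter vector each printed cost actually belongs to
for row in events:
    print(row)
```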
Are there plans to improve the optimization module on a “months, not years” timescale? Is anyone willing to rewrite lbfgsb.f, or even port it to C/C++ or pure Python, to solve these problems? In commercial applications it is often desirable for a user to keep a finger on the pulse by monitoring convergence, potentially stopping the optimization, hand-tuning the inputs, and then continuing the work. The architecture of optimize.minimize makes this prohibitively difficult. It would be nice to see a context-manager-based implementation that allows interruption of the optimizer without losing the current state; at the moment this sort of behavior is only possible through a complicated callback function.
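For what it is worth, the closest workaround I know of today is exactly that “complicated callback” route: raise an exception from the callback, keep the last iterate yourself, and resume from it. A hedged sketch (StopOptimization and the stopping condition are made up for illustration; it relies on the exception propagating out of the solver's Python iteration loop, which is the case for L-BFGS-B):

```python
import numpy as np
from scipy.optimize import minimize, rosen

class StopOptimization(Exception):
    """Raised from the callback to abort minimize() early."""

state = {'x': None, 'nit': 0}

def callback(xk):
    state['x'] = np.copy(xk)
    state['nit'] += 1
    if state['nit'] >= 5:          # stand-in for "the user pressed stop"
        raise StopOptimization

try:
    minimize(rosen, np.zeros(6), method='L-BFGS-B', callback=callback)
except StopOptimization:
    pass

# state['x'] holds the last iterate seen by the callback; after inspecting or
# hand-tuning it, pass it back in as x0 to resume (here without the stop hook)
resumed = minimize(rosen, state['x'], method='L-BFGS-B')
```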
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 43 (29 by maintainers)
I think this is not a healthy way to have this discussion. The SciPy license clearly addresses the very problem you are raising: the software is provided as-is, with no warranty.
We are sorry for your experience, but we are not obliged to take immediate action to remedy this; an open-source library does not operate on the terms you might expect from commercial software.
I would really recommend not discussing this further unless there is a code contribution or an isolated issue.
Hi all, can I ask that we stop the discussion at this point? It has become unproductive, and has already been reported to the CoC committee at the scipy-conduct email address. The tone of both the initial post and multiple replies to it, by multiple people, has been unhelpful/inappropriate. The code of conduct committee will follow up with the involved people separately.

@xoviat wrote:

> This issue isn't a bug report. It's an attempt to claim the time of others to work on issues that they themselves have decided not to prioritize.
Please be assured that this is not the opinion of all (or even most, I suspect) SciPy developers. The library is written for the world to use. If it were written only for its “contributors”, I’d stop contributing.
Moreover, by filing this issue, Brandon is contributing. It took a lot of time to create that initial report; he could have simply stopped using SciPy. Feedback–good or bad–is essential, and should not be ignored. Some of the comments may lean more towards “rant” than “bug report”, but try to put yourself in his shoes when he says “This has cost me dozens of hours of debugging time”. You’d probably be ranting too. Please try to filter out the emotional aspects of the report (“great travesty”, etc.) and focus on the defects–big and small–that are being reported.
It might be a major problem for you, but every part of SciPy has its own problems, big or small, and they are not fixed on demand or out of frustration. I have worked on pull requests that took more than a year, and I am still waiting for action on some items. Of course these things have to be addressed, but definitely not by making demands on, and complaining about, a lot of people's voluntary time and patience.
The big names you tagged are not the cavalry you are hoping will arrive. You seem to have a point of view that doesn't agree with the reality of how SciPy works.
There are many issues to unpack here.
Double precision is needed internally when calculating gradients, because otherwise numerical inaccuracies are introduced into the gradient.
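A small illustration of that point, independent of scipy: a forward-difference gradient computed in float32 carries noticeably fewer correct digits than the same computation in float64.

```python
import numpy as np

def f(x):
    return (x ** 2).sum()   # exact gradient is 2*x

def forward_diff_grad(x, eps):
    g = np.zeros_like(x)
    fx = f(x)
    for i in range(x.size):
        xp = x.copy()
        xp[i] += eps
        g[i] = (f(xp) - fx) / eps
    return g

x64 = np.ones(4, dtype=np.float64)
x32 = x64.astype(np.float32)

# step sizes roughly sqrt(machine epsilon) for each dtype
print(forward_diff_grad(x64, 1.5e-8))              # very close to [2, 2, 2, 2]
print(forward_diff_grad(x32, np.float32(3.5e-4)))  # noticeably less accurate
```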
I can assure you that the whole scipy community works very hard to develop this software, in their own free time, doing their best. To me your post comes across as unprofessional. For future reference, “iterate” is in common use in this context. There is an issue with maxiter, reported in #7854, which does need to be corrected; it will eventually be tackled, along with the ~1000 other issues that are destined to be addressed.
Scraping the output for this purpose is not a robust way to do this. Ideally the callback function should be supplied an OptimizeResult, which contains the best x so far and the function evaluation. This would be a great new feature, but it has to take backwards compatibility into account. In the meantime, you should be able to calculate f(x) in the callback function, or even memoize f(x) for a given x. I'm sorry that you lost dozens of hours on this; scraping output is simply not a robust way of doing it.
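A minimal sketch of the memoization idea (the wrapper class and names here are made up, not scipy API):

```python
import numpy as np
from scipy.optimize import minimize

class MemoizedCost:
    """Cache the most recently evaluated (x, f(x)) pair so a callback can
    report the cost without always paying for an extra evaluation."""

    def __init__(self, func):
        self.func = func
        self.last_x = None
        self.last_f = None

    def __call__(self, x):
        self.last_x = np.copy(x)
        self.last_f = self.func(x)
        return self.last_f

def expensive_cost(x):
    return float(np.sum((x - 0.5) ** 2))  # stand-in for an expensive model

fun = MemoizedCost(expensive_cost)
history = []  # (x, f(x)) per iteration

def callback(xk):
    # reuse the cached value when xk was the last point evaluated; otherwise
    # fall back to one explicit evaluation (e.g. after finite-difference probes)
    if fun.last_x is not None and np.array_equal(xk, fun.last_x):
        history.append((np.copy(xk), fun.last_f))
    else:
        history.append((np.copy(xk), fun(xk)))

minimize(fun, np.zeros(4), method='L-BFGS-B', callback=callback)
```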
We continually look for improvements, but this is mostly done in people's free time. If you are willing to rewrite the codebase, we would welcome a Pull Request.
I have plans to implement this for one solver, but gradual evolution is required; doing it for all the solvers would require a lot of work, and one would want a fairly uniform approach across them all.
I think this toxicity will not lead to anything fruitful. I am not happy watching a frequent contributor being continuously bashed and banned, with prose about the “ultimate owner” not having a “place on the table” of the issue, etc. And I have to repeat: I don't even agree with what xoviat said.
Several times you had the chance to steer this toward a neutral tone, yet you chose to impose a nonexistent authority. Until you work out this hostile behavior, this is on the verge of online bullying; you have also threatened to report a user to GitHub.
That, for me, crosses the line. I don't want to participate in any of this any more, and will leave the discussion to the board and other willing SciPy members.
@brandondube I understand your frustration, both with the library’s functionality and Xoviat’s comments. At the same time, it helps to keep in mind that everyone here (including yourself, thank you!) is giving their time freely to debug and improve the library. When an issue is titled “scipy.optimize has broken my trust”, it does not engender a very positive feeling from the developers working on it. If we steer towards a more technical description such as “scipy.optimize updates parameters in-place” it takes the sting out, and allows us to focus on the issues at hand.
We generally do not prevent others in the community from participating in our comment threads, even when we disagree with their perspectives (perhaps especially then); it's useful to hear those opinions, because one of the strengths of open source is finding a balance between opposing views to build code that is robust for many different use cases. I think, in this case, you saw the need to protect yourself from Xoviat's comments, which felt personal, and I am sorry for that. I cannot make you unblock Xoviat, but I do think it would be better if we could all sit around the table and discuss the issues at hand, with an eye towards improving scipy.optimize.
Thank you for giving your time to benchmark and to evaluate the code, and for helping us make a useful submodule even more so. I hope we can address each of the issues you raised here to your satisfaction and in a timely manner.