scipy: Examples missing import numpy as np?

Hi

a bunch of examples seem to be missing import numpy as np. At least for me they throw NameError: name 'np' is not defined.

Following examples seem to be missing import numpy as np (non exhaustive):

  • scipy.signal.spectrogram
  • scipy.signal.periodogram
  • scipy.signal.welch

scipy.__version__ gives 1.4.1. However, the examples seem to be the same in the current master.

Maybe I am missing something? Any help is appreciated! Thanks 😃

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 48 (43 by maintainers)

Commits related to this issue

Most upvoted comments

I don’t have a horse in this race, but I wanted to mention an alternative that came up on https://github.com/networkx/networkx/issues/5720

We could ask IPython/Jupyter whether they could detect common missing names and give an error message after a NameError with some guidance, such as:

Hint: it looks like you are missing an import; to use SciPy, try `import scipy as sp` first

This is a best-of-both-worlds type scenario, where we don’t need to repeat the same imports over and over in the docstrings, but users get helpful feedback for common package imports (np, sp, plt, pd, nx, etc).

Some docstrings have multiple examples, but these are often interspersed with lots of text. Should a numpy import happen once in a docstring, or with each example (as previous comments suggest)? The latter is bloat-worthy.

numpy is ubiquitous, so knowing what np.sum means without the import being included in a docstring doesn’t seem like an egregious act. Even then it’s only a websearch away. A single websearch is small fry compared to ā€˜ugliness’ caused by adding an explicit numpy import.

Besides, there are inline code snippets in multiple places that implicitly use np: np.std(pop) <= atol + tol * np.abs(np.mean(population_energies)) is present in the docstring, do they deserve an import?

In a typical interpreter session (Python/IPython) you only need to do the import once (18 characters worth), and it lasts for the duration of the session. I would argue that the cost of typing that is miniscule compared to the cost incurred by adding the numpy import to every example.

And I can review those PRs, @tupui, or we can tag-team it.

Given that we had two new PRs adding import numpy as np in a single example, I think it’s time to make a final decision here and close the issue. I just read through the discussion here, and till now there:

The arguments for not including the import have never seemed very compelling, but issues like this one have rarely come up, so I guess it is rare for users to get tripped up by having np not defined explicitly.

This seems like a decent summary to me.

If we’d started from scratch I’d be ~ +0 too I think, however now that we have had this policy for 10+ years with very few complaints, I’d say we should not change it now - there’s little benefit, and a reasonable amount of churn plus adding ~5,000 LoC. So I propose we close this with a decision that we won’t change the current implicit/assumed import.

There is enough negative feedback that I suspect we won’t do this, but if we did, I would suggest we fix all the docstrings at once.

Whatever the policy it, we should not make it optional–either we require the import, or we require that it not be used. Making it optional would create an unnecessary inconsistency in the docstrings.

Completely agreed with this.

If we get a reasonable number of developers, it would be nice to have a vote.

Just to nitpick a little–for substantive decisions probably best to ā€œvoteā€ by mailing list or similar rather than ā€œpeople who happen to be at the meeting.ā€ Time issues aside, some devs simply feel more comfortable expressing their views outside of live meetings, etc.

-1 from me.

Our current default is a single assumed import numpy as np. No more, no plt, no pd etc etc. Therefore the main premise of https://github.com/scipy/scipy/issues/13049#issuecomment-1146333767 is not really applicable, there is no gazillion of aliases to know by heart or agonizingly discover. So for a novice user I won’t buy that it leads to hours of confusion. Even if there is some, it’s a one-timer. From that point on, this import is a line noise, repeated 1000s of times in all docstring examples for all users.

EDIT:

A middle ground might be to come up with another rubric trick such that it includes a code block with the necessary imports at > the documentation generation step and not necessarily putting the import statement into the docstring.

Please, no 😃. More tooling, more docs, more maintenance, more magic, is it really worth it.

I’m +0. The arguments for not including the import have never seemed very compelling, but issues like this one have rarely come up, so I guess it is rare for users to get tripped up by having np not defined explicitly.

There is enough negative feedback that I suspect we won’t do this, but if we did, I would suggest we fix all the docstrings at once. It is an easy (if tedious) change that adds just one line to any given docstring.

Whatever the policy it, we should not make it optional–either we require the import, or we require that it not be used. Making it optional would create an unnecessary inconsistency in the docstrings.

At least would be better than ML imho. We needed to have some issues converted to discussions anyways so if not for this then for other things we’ll need it anyways.

Also -1 on the explicit import for docstrings.

Our assumption is that np is a well-established shortcut for numpy and evident that it is required. But it is an assumption.

This allows us skip a few hundreds of identical lines for each example in the catalogue.

Ahh, okay. Thanks for clearing that up for me.

I am not sure. But when trying to ā€œreproduceā€ the code it would be helpful to have all needed imports at the top. So that one, just as you stated, can just copy and paste the code.

As of right now, you either have to go through the example code and look for all used dependencies or run it once in order to get potential error messages.

@tupui also happy to work on this if/when the decision is made and if required

I’m now +0.5

In general I think it’s unnecessary, when I copy examples then it is usually in an already active notebook where numpy is already imported as np.

However, many users in stats will be mainly using pandas and might be less familiar with with working directly with numpy. For those users having the explicit import in the example will be useful, and advertise numpy.

I checked several docstrings in stats. Most of them are long enough that an additional import will not distract much.

@melissawm thought this might be of interest to the contributor experience leads since this comes up every few months as a sticking point for new users. Others have offered to make the actual adjustments to the documentation, but we could use help coming to an agreement. (If you think it’s out of scope, though, I understand.)

To be fair, IDEs can be set up to automatically perform the imports when a console is launched. And as noted above, the beginning of the tutorial introduction has: image

But I only set up my IDE that way recently, so I don’t imagine that beginners would know/think to do that. And not everyone sees, reads the tutorials from the very beginning.

I agree it’s more efficient to print it in exactly one place, and it’s more convenient to set up the IDE to do the imports automatically, but I think it’s more convenient for users to have it in every example. We do a lot of things for user convenience, after all.

If we’re trying to close this issue one way or the other, I’d suggest posting on the mailing list one more time. ā€œ5 maintainers against importsā€ could actually mean 5-15 maintainers against, but ā€œ2 users requesting importsā€ could mean 2000-200000 users for imports.

@tylerjereddy @rgommers @ev-br Could you enable the discussions tab? Look for the ā€œDiscussionsā€ checkbox in the repository settings page. I think we can use that instead of the ML which I also find a bit strange for discussions.

We had recently a discussion on the mail list about improving the workflow and said that Discussions do not provide sufficient moderating tools. Maybe discord or slack would be more appropriate?

@tylerjereddy @rgommers @ev-br Could you enable the discussions tab? Look for the ā€œDiscussionsā€ checkbox in the repository settings page. I think we can use that instead of the ML which I also find a bit strange for discussions.

If I put my PM hat on and assess the priorities, this is probably at the bottom of our priority list. Even a full PEP8 compliance cleanup is above this so if we are going to touch lots of files I am -1 on this being the reason because evidently we had +0. issues regarding this, meaning that the need for this is just noise. So newcomer argument is not a very convincing one for me.

If we want to put this in while we are fixing something else then I am -0. No strong opinions though. I think everybody knows what np is at this stage. The examples are not self-contained yes but it would hardly be the case that the numpy is not imported already in the namespace given the fact that everyone is using ipython or notebooks already and IDE folks can see it right away.

Another anecdotal evidence I just checked, everyone I know who uses spyder around me, enables the auto import of numpy scipy plt and pandas and alike. Here is mine

image

Plus it’s difficult for a lot of devs to make meetings due to time zone issues.

@tupui what do you think about having this as a topic at the meetup? If we get a reasonable number of developers, it would be nice to have a vote.

Agreed, I am adding this to the agenda, thanks @mdhaber. In general, I plan to go through the needs-decision label as you suggested.

I am also leaning towards having self-contained examples. There would be other fancy ways to tackle this otherwise like having a copy-button and this would prepend the code with the imports. But I prefer clarity over magic. I am not a fan either to direct the user to another page (again) to see what are the imported modules.

@rajpratyush I’m not sure I follow. This issue is not about installing dependencies. It’s about docstrings assuming that import numpy as np has already been executed.

@tupui what do you think about having this as a topic at the meetup? If we get a reasonable number of developers, it would be nice to have a vote.

  1. Should import numpy as np allowed in docstring examples? If so,
  2. Is it recommended practice for new PRs (e.g. new functions and PRs that involve updates to docstrings)? If so,
  3. Do we want a mass PR adding it to old examples?

No, this is not your fault; it’s common throughout much of the documentation.

I think that the examples should include complete code that can be copy-pasted into a console, but other developers argue that it should be omitted since it is common to almost every example.

As a user, what do you think?