astropy: hist breaks when range of input data is very large

When hist is given data spanning a very large range and the freedman bin method is chosen, memory use grows rapidly (15 GB when I killed the kernel after a few minutes), CPU usage jumps to 100%, and the histogram is never plotted, even if the number of data points is small.

The case below reproduces the behavior on my computer. I don’t think there is anything special about this set of numbers, just that the range needs to be large. Note that only 10 points need to be histogrammed.

import numpy as np

from astropy.visualization import hist

data = [ 9.99999914e+05, -8.31312483e-03,  6.52755852e-02,  1.43104653e-03,
        -2.26311017e-02,  2.82660007e-03,  1.80307521e-02,  9.26294279e-03,
         5.06606026e-02,  2.05418011e-03]
hist(data, bins='freedman')

I’ve seen this issue in astropy 2.0.8 and 3.0.4, both on Python 3.6.
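To see why so few points can be a problem, here is a rough sketch that applies the textbook Freedman-Diaconis rule (bin width = 2 * IQR / n^(1/3)) to the same data using only NumPy. It assumes astropy's 'freedman' option follows this standard rule; it does not call astropy itself, and the variable names are illustrative.

```python
import numpy as np

data = np.array([9.99999914e+05, -8.31312483e-03, 6.52755852e-02, 1.43104653e-03,
                 -2.26311017e-02, 2.82660007e-03, 1.80307521e-02, 9.26294279e-03,
                 5.06606026e-02, 2.05418011e-03])

# Interquartile range: set by the bulk of the data, barely affected by the outlier.
q25, q75 = np.percentile(data, [25, 75])
iqr = q75 - q25

# Freedman-Diaconis bin width (assumed to match astropy's 'freedman' method).
bin_width = 2 * iqr / data.size ** (1 / 3)

# The full range, however, is dominated by the single outlier near 1e6.
n_bins = int(np.ceil((data.max() - data.min()) / bin_width))

print(f"bin width: {bin_width:.4f}, implied number of bins: {n_bins:.2e}")
```

The bin width comes out at a few hundredths while the range is about 1e6, so the rule implies tens of millions of bins for only 10 data points.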

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 21 (21 by maintainers)

Most upvoted comments

Oops, thanks for letting me know. I accidentally tagged the wrong subpackage initially too 🙄

@abhinuvpitale - The bins are computed by astropy and passed as those large arrays to mpl.

I can take a look over the weekend at that – though the test case I provided in the issue is not realistic, I hit the issue initially with actual image data. I think it is useful for us to generate a warning with some suggestions for avoiding the problem.
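A warning of the kind suggested here could look roughly like the sketch below. The function name `warn_if_too_many_bins` and the `max_bins` threshold are hypothetical, chosen only for illustration; this is not astropy's actual implementation.

```python
import math
import warnings

def warn_if_too_many_bins(data_min, data_max, bin_width, max_bins=100_000):
    """Return the implied bin count, warning if it exceeds max_bins.

    Hypothetical helper: the name and the default threshold are assumptions.
    """
    n_bins = math.ceil((data_max - data_min) / bin_width)
    if n_bins > max_bins:
        warnings.warn(
            f"Bin width {bin_width:g} over range [{data_min:g}, {data_max:g}] "
            f"implies {n_bins} bins; consider restricting the range or "
            "clipping outliers before histogramming."
        )
    return n_bins
```

Called with a range of about 1e6 and a bin width of about 0.04, this would emit the warning instead of silently building an enormous bin array.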

Oh, I meant to include a link to the docs here. I think it’s covered there already, but @mwcraig can confirm whether he has seen this page:

http://docs.astropy.org/en/stable/visualization/histogram.html

Given that the problematic data points are outlying artefacts rather than originating from the underlying distribution of the data, the solution for the case above would probably be either to set the range kwarg manually or to sigma clip the data before histogramming it. We may consider adding a new optional kwarg for it, but I feel it would open more cans of worms than it would solve.
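Both workarounds can be sketched with NumPy alone. The example below uses NumPy's 'fd' (Freedman-Diaconis) estimator as a stand-in for astropy's 'freedman' option, and a simple median/MAD cut as a stand-in for proper sigma clipping; the range limits and the 5-sigma threshold are arbitrary choices for this data set, not recommended defaults.

```python
import numpy as np

data = np.array([9.99999914e+05, -8.31312483e-03, 6.52755852e-02, 1.43104653e-03,
                 -2.26311017e-02, 2.82660007e-03, 1.80307521e-02, 9.26294279e-03,
                 5.06606026e-02, 2.05418011e-03])

# Option 1: restrict the range by hand. With a string estimator, NumPy
# computes the bin width only from the points inside the range.
edges_range = np.histogram_bin_edges(data, bins='fd', range=(-0.1, 0.1))

# Option 2: clip outliers first. This is a crude median/MAD cut standing in
# for sigma clipping; 1.4826 scales the MAD to a Gaussian standard deviation.
median = np.median(data)
robust_sigma = 1.4826 * np.median(np.abs(data - median))
clipped = data[np.abs(data - median) < 5 * robust_sigma]
edges_clipped = np.histogram_bin_edges(clipped, bins='fd')

print(len(edges_range) - 1, "bins with a manual range")
print(len(edges_clipped) - 1, "bins after clipping", data.size - clipped.size, "point(s)")
```

Either way the single point near 1e6 no longer sets the histogram range, so the bin count stays small.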