astropy: Knuth’s rule fails on a small, simple array (eats up the system's memory)

Description

knuth_bin_width cannot handle a small, simple array.

Expected behavior

A histogram should be generated, or an error shown explaining why it was not possible to obtain it.

Actual behavior

The function starts to gobble up the system’s memory.

Steps to Reproduce

import numpy as np
import matplotlib.pyplot as plt
from astropy.visualization import hist
arr = np.array([0.05555556, 0., 0., 0., 0., 1., 0., 0., 0., 0.5])
ax = plt.subplot(111)
hist(arr, bins='knuth', ax=ax)

System Details

Linux-5.5.0-050500-generic-x86_64-with-glibc2.10
>>> Python 3.8.8 (default, Feb 24 2021, 21:46:12) 
[GCC 7.3.0]
>>> Numpy 1.19.2
>>> astropy 4.2
>>> Scipy 1.5.2
>>> Matplotlib 3.3.1

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 15 (15 by maintainers)

Most upvoted comments

@Gabriel-p Even if that is the case, I think it’s better than the current code, which tries to optimize a highly nonconvex function that may have no minimum. A grid search with a sensible upper bound on the number of bins seems like a reasonable solution. What do you think?

PS: The max number of bins could even be an input parameter with some sensible default.
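
A minimal sketch of the grid search proposed above, assuming Knuth's log-posterior in its usual form for equal-width bins. The names knuth_log_posterior, knuth_bins_grid, and max_bins are hypothetical and not part of astropy's API, and the default of 100 is an arbitrary placeholder rather than a recommended value.

```python
import math

import numpy as np


def knuth_log_posterior(data, n_bins):
    """Relative log-posterior of Knuth's rule for n_bins equal-width bins."""
    n = data.size
    counts, _ = np.histogram(data, bins=int(n_bins))
    return (
        n * math.log(n_bins)
        + math.lgamma(0.5 * n_bins)
        - n_bins * math.lgamma(0.5)
        - math.lgamma(n + 0.5 * n_bins)
        + sum(math.lgamma(c + 0.5) for c in counts)
    )


def knuth_bins_grid(data, max_bins=100):
    """Pick the bin count in 1..max_bins that maximizes the log-posterior.

    Hypothetical helper: the search is bounded, so it always terminates,
    unlike an unconstrained optimizer on the same function.
    """
    data = np.asarray(data, dtype=float).ravel()
    if data.size == 0 or data.min() == data.max():
        raise ValueError("data must contain at least two distinct values")

    candidates = range(1, max_bins + 1)
    scores = [knuth_log_posterior(data, m) for m in candidates]
    best = int(np.argmax(scores)) + 1  # candidates start at 1; ties pick the smallest
    width = (data.max() - data.min()) / best
    return best, width


arr = np.array([0.05555556, 0., 0., 0., 0., 1., 0., 0., 0., 0.5])
n_bins, bin_width = knuth_bins_grid(arr, max_bins=50)
print(n_bins, bin_width)
```

Because every candidate is evaluated exactly once, the cost is bounded by max_bins histogram calls, and an error is raised up front for data with no spread instead of letting the search run away.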

Because the issue happens inside the optimize call. That’s where M (the number of bins) grows without bound.

You can get 4.3.dev by installing the development version from master. It does take a while to compute. It didn’t eat up all my machine’s memory, so the growth does stop at some point. I think this might actually be a bug in:

https://github.com/astropy/astropy/blob/6fd98e528ee59d2b7d9b932946a39e199221360d/astropy/stats/histogram.py#L16

cc @larrybradley