scipy: Why doesn't “beta.fit” come out right?Please take a look at it for me
distribution_type = getattr(stats, 'beta') # 共轭分布为beta分布,参考《Introduction to Empirical Bayes》
observed = pd.read_excel('retention.xlsx')
print(observed['and_retention'].values)
a, b = distribution_type.fit(observed['and_retention'].values)[:2]
print(a, b, sep='\n')
ax = plt.subplot(111)
ax.plot(np.linspace(0, 1, 100), stats.beta.pdf(np.linspace(0, 1, 100), a, b))
ax.hist(observed['and_retention'].values, alpha=0.75, color='green', bins=104)
plt.show()
“print” out:
[0.294 0.2955 0.235 0.2536 0.2423 0.2844 0.2099 0.2355 0.2946 0.3388 0.2202 0.2523 0.2209 0.2707 0.1885 0.2414 0.2846 0.328 0.2265 0.2563 0.2345 0.2845 0.1787 0.2392 0.2777 0.3076 0.2108 0.2477 0.234 0.2696 0.1839 0.2344 0.2872 0.3224 0.2152 0.2593 0.2295 0.2702 0.1876 0.2331 0.2809 0.3316 0.2099 0.2814 0.2174 0.2516 0.2029 0.2282 0.2697 0.3424 0.2259 0.2626 0.2187 0.2502 0.2161 0.2194 0.2628 0.3296 0.2323 0.2557 0.2215 0.2383 0.2166 0.2315 0.2757 0.3163 0.2311 0.2479 0.2199 0.2418 0.1938 0.2394 0.2718 0.3297 0.2346 0.2523 0.2262 0.2481 0.2118 0.241 0.271 0.3525 0.2323 0.2513 0.2313 0.2476 0.232 0.2295 0.2645 0.3386 0.2334 0.2631 0.226 0.2603 0.2334 0.2375 0.2744 0.3491 0.2052 0.2473 0.228 0.2448 0.2189 0.2149] 6.056697373013153 409078.57804704335
The α and β is out of whack(α=6.056697373013153,β=409078.57804704335) The fitting image is also unreasonable. The fitted beta distribution is almost the same as the straight line
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 15 (9 by maintainers)
The
beta
distribution has four parameters: alpha, beta, location and scale. In the lineyou discarded the location and scale parameters. When you computed the PDF with the expression
you did not specify the location and scale parameters, so the default values of 0 and 1 (respectively) were used. So it is no surprise that the plot of the PDF does not match the histogram.
Try changing the
fit
call toand then change the computation of the PDF to use
A separate problem is that, in order for the histogram’s vertical scale to match the PDF, you need to use the argument
density=True
:I suspect you’ll want to follow the alternative suggestion that I made in my second comment: use the standard beta distribution with location 0 and scale 1. Here’s a modified version of your script that does that:
The script prints
and generates the following plot: