python-rasterstats: Wrong mean/std stats

Hi,

I think there is an error in calculating mean and standard deviation in rasterstats 0.10.3 Here is the statistics calculated in ArcMap for my region:
Band COUNT MIN MAX MEAN STD MEDIAN 1 151057 1378 2399 1584 71 1577 2 151057 1025 2753 1318 119 1305 3 151057 594 3271 1035 191 1014 4 151057 380 3636 854 223 826 5 151057 113 4225 753 252 718 6 151057 366 4869 1853 642 1808 7 151057 153 7181 2947 1099 2873 8 151057 121 5926 2577 916 2519 9 151057 13880 59141 51615 3675 52475 10 151057 14939 44531 37989 1722 38133 11 151057 10063 56708 48523 3707 49289 12 151057 13024 46146 34272 5463 34959 13 151057 23169 59005 46253 3292 46804

And here is the same statistics calculated using rasterstats (have a look at mean and std values for bands 9-13):
Band count min max mean std percentile_50 1 151057 1378 2399 1584 71 1577 2 151057 1025 2753 1318 119 1305 3 151057 594 3271 1035 191 1014 4 151057 380 3636 854 223 826 5 151057 113 4225 753 252 718 6 151057 366 4869 1853 642 1808 7 151057 153 7181 2947 1099 2873 8 151057 121 5926 2577 916 2519 9 151057 13880 59141 23182 28669 52475 10 151057 14939 44531 9556 28485 38133 11 151057 10063 56708 20090 28673 49289 12 151057 13024 46146 5839 28953 34959 13 151057 23169 59005 17820 28623 46804

The code to reproduce the results using rasterstats: for i in range(1, 14): stats = zonal_stats('F:/test.shp', "F:/test_2.dat", stats=['count','min', 'max', 'mean', 'std', 'percentile_50'], all_touched=False, band_num=i, nodata=0)

Here is the data: https://www.dropbox.com/sh/e8g0jwr1ci0vp2r/AAATlwU2eFjtbtMCO3R5SB1Fa?dl=0

Can you please let me know what the problem could be?

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Comments: 17 (8 by maintainers)

Commits related to this issue

Most upvoted comments

I gotchu fam. Here are my data that expose the bug. https://drive.google.com/open?id=1mgWIVUM0HkOFNMmcm_fBhDaNy1tkfFa7

Without dtype=np.float: [{‘count’: 22625585, ‘sum’: 1396284338.0}, {‘count’: 212501, ‘sum’: 25293972.0}, {‘count’: 305003, ‘sum’: 25457696.0}, {‘count’: 221494, ‘sum’: 20968219.0}, {‘count’: 1649505, ‘sum’: 362830813.0}, {‘count’: 205705, ‘sum’: 13864265.0}, {‘count’: 439389, ‘sum’: 40465001.0}, {‘count’: 555130, ‘sum’: 43869294.0}, {‘count’: 554943, ‘sum’: 58234289.0}, {‘count’: 192363, ‘sum’: 11353189.0}, {‘count’: 236805, ‘sum’: 15387852.0}, {‘count’: 121387, ‘sum’: 6349592.0}, {‘count’: 1472733, ‘sum’: 160577428.0}, {‘count’: 306673, ‘sum’: 13142182.0}, {‘count’: 220197, ‘sum’: 9357941.0}, {‘count’: 356920, ‘sum’: 13157448.0}, {‘count’: 140968, ‘sum’: 4057913.0}, {‘count’: 138541, ‘sum’: 3196803.0}, {‘count’: 153863, ‘sum’: 2930441.0}]

With dtype=np.float [{‘count’: 22625585, ‘sum’: 31461055410.0}, {‘count’: 212501, ‘sum’: 25293972.0}, {‘count’: 305003, ‘sum’: 25457696.0}, {‘count’: 221494, ‘sum’: 20968219.0}, {‘count’: 1649505, ‘sum’: 362830813.0}, {‘count’: 205705, ‘sum’: 13864265.0}, {‘count’: 439389, ‘sum’: 40465001.0}, {‘count’: 555130, ‘sum’: 43869294.0}, {‘count’: 554943, ‘sum’: 58234289.0}, {‘count’: 192363, ‘sum’: 11353189.0}, {‘count’: 236805, ‘sum’: 15387852.0}, {‘count’: 121387, ‘sum’: 6349592.0}, {‘count’: 1472733, ‘sum’: 160577428.0}, {‘count’: 306673, ‘sum’: 13142182.0}, {‘count’: 220197, ‘sum’: 9357941.0}, {‘count’: 356920, ‘sum’: 13157448.0}, {‘count’: 140968, ‘sum’: 4057913.0}, {‘count’: 138541, ‘sum’: 3196803.0}, {‘count’: 153863, ‘sum’: 2930441.0}]

Output of QGIS (agrees with “with dtype”) image

[I am not getting the large negative numbers for this shapefile, although I can upload that one if you want.]

I am using numpy 1.14 also, but installed within an anaconda environment from their repository. Also, if it helps, I was getting very large negative numbers (~2^31) as the output for sum.