geopandas: BUG: explode() raises ValueError

Hi,

As reported in https://github.com/martinfleis/momepy/issues/123, in certain situations gdf.explode() raises ValueError: Shape of passed values is (132850, 183), indices imply (132842, 183). The data were retrieved from OSM using OSMnx (warning: the Vancouver gdf is large):

import geopandas as gpd
import osmnx as ox

gdf = ox.footprints.footprints_from_place(place='Vancouver, Canada')
gdf_projected = ox.project_gdf(gdf)
exploded = gdf_projected.explode()

I tried saving a small subset to GeoJSON, but after loading it back into geopandas, the error does not occur 🤔

About this issue

  • State: open
  • Created 5 years ago
  • Comments: 17 (9 by maintainers)

Most upvoted comments

@seizethedata This bug is super strange with your data. I wasn’t able to figure out what happens there, nor to find a workaround with the current version of geopandas. But I was able to patch explode to work with your data: #1319.

@seizethedata Try reset_index() before exploding.
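A minimal sketch of this suggestion, assuming geopandas and shapely are installed. The two-row GeoDataFrame and its OSM-style ids are made up to mirror the reproducer later in this thread and are not guaranteed to trigger the error on their own; the point is only the call order: reset the index first, then explode.

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, Polygon

# Hypothetical toy data: large, OSM-style integer ids as the index, and one
# MultiPolygon with two parts, so explode() turns 2 rows into 3.
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
other = Polygon([(2, 0), (3, 0), (3, 1), (2, 1)])
gdf = gpd.GeoDataFrame(
    {"building": ["school", "residential"]},
    geometry=[square, MultiPolygon([square, other])],
    index=[23253981, 4761998],
)

# Workaround: replace the original index with a default RangeIndex (0, 1, 2, ...)
# before exploding. drop=True discards the old ids entirely.
exploded = gdf.reset_index(drop=True).explode()
print(exploded)
```

Depending on the geopandas version, explode() may return a MultiIndex of (original row, part) or a flat index; the rows themselves are the same either way.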

But if the order of the index values is the opposite, it raises a ValueError:

Hmm, that seems like a bug. Can you report that to pandas?

Smaller reproducer:

Most of the entries don’t get exploded; only a few are actual MultiPolygons with multiple parts. Taking the first row plus one row that does get exploded (found with gdf.geometry.explode(), which works on the GeoSeries) still gives the error:

In [26]: subset = gdf.loc[[23253981, 4761998], :] 

In [27]: subset  
Out[27]: 
                                                      nodes                                           geometry     building addr:housenumber  ... check_date bridge opening_date          type
23253981  [251629948, 3607852090, 3607852091, 251629949,...  POLYGON ((-123.0727049 49.2147746, -123.073652...       school              NaN  ...        NaN    NaN          NaN           NaN
4761998                                                 NaN  (POLYGON ((-123.1615685 49.2642942, -123.16157...  residential             2475  ...        NaN    NaN          NaN  multipolygon

[2 rows x 182 columns]

In [28]: subset.explode() 
...
ValueError: Shape of passed values is (3, 183), indices imply (2, 183)
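The manual search for a row that actually gets exploded can also be scripted. This sketch, assuming shapely geometries with no missing values, counts the parts of each geometry directly instead of calling explode(), then picks the first row plus the first genuinely multi-part row, mirroring the gdf.loc[[23253981, 4761998], :] selection above. count_parts is a hypothetical helper, and the data (including the id 5300000) are made up.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import MultiPolygon, Polygon


def count_parts(geom):
    """Number of single-part pieces in a non-null geometry (hypothetical helper)."""
    if geom.geom_type.startswith("Multi") or geom.geom_type == "GeometryCollection":
        return len(geom.geoms)
    return 1


# Made-up stand-in for the footprints: one Polygon, one single-part
# MultiPolygon (adds no rows when exploded) and one two-part MultiPolygon.
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
other = Polygon([(2, 0), (3, 0), (3, 1), (2, 1)])
gdf = gpd.GeoDataFrame(
    {"building": ["school", "yes", "residential"]},
    geometry=[square, MultiPolygon([square]), MultiPolygon([square, other])],
    index=[23253981, 5300000, 4761998],
)

# Count parts per row, keeping the original index labels.
n_parts = pd.Series([count_parts(g) for g in gdf.geometry], index=gdf.index)
multi_ids = n_parts[n_parts > 1].index

# Reproducer candidate: the first row plus the first genuinely multi-part row.
subset = gdf.loc[[gdf.index[0], multi_ids[0]], :]
print(subset)
```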

Restricting the subset to its first five columns as well:

In [30]: subset = subset[subset.columns[:5]].copy() 

While debugging this, I noticed that it is the 2D object block that doesn’t get reshaped correctly:

In [32]: subset.explode()   
...
ValueError: Shape of passed values is (3, 6), indices imply (2, 6)

In [33]: %debug
> /home/joris/miniconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py(1718)construction_error()
   1716         raise ValueError("Empty data passed with indices specified.")
   1717     raise ValueError(
-> 1718         "Shape of passed values is {0}, indices imply {1}".format(passed, implied)
   1719     )
   1720 

ipdb> u  
> /home/joris/miniconda3/lib/python3.7/site-packages/pandas/core/internals/managers.py(345)_verify_integrity()
    343         for block in self.blocks:
    344             if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
--> 345                 construction_error(tot_items, block.shape[1:], self.axes)
    346         if len(self.items) != tot_items:
    347             raise AssertionError(

ipdb> p self.blocks
(ObjectBlock: slice(0, 4, 1), 4 x 2, dtype: object, IntBlock: slice(4, 5, 1), 1 x 3, dtype: int64, ObjectBlock: slice(5, 6, 1), 1 x 3, dtype: object)
#                                 |-> 2 rows                                      |-> 3 rows                                        |-> 3 rows  
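The block dump shows the 4-column object block still holding 2 rows while the other two blocks hold 3, so part of the frame was expanded and part was not. As a hedged stopgap that never relies on index labels, the explode can be done by row position only. This is a sketch, not geopandas' implementation: explode_by_position and split_parts are hypothetical helpers, geometries are assumed to be non-null, and the output geometry column is always named "geometry".

```python
import numpy as np
import pandas as pd
import geopandas as gpd
from shapely.geometry import MultiPolygon, Polygon


def split_parts(geom):
    """Return the single-part pieces of a non-null geometry as a list."""
    if geom.geom_type.startswith("Multi") or geom.geom_type == "GeometryCollection":
        return list(geom.geoms)
    return [geom]


def explode_by_position(gdf):
    """Hypothetical helper: explode multi-part geometries using row positions only."""
    parts = [split_parts(geom) for geom in gdf.geometry]
    # Source row position for every output row, e.g. [0, 1, 1].
    positions = np.repeat(np.arange(len(gdf)), [len(p) for p in parts])
    # Attribute columns are repeated with iloc, so index labels never matter.
    attrs = pd.DataFrame(gdf.drop(columns=gdf.geometry.name)).iloc[positions]
    flat = [part for row_parts in parts for part in row_parts]
    # The flat list of parts is assigned positionally, row for row.
    return gpd.GeoDataFrame(attrs, geometry=flat, crs=gdf.crs)


# Toy usage with made-up, OSM-style ids.
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
other = Polygon([(2, 0), (3, 0), (3, 1), (2, 1)])
toy = gpd.GeoDataFrame(
    {"building": ["school", "residential"]},
    geometry=[square, MultiPolygon([square, other])],
    index=[23253981, 4761998],
)
out = explode_by_position(toy)
print(out)
```

The original ids are kept and simply repeated for multi-part rows, so the result can be joined back to the source frame by label.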

The original dataframe is also entirely object dtype (the geometry column as well, but that’s only because I am debugging on geopandas 0.5, where I had osmnx installed):

In [34]: subset.dtypes
Out[34]: 
nodes               object
geometry            object
building            object
addr:housenumber    object
addr:street         object
dtype: object

So let’s see whether changing some columns to a non-object dtype helps. It doesn’t:

In [38]:  subset['addr:housenumber'] = subset['addr:housenumber'].astype(float) 

In [39]: subset[['addr:housenumber', 'geometry']].explode() 
...
ValueError: Shape of passed values is (3, 3), indices imply (2, 3)
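The astype(float) call works on this two-row subset, but it would raise if the column contained a value such as "12A" (a made-up example, not taken from the data). A slightly more defensive sketch of the same dtype change uses pd.to_numeric with errors="coerce", which turns unparseable values into NaN. As the session shows, changing the dtype did not avoid the explode() error; this only makes the conversion itself safe.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of an object-dtype OSM tag column (mixed strings and NaN).
housenumber = pd.Series([np.nan, "2475", "12A"], dtype=object)

# astype(float) would raise ValueError on "12A"; to_numeric with
# errors="coerce" converts it to NaN instead.
numeric = pd.to_numeric(housenumber, errors="coerce")
print(numeric.dtype)  # float64
```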

Another peculiarity of this dataset is that it has large integer index values (not the default 0, 1, 2, ..., n):

In [42]: subset[['addr:housenumber', 'geometry']].reset_index(drop=True).explode()
Out[42]: 
     addr:housenumber                                           geometry
0 0               NaN  POLYGON ((-123.0727049 49.2147746, -123.073652...
1 0            2475.0  POLYGON ((-123.1615685 49.2642942, -123.161570...
  1            2475.0  POLYGON ((-123.1622072 49.2643049, -123.162209...
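Whether a frame has the default index, for which the error goes away here, can be checked directly. A small pandas sketch; has_default_index is a hypothetical helper, and the two-row frame reuses the OSM ids from the reproducer above.

```python
import pandas as pd


def has_default_index(df):
    """True if df's index is exactly 0, 1, ..., len(df) - 1 (hypothetical helper)."""
    return df.index.equals(pd.RangeIndex(len(df)))


osm_like = pd.DataFrame({"building": ["school", "residential"]}, index=[23253981, 4761998])
print(has_default_index(osm_like))                         # False
print(has_default_index(osm_like.reset_index(drop=True)))  # True
```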

That seems to fix it! It also works on the original data:

In [45]: gdf.reset_index(drop=True).explode()
Out[45]: 
                                                      nodes       building addr:housenumber        addr:street  ... bridge opening_date          type                                           geometry
0      0  [251629948, 3607852090, 3607852091, 251629949,...         school              NaN                NaN  ...    NaN          NaN           NaN  POLYGON ((-123.0727049 49.2147746, -123.073652...
1      0  [268527777, 472917394, 268527778, 3099866715, ...        stadium              777  Pacific Boulevard  ...    NaN          NaN           NaN  POLYGON ((-123.1135167 49.2763119, -123.113285...
2      0  [1845869695, 1845869693, 268527967, 3714369280...        stadium              800      Griffiths Way  ...    NaN          NaN           NaN  POLYGON ((-123.109011 49.278442, -123.1088138 ...
3      0  [366639854, 1578563638, 1578563641, 1578563640...  train_station             1150     Station Street  ...    NaN          NaN           NaN  POLYGON ((-123.0981085 49.2741719, -123.098080...
4      0  [370490167, 5577882816, 5577882808, 5577882809...            yes             1661      Parker Street  ...    NaN          NaN           NaN  POLYGON ((-123.0709845 49.276187, -123.0710625...
...                                                     ...            ...              ...                ...  ...    ...          ...           ...                                                ...
132837 0                                                NaN     commercial              312        Main Street  ...    NaN          NaN  multipolygon  POLYGON ((-123.0994331 49.2817602, -123.099421...
132838 0                                                NaN            yes              NaN                NaN  ...    NaN          NaN  multipolygon  POLYGON ((-123.1289575 49.227361, -123.1287171...
132839 0                                                NaN            yes              NaN                NaN  ...    NaN          NaN  multipolygon  POLYGON ((-123.096785 49.2618756, -123.0967933...
132840 0                                                NaN            yes              966   West 14th Avenue  ...    NaN          NaN  multipolygon  POLYGON ((-123.1258587 49.2585495, -123.125811...
132841 0                                                NaN     apartments             3736  Commercial Street  ...    NaN          NaN  multipolygon  POLYGON ((-123.0679012 49.2515969, -123.067492...

[132850 rows x 182 columns]
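Note that reset_index(drop=True) throws away the OSM ids. If they are still needed, one variant is to move them into a regular column before exploding and, if desired, restore them as the index afterwards. The column name osm_id is a hypothetical choice, and the toy frame is made up.

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, Polygon

# Hypothetical toy data with OSM-style ids as the index.
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])
other = Polygon([(2, 0), (3, 0), (3, 1), (2, 1)])
gdf = gpd.GeoDataFrame(
    {"building": ["school", "residential"]},
    geometry=[square, MultiPolygon([square, other])],
    index=[23253981, 4761998],
)

# Keep the ids as a column ("osm_id" is a hypothetical name) instead of dropping them;
# the frame being exploded then has a default RangeIndex.
with_ids = gdf.rename_axis("osm_id").reset_index()
exploded = with_ids.explode()

# Optionally use the ids as the index again; it now repeats for multi-part rows.
exploded = exploded.set_index("osm_id")
print(exploded)
```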

So at least that gives the original reporter a workaround, and hopefully these pointers will also help us find the cause 😉