geopandas: [cython] specific case where new sjoin is much slower
@andreas-h reported a use case where the sjoin from the geopandas-cython branch is much slower than the current released version: https://gist.github.com/andreas-h/4906aea5d8ecffc9751e191cd11d00b4
I ran it locally and can confirm this. The example joins 20,000 points with 44,000 polygons, which takes only about 5 s on master but 30-60 s on the cython branch.
I tried to profile it, but the profile indicates that virtually all of the time is spent within the cython cysjoin function (and thus the C sjoin function). That is strange, because the pandas code in the user-facing sjoin function should also take some time. I have not yet checked that both versions return the same results; possibly one of the two implementations is doing something wrong.
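The equality check mentioned above could look roughly like the following sketch: reduce each implementation's output to (left index, right index) pairs, compare them as multisets so that row order does not matter, and time both calls. Everything here is an assumption for illustration: `join_bruteforce` and `join_grid` are hypothetical stand-ins for the two sjoin versions, they work on plain tuples rather than GeoDataFrames, and axis-aligned boxes replace the real polygons.

```python
import random
import time
from collections import Counter, defaultdict


def join_bruteforce(points, boxes):
    """Hypothetical stand-in for version A: test every point against every box."""
    return [
        (i, j)
        for i, (x, y) in enumerate(points)
        for j, (minx, miny, maxx, maxy) in enumerate(boxes)
        if minx <= x <= maxx and miny <= y <= maxy
    ]


def join_grid(points, boxes, cell=1.0):
    """Hypothetical stand-in for version B: bucket boxes by grid cell first."""
    buckets = defaultdict(list)
    for j, (minx, miny, maxx, maxy) in enumerate(boxes):
        for cx in range(int(minx // cell), int(maxx // cell) + 1):
            for cy in range(int(miny // cell), int(maxy // cell) + 1):
                buckets[cx, cy].append(j)
    pairs = []
    for i, (x, y) in enumerate(points):
        # Only boxes registered in the point's own cell can contain it.
        for j in buckets.get((int(x // cell), int(y // cell)), ()):
            minx, miny, maxx, maxy = boxes[j]
            if minx <= x <= maxx and miny <= y <= maxy:
                pairs.append((i, j))
    return pairs


def diff_results(pairs_a, pairs_b):
    """Return (pairs only in A, pairs only in B), ignoring row order."""
    a, b = Counter(pairs_a), Counter(pairs_b)
    return sorted((a - b).elements()), sorted((b - a).elements())


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Small synthetic input with a fixed seed (not the data from the gist).
rng = random.Random(0)
points = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(500)]
boxes = []
for _ in range(200):
    x, y = rng.uniform(0, 9), rng.uniform(0, 9)
    boxes.append((x, y, x + rng.uniform(0.1, 1.0), y + rng.uniform(0.1, 1.0)))

res_a, t_a = timed(join_bruteforce, points, boxes)
res_b, t_b = timed(join_grid, points, boxes)
only_a, only_b = diff_results(res_a, res_b)
print(f"A: {len(res_a)} pairs in {t_a:.4f}s, B: {len(res_b)} pairs in {t_b:.4f}s")
print("pairs only in A:", only_a, "| pairs only in B:", only_b)
```

With real sjoin output, the pairs would presumably come from the result's index and its index_right column. If either difference list is non-empty, the timing comparison says little until the discrepancy is explained.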
cc @mrocklin
@andreas-h could you simplify the example a little? (So that it does not depend on the emiprepr library, e.g. by constructing the polygons directly inside the notebook.)
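As a rough idea of what such a dependency-free example could start from, here is a minimal sketch that builds inputs of about the reported size using only the standard library. The function names, the regular grid of squares, and the uniform point distribution are all assumptions; the polygons in the original gist may have a different shape and layout, which could matter for the timings.

```python
import random


def make_grid_polygons(nx, ny, size=1.0):
    """Return nx * ny closed square rings, each a list of (x, y) tuples."""
    rings = []
    for ix in range(nx):
        for iy in range(ny):
            x0, y0 = ix * size, iy * size
            x1, y1 = x0 + size, y0 + size
            # The first vertex is repeated at the end to close the ring.
            rings.append([(x0, y0), (x1, y0), (x1, y1), (x0, y1), (x0, y0)])
    return rings


def make_random_points(n, width, height, seed=0):
    """Return n reproducible (x, y) points inside [0, width] x [0, height]."""
    rng = random.Random(seed)
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


polygons = make_grid_polygons(220, 200)            # 44,000 unit squares
points = make_random_points(20_000, 220.0, 200.0)  # 20,000 points
print(len(polygons), len(points))
```

The rings and points can then be passed to shapely.geometry.Polygon and shapely.geometry.Point and wrapped in GeoDataFrames before calling sjoin.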
About this issue
- State: closed
- Created 7 years ago
- Comments: 18 (17 by maintainers)
I re-ran these tests (gist). I’m posting here as well as in #1344 to try to give some closure to this issue.
Namely, I added PyGEOS, which also uses GEOS’ STRtree but with different Python bindings and geometry data structures.

So it seems to me that most of the slowdown comes from the Shapely/Python layer, not from GEOS.