pyproj: transform much slower in 2.0.1 than 1.9.6

I’m using Python 3.6.7 on Ubuntu 18.04.1. I installed each version of pyproj via pip install pyproj==x.y.z.

I’m using pyproj via geopandas. I recently upgraded from 1.9.5.1 to 2.0.1 and noticed that calls to to_crs, which uses pyproj.transform via shapely.ops.transform, got much slower.

It seems like there is now a large fixed overhead on each call to transform. When transform is called with large arrays, the difference is less pronounced, but when it is called on individual coordinates the difference is quite large.

I tested with individual x,y pairs; arrays of 10 elements each, to simulate usage with a “normal” sized geometry; and arrays of 1,000,000 elements each, which is unrealistic unless you’re working only with points rather than more complex geometries.

This was the setup:

import random
import numpy as np
import pyproj
proj_in = pyproj.Proj({'init': 'epsg:2263'}, preserve_units=True)
proj_out = pyproj.Proj({'init': 'epsg:4326'}, preserve_units=True)

Testing on individual coordinate pairs, 2.0.1 is roughly 600x slower than 1.9.5.1 (8.77 ms versus 13.9 µs per call):

%%timeit -n50 -r50
pyproj.transform(proj_in, proj_out, random.randint(80000, 120000), random.randint(200000, 250000))
  • 1.9.5.1 - 13.9 µs ± 5.82 µs per loop
  • 1.9.6 - 14.6 µs ± 6.52 µs per loop
  • 2.0.0 - 945 µs ± 152 µs per loop
  • 2.0.1 - 8.77 ms ± 915 µs per loop

For arrays of 10 coordinates each:

%%timeit -n50 -r50
pyproj.transform(proj_in, proj_out, np.random.randint(80000, 120000, 10), np.random.randint(200000, 250000, 10))
  • 1.9.6 - 25.1 µs ± 16.8 µs per loop
  • 2.0.1 - 8.81 ms ± 798 µs per loop

And for arrays of 1,000,000:

%%timeit -n5 -r5
pyproj.transform(proj_in, proj_out, np.random.randint(80000, 120000, 1000000), np.random.randint(200000, 250000, 1000000))
  • 1.9.6 - 689 ms ± 7.57 ms per loop
  • 2.0.1 - 1.18 s ± 24.9 ms per loop
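To make the "large overhead per call" reading concrete, the timings reported above can be turned into slowdown ratios and a crude fixed-plus-per-element cost model. This is only a sketch: the numbers are copied from the measurements in this report, while the helper names (`slowdown`, `fit_linear`) and the assumption that cost grows linearly with array size are illustrative, and a two-point fit is a rough estimate at best.

```python
# Timings reported above, converted to seconds per loop.
single = {"1.9.5.1": 13.9e-6, "1.9.6": 14.6e-6, "2.0.0": 945e-6, "2.0.1": 8.77e-3}
arrays = {
    "1.9.6": {10: 25.1e-6, 1_000_000: 689e-3},
    "2.0.1": {10: 8.81e-3, 1_000_000: 1.18},
}


def slowdown(old, new):
    """How many times slower `new` is than `old`."""
    return new / old


def fit_linear(timings):
    """Fit t(n) = fixed + per_element * n through two (n, t) points.

    Assumes cost is linear in array size, which is a simplification.
    """
    (n1, t1), (n2, t2) = sorted(timings.items())
    per_element = (t2 - t1) / (n2 - n1)
    fixed = t1 - per_element * n1
    return fixed, per_element


if __name__ == "__main__":
    print(f"single pair: {slowdown(single['1.9.5.1'], single['2.0.1']):.0f}x")
    for n in (10, 1_000_000):
        ratio = slowdown(arrays["1.9.6"][n], arrays["2.0.1"][n])
        print(f"arrays of {n}: {ratio:.1f}x")
    for version, timings in arrays.items():
        fixed, per_element = fit_linear(timings)
        print(f"{version}: fixed ~{fixed * 1e6:.0f} µs, "
              f"per element ~{per_element * 1e6:.2f} µs")
```

Under this model the fixed cost per call goes from roughly 18 µs in 1.9.6 to roughly 8.8 ms in 2.0.1, while the per-element cost rises from about 0.69 µs to about 1.17 µs. That would explain why the gap is several hundred times for small inputs but only about 1.7x for a million points.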

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 11
  • Comments: 21 (11 by maintainers)

Most upvoted comments

Thoughts on using a class-based approach? This would mean that the TransProj could be a property on the class and would not need to be generated by the user.

import random

from pyproj import Transformer, Proj

proj_in = Proj({'init': 'epsg:2263'}, preserve_units=True)
proj_out = Proj({'init': 'epsg:4326'}, preserve_units=True)
transformer = Transformer(proj_in, proj_out)
transformer.transform(random.randint(80000, 120000), random.randint(200000, 250000))

Plus, it would be fun to have a class called Transformer.
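The idea behind the class-based approach suggested above can be sketched without pyproj at all. The example below assumes, as the suggestion implies but the thread does not confirm, that the slowdown comes from rebuilding an expensive internal object on every call. `Pipeline` and `ReusableTransformer` are hypothetical stand-ins, not pyproj classes, and the "projection" is placeholder arithmetic.

```python
class Pipeline:
    """Hypothetical stand-in for a costly projection pipeline (not pyproj)."""

    builds = 0  # counts how many times the costly setup has run

    def __init__(self, src, dst):
        Pipeline.builds += 1
        self.src, self.dst = src, dst

    def apply(self, x, y):
        # Placeholder math; a real pipeline would reproject the point here.
        return (x + 1.0, y + 1.0)


def transform_uncached(src, dst, x, y):
    """Function-style API that rebuilds the pipeline on every call."""
    return Pipeline(src, dst).apply(x, y)


class ReusableTransformer:
    """Class-style API that builds the pipeline once and reuses it."""

    def __init__(self, src, dst):
        self._pipeline = Pipeline(src, dst)  # setup cost paid once

    def transform(self, x, y):
        return self._pipeline.apply(x, y)


if __name__ == "__main__":
    for i in range(100):
        transform_uncached("epsg:2263", "epsg:4326", i, i)
    print("function-style builds:", Pipeline.builds)

    Pipeline.builds = 0
    transformer = ReusableTransformer("epsg:2263", "epsg:4326")
    for i in range(100):
        transformer.transform(i, i)
    print("class-style builds:", Pipeline.builds)
```

The design choice is that the setup cost moves into the constructor, so callers who transform many small geometries pay it once rather than once per call.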

Just to be sure, you are following the recommendation here: https://pyproj4.github.io/pyproj/html/optimize_transformations.html?

New version times:

import numpy as np
from pyproj import Transformer

transformer = Transformer.from_proj(2263, 4326)

Test 1:

%%timeit -n50 -r50
transformer.transform(np.random.randint(80000, 120000), np.random.randint(200000, 250000))
36.4 µs ± 7.27 µs per loop (mean ± std. dev. of 50 runs, 50 loops each)

Test 2:

%%timeit -n50 -r50
transformer.transform(np.random.randint(80000, 120000, 10), np.random.randint(200000, 250000, 10))
The slowest run took 4.05 times longer than the fastest. This could mean that an intermediate result is being cached.
63.5 µs ± 33.6 µs per loop (mean ± std. dev. of 50 runs, 50 loops each)

%%timeit
transformer.transform(np.random.randint(80000, 120000, 10), np.random.randint(200000, 250000, 10))
30.8 µs ± 1.01 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Test 3:

%%timeit -n5 -r5
transformer.transform(np.random.randint(80000, 120000, 1000000), np.random.randint(200000, 250000, 1000000))
1.94 s ± 21.9 ms per loop (mean ± std. dev. of 5 runs, 5 loops each)

But this is on my machine. I’m curious to know how it performs on yours when 2.1.0 is released.
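For anyone who wants to repeat these measurements outside IPython, here is a minimal sketch of a harness built on the standard library's `timeit` module. The `bench` helper and its defaults are assumptions chosen to resemble `%%timeit -n50 -r50`; it is not how IPython implements the magic, and the workload shown is a placeholder.

```python
import statistics
import timeit


def bench(func, number=50, repeat=50):
    """Return (mean, std dev) in seconds per call.

    Runs `func` `number` times per run, for `repeat` runs, loosely
    mirroring the -n and -r options of %%timeit.
    """
    totals = timeit.repeat(func, number=number, repeat=repeat)
    per_loop = [total / number for total in totals]
    return statistics.mean(per_loop), statistics.pstdev(per_loop)


if __name__ == "__main__":
    # Placeholder workload; swap in e.g. lambda: transformer.transform(x, y).
    mean, std = bench(lambda: sum(range(100)))
    print(f"{mean * 1e6:.2f} µs ± {std * 1e6:.2f} µs per loop")
```

Note that the tests above generate random inputs inside the timed statement, so the cost of `random.randint` or `np.random.randint` is included in every figure. Creating the inputs before calling `bench` would isolate the transform itself.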