astropy: Iterating over a SkyCoord object is very slow
I was surprised to see how long it takes to iterate over a multi-valued SkyCoord
object.
For example, it takes 19s to iterate over a SkyCoord object that contains 1000 coordinates:
In [1]: import numpy as np
In [2]: from astropy import units as u
In [3]: from astropy.coordinates import SkyCoord
In [4]: ra = np.random.uniform(0, 360, size=1000) * u.deg
In [5]: dec = np.random.uniform(-90, 90, size=1000) * u.deg
In [6]: crd = SkyCoord(ra, dec)
In [7]: timeit crd[500]
100 loops, best of 3: 19 ms per loop
In [8]: timeit [c for c in crd]
1 loops, best of 3: 19.4 s per loop
In contrast, iterating over a list of 1000 SkyCoord objects (once the list has been built) is roughly 680,000x faster:
In [9]: crd2 = [SkyCoord(ra[i], dec[i]) for i in range(len(ra))]
In [10]: %timeit crd2[500]
10000000 loops, best of 3: 38.7 ns per loop
In [11]: %timeit [c for c in crd2]
10000 loops, best of 3: 28.6 µs per loop
Might there be room to increase the performance of accessing a SkyCoord
item? Iterating over a list of coordinates seems like a common task to me.
About this issue
- State: closed
- Created 9 years ago
- Comments: 56 (53 by maintainers)
Exp1: Added two lines to my code right after the import lines. Result: Alt-Az Calculated in 29.960163593292236 seconds
Exp2: Disconnected the computer from the network. Result: Alt-Az Calculated in 0.44874143600463867 seconds
Exp3: Did the update. The new astropy version is '4.0rc1'. Result (connected to the network): Alt-Az Calculated in 0.7916419506072998 seconds
The update seems to solve the problem. Thank you @mhvk and @bsipocz for the help.
I think the refactoring of the caching system has solved this issue; these are the numbers I get on master when copy-pasting the snippet from above as-is.
@mshemuni - Could you try updating to the new release candidate and report back whether it solves the problem? It can be installed with pip:
python -m pip install --pre -U astropy
@mshemuni - that is really long! I just checked locally, and while it is not as extreme as what you get, the last step does take 2 seconds. However, just repeating the exact same script in the same session reduces it to 0.03 seconds, which suggests it is some kind of initialization issue.
Trying a bit more, I think I can localize this to reading the IERS tables (which are needed for getting accurate Alt-Az); if, after your import of Time, you add tm.now().ut1 (which also forces the tables to be loaded), you'll see that your other statements are decently fast.
So, I think this is not an issue for SkyCoord itself, though I am surprised how long the loading takes… Looking a bit further, it seems clear that this is because the website for the IERS table gets contacted even if you already downloaded the file, which seems like incorrect behaviour. I think this is probably part of #9555.
This is a really sad difference. I started looking a bit along the chain, and at the lowest level, one can speed up getting an item from a representation by about a factor of 3; see #5598.
I wanted to second the request for a more efficient way of accessing elements in a SkyCoord vector. Being able to iterate over a list of SkyCoord objects would be much more natural than pulling values off an internal vector. The fact that iterating over a SkyCoord object is so inefficient is also probably not going to be obvious to non-experts. Here are some benchmarks I made with astropy v1.2.1. You can see that there is a >1000x speed difference between the slowest and fastest methods for looping over ra and dec values in a SkyCoord vector.
Accessing SkyCoord by item:
Accessing from internal arrays by item:
Converting to numpy arrays: