astropy: Iterating over a SkyCoord object is very slow

I was surprised to see how long it takes to iterate over a multi-valued SkyCoord object.

For example, it takes 19s to iterate over a SkyCoord object that contains 1000 coordinates:

In [1]: import numpy as np
In [2]: from astropy import units as u
In [3]: from astropy.coordinates import SkyCoord
In [4]: ra = np.random.uniform(0, 360, size=1000) * u.deg
In [5]: dec = np.random.uniform(-90, 90, size=1000) * u.deg
In [6]: crd = SkyCoord(ra, dec)
In [7]: %timeit crd[500]
100 loops, best of 3: 19 ms per loop
In [8]: %timeit [c for c in crd]
1 loops, best of 3: 19.4 s per loop

In contrast, iterating over a list of 1000 SkyCoord objects is roughly 500,000x faster:

In [9]: crd2 = [SkyCoord(ra[i], dec[i]) for i in range(len(ra))]
In [10]: %timeit crd2[500]
10000000 loops, best of 3: 38.7 ns per loop
In [11]: %timeit [c for c in crd2]
10000 loops, best of 3: 28.6 µs per loop

Might there be room to increase the performance of accessing a SkyCoord item? Iterating over a list of coordinates seems like a common task to me.

About this issue

  • State: closed
  • Created 9 years ago
  • Comments: 56 (53 by maintainers)

Most upvoted comments

Exp1:

p.s. To be a bit more specific, adding the following before anything you do makes everything fast:

from astropy.utils import iers
iers.IERS_Auto.open()

Added two lines to my code right after import lines. Res: Alt-Az Calculated in 29.960163593292236 seconds

Exp2: Disconnected the computer from network. Res: Alt-Az Calculated in 0.44874143600463867 seconds

Exp3:

I think the refactoring in the caching systems has solved this issue, these are the numbers I get on master when copy pasting the snippet from above, as-is.

Time Object created in 0.0007829666137695312 seconds
Angle Object created in 0.024201154708862305 seconds
Angle Object created in 0.00028586387634277344 seconds
Site Object created in 0.0006892681121826172 seconds
Sky Object (Moon) created in 0.04855179786682129 seconds
Frame Object created in 0.0006310939788818359 seconds
Alt-Az Calculated in 0.869938850402832 seconds

@mshemuni - Could you try to update for the new release candidate and report back whether it solved the problem? It can be pip installed: python -m pip install --pre -U astropy

Did the update. New astropy version is '4.0rc1'. Res(Connected to network): Alt-Az Calculated in 0.7916419506072998 seconds

The update seems to solve the problem. Thank you @mhvk and @bsipocz for help.

@mshemuni - that is really long! I just checked locally and while not as extreme as what you get, the last step does take 2 seconds. However, just repeating the exact same script in the same session reduces it to 0.03 seconds, which suggests it is some kind of initialization issue.

Trying a bit more, I think I can localize this to reading the IERS tables (which are needed for getting accurate alt, az); if after your import of Time, you add tm.now().ut1 (which also forces the tables to be loaded), you'll see that your other statements are decently fast.

So, I think this is not an issue for SkyCoord itself, though I am surprised at how long the loading takes… Looking yet a bit further, it seems clear it is because the web site for the IERS table gets contacted even if you already downloaded the file, which seems incorrect behaviour. I think this is probably part of #9555.

This is a really sad difference. I started looking a bit along the chain, and at the lowest level, one can speed up getting an item from a representation by about a factor of 3; see #5598.

I wanted to second the request for some more efficient way of accessing elements in a SkyCoord vector. Being able to iterate over a list of SkyCoord objects would be much more natural than pulling values off an internal vector. The fact that iterating over a SkyCoord object is so inefficient is also probably not going to be obvious to non-experts. Here are some benchmarks I made with astropy v1.2.1. You can see that there is a >1000x speed difference between the slowest and fastest methods for looping over ra and dec values in a SkyCoord vector.

Accessing SkyCoord by item:

c = SkyCoord(np.random.uniform(0, 360, size=200), np.random.uniform(-90, 90, size=200), unit='deg')
%timeit for i in range(len(c)): x, y = c[i].ra.deg, c[i].dec.deg
1 loop, best of 3: 9.76 s per loop

Accessing from internal arrays by item:

%timeit for i in range(len(c)): x, y = c.ra[i].deg, c.dec[i].deg
1 loop, best of 3: 445 ms per loop

Converting to numpy arrays:

%timeit for ra, dec in zip(c.ra.deg, c.dec.deg): x, y = ra, dec
100 loops, best of 3: 2.23 ms per loop