astropy: SkyCoord is very slow with pandas dataframe
Description
Defining a SkyCoord object from pandas dataframe columns is hundreds of times slower than using the equivalent numpy arrays.
Expected behavior
Same performance
Steps to Reproduce
import numpy as np
import astropy.units as u
from astropy.coordinates import SkyCoord
import pandas as pd
import time as t
N = 100000
ra = np.random.uniform(1, 200, N)
de = np.random.uniform(-70, 70, N)
df = pd.DataFrame()
df['ra'], df['dec'] = ra, de
# Very slow
s = t.time()
gc = SkyCoord(ra=df['ra'] * u.degree, dec=df['dec'] * u.degree)
print(t.time() - s)
# Not slow
s = t.time()
gc = SkyCoord(ra=df['ra'].values * u.degree, dec=df['dec'].values * u.degree)
print(t.time() - s)
System Details
Linux-5.5.0-050500-generic-x86_64-with-glibc2.17
Python 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0]
Numpy 1.21.2
pyerfa 2.0.0
astropy 5.0
Scipy 1.7.3
Matplotlib 3.5.0
About this issue
- State: closed
- Created 2 years ago
- Comments: 16 (15 by maintainers)
I removed the Close? and wont-fix labels. This is something that is worth addressing albeit at moderate priority.
The pandas example above now returns an error and the user won’t really be able to trip up on this slow code path.
Let’s close this issue?
Tested with: pandas 2.1.4, astropy 6.1dev
The underlying issue is described in more detail in #11247, and it has nothing to do with SkyCoord.
A second solution would be to use << instead of * because pandas does not think it can do that, so astropy gets the chance of creating a Quantity.
Basically what @pllim said, which is that you cannot multiply a pandas Series by an astropy unit. Pandas does not support that operation and the results are not predictable. Your workaround of df['ra'].values * u.degree is the correct answer in this case.
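The remark about << versus * comes down to how Python dispatches binary operators: the left operand is asked first, and the right operand's reflected method only runs if the left one does not handle the operation. The sketch below is a toy model of that rule. ToySeries and ToyUnit are hypothetical stand-ins written for illustration, not the real pandas or astropy classes, and the element-by-element behaviour of ToySeries.__mul__ is an assumption made for the example.

```python
# Toy model of Python's binary-operator dispatch.
# ToySeries and ToyUnit are hypothetical stand-ins, NOT pandas or astropy classes.


class ToyUnit:
    def __init__(self, name):
        self.name = name

    def __rmul__(self, other):
        # Only reached when the left operand's __mul__ returns NotImplemented
        # (for example, a plain float on the left).
        return ("quantity", other, self.name)

    def __rlshift__(self, other):
        # Reached whenever the left operand does not handle <<.
        return ("quantity", other, self.name)


class ToySeries:
    def __init__(self, values):
        self.values = list(values)

    def __mul__(self, other):
        # Assumption for illustration: the container claims the operation and
        # applies it element by element, so the unit never sees the whole array.
        return ToySeries(v * other for v in self.values)

    # No __lshift__ is defined, so `series << unit` falls back to
    # ToyUnit.__rlshift__ and the unit receives the container in one call.


deg = ToyUnit("deg")
series = ToySeries([1.0, 2.0, 3.0])

via_mul = series * deg      # handled by ToySeries.__mul__, one element at a time
via_lshift = series << deg  # handled once by ToyUnit.__rlshift__

print(type(via_mul).__name__)   # the container stays in control
print(via_lshift[0], via_lshift[2])  # the unit built the result
```

In this toy, the multiplication produces a container of per-element results, while the shift hands the whole container to the unit in a single call. That mirrors the explanation given in the thread, though the real libraries differ in detail.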