astropy: SkyCoord is very slow with pandas dataframe
Description
Defining a SkyCoord object using a pandas dataframe is hundreds of times slower than simply using a numpy array.
Expected behavior
Same performance
Steps to Reproduce
import numpy as np
import astropy.units as u
from astropy.coordinates import SkyCoord
import pandas as pd
import time as t
N = 100000
ra = np.random.uniform(1, 200, N)
de = np.random.uniform(-70, 70, N)
df = pd.DataFrame()
df['ra'], df['dec'] = ra, de
# Very slow
s = t.time()
gc = SkyCoord(ra=df['ra'] * u.degree, dec=df['dec'] * u.degree)
print(t.time() - s)
# Not slow
s = t.time()
gc = SkyCoord(ra=df['ra'].values * u.degree, dec=df['dec'].values * u.degree)
print(t.time() - s)
System Details
Linux-5.5.0-050500-generic-x86_64-with-glibc2.17
Python 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0]
Numpy 1.21.2
pyerfa 2.0.0
astropy 5.0
Scipy 1.7.3
Matplotlib 3.5.0
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 16 (15 by maintainers)
I removed the Close? and wont-fix labels. This is something that is worth addressing albeit at moderate priority.
The pandas example above now returns an error and the user won’t really be able to trip up on this slow code path.
Let’s close this issue?
Tested with pandas 2.1.4 and astropy 6.1dev.
The underlying issue is described in more detail in #11247, and it has nothing to do with SkyCoord. A second solution would be to use << instead of *, because pandas does not think it can do that, so astropy gets the chance of creating a Quantity.
Basically what @pllim said, which is that you cannot multiply a pandas Series by an astropy unit. Pandas does not support that operation and the results are not predictable. Your workaround of df['ra'].values * u.degree is the correct answer in this case.