astropy: SkyCoord is very slow with pandas dataframe

Description

Defining a SkyCoord object from pandas DataFrame columns is hundreds of times slower than defining it from NumPy arrays.

Expected behavior

Same performance

Steps to Reproduce


import numpy as np
import astropy.units as u
from astropy.coordinates import SkyCoord
import pandas as pd
import time as t

N = 100000
ra = np.random.uniform(1, 200, N)
de = np.random.uniform(-70, 70, N)

df = pd.DataFrame()
df['ra'], df['dec'] = ra, de

# Very slow
s = t.time()
gc = SkyCoord(ra=df['ra'] * u.degree, dec=df['dec'] * u.degree)
print(t.time() - s)

# Not slow
s = t.time()
gc = SkyCoord(ra=df['ra'].values * u.degree, dec=df['dec'].values * u.degree)
print(t.time() - s)

System Details

Linux-5.5.0-050500-generic-x86_64-with-glibc2.17
Python 3.8.12 (default, Oct 12 2021, 13:49:34) [GCC 7.5.0]
Numpy 1.21.2
pyerfa 2.0.0
astropy 5.0
Scipy 1.7.3
Matplotlib 3.5.0

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 16 (15 by maintainers)

Most upvoted comments

I removed the Close? and wont-fix labels. This is worth addressing, albeit at moderate priority.

The pandas example above now raises an error, so users can no longer stumble onto this slow code path.

UnitTypeError: Angle instances require units equivalent to 'rad', but no unit was given.

Let’s close this issue?

Tested with: pandas 2.1.4, astropy 6.1dev

The underlying issue is described in more detail in #11247, and it has nothing to do with SkyCoord.

A second solution would be to use << instead of *. Pandas does not think it can handle that operation, so astropy gets the chance to create a Quantity.
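
The dispatch rule behind this suggestion can be sketched in plain Python. The classes below (ToyUnit, ToyQuantity, ToySeries) are hypothetical stand-ins written for illustration only; they are not the pandas or astropy implementations. The sketch assumes only Python's standard operator protocol: the left operand is tried first, and the right operand's reflected method runs only if the left operand does not handle the operation.

```python
class ToyQuantity:
    """Hypothetical stand-in for a quantity: plain values plus a unit."""

    def __init__(self, values, unit):
        self.values = values
        self.unit = unit


class ToyUnit:
    """Hypothetical stand-in for a unit that can build quantities."""

    def __init__(self, name):
        self.name = name

    def __rmul__(self, other):
        # Reached only when the left operand does not handle `*` itself.
        return ToyQuantity(list(other), self)

    def __rlshift__(self, other):
        # Reached only when the left operand does not handle `<<` itself.
        return ToyQuantity(list(other), self)


class ToySeries:
    """Hypothetical stand-in for a series: handles `*`, has no `<<`."""

    def __init__(self, data):
        self.data = data

    def __iter__(self):
        return iter(self.data)

    def __mul__(self, other):
        # The series claims the operation and re-wraps whatever comes back,
        # so the caller receives a ToySeries, not a ToyQuantity.
        return ToySeries(self.data * other)

    # Deliberately no __lshift__: Python falls back to ToyUnit.__rlshift__.


deg = ToyUnit("deg")
series = ToySeries([1.0, 2.0])

via_mul = series * deg      # handled by ToySeries.__mul__
via_lshift = series << deg  # handled by ToyUnit.__rlshift__

print(type(via_mul).__name__, type(via_mul.data).__name__)
print(type(via_lshift).__name__, via_lshift.values)
```

In this toy model, `*` yields a series wrapping a quantity, while `<<` yields a plain quantity because the unit is the only operand that knows how to handle it.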

Basically what @pllim said, which is that you cannot multiply a pandas Series by an astropy unit. Pandas does not support that operation and the results are not predictable. Your workaround of df['ra'].values * u.degree is the correct answer in this case.

In [54]: s = Series([1., 2.]) * u.degree

# Pandas internal representation of the data array is a `Quantity`. That is not expected!
In [55]: s.values
Out[55]: <Quantity [1., 2.] deg>

# Simple things like printing the Series break. Pandas should never be calling back
# into astropy.
In [56]: print(s)
---------------------------------------------------------------------------
UnitConversionError                       Traceback (most recent call last)
<ipython-input-56-0ff1b7208845> in <module>
----> 1 print(s)

~/miniconda3/envs/ska3/lib/python3.8/site-packages/pandas/core/series.py in __repr__(self)
   1463         show_dimensions = get_option("display.show_dimensions")
   1464 
-> 1465         self.to_string(
   1466             buf=buf,
   1467             name=self.name,

~/miniconda3/envs/ska3/lib/python3.8/site-packages/pandas/core/series.py in to_string(self, buf, na_rep, float_format, header, index, length, dtype, name, max_rows, min_rows)
   1532             max_rows=max_rows,
   1533         )
-> 1534         result = formatter.to_string()
   1535 
   1536         # catch contract violations

~/miniconda3/envs/ska3/lib/python3.8/site-packages/pandas/io/formats/format.py in to_string(self)
    389 
    390         fmt_index, have_header = self._get_formatted_index()
--> 391         fmt_values = self._get_formatted_values()
    392 
    393         if self.is_truncated_vertically:

~/miniconda3/envs/ska3/lib/python3.8/site-packages/pandas/io/formats/format.py in _get_formatted_values(self)
    373 
    374     def _get_formatted_values(self) -> list[str]:
--> 375         return format_array(
    376             self.tr_series._values,
    377             None,

~/miniconda3/envs/ska3/lib/python3.8/site-packages/pandas/io/formats/format.py in format_array(values, formatter, float_format, na_rep, digits, space, justify, decimal, leading_space, quoting)
   1238     )
   1239 
-> 1240     return fmt_obj.get_result()
   1241 
   1242 

~/miniconda3/envs/ska3/lib/python3.8/site-packages/pandas/io/formats/format.py in get_result(self)
   1269 
   1270     def get_result(self) -> list[str]:
-> 1271         fmt_values = self._format_strings()
   1272         return _make_fixed_width(fmt_values, self.justify)
   1273 

~/miniconda3/envs/ska3/lib/python3.8/site-packages/pandas/io/formats/format.py in _format_strings(self)
   1516 
   1517     def _format_strings(self) -> list[str]:
-> 1518         return list(self.get_result_as_array())
   1519 
   1520 

~/miniconda3/envs/ska3/lib/python3.8/site-packages/pandas/io/formats/format.py in get_result_as_array(self)
   1500             # large values: more that 8 characters including decimal symbol
   1501             # and first digit, hence > 1e6
-> 1502             has_large_values = (abs_vals > 1e6).any()
   1503             has_small_values = (
   1504                 (abs_vals < 10 ** (-self.digits)) & (abs_vals > 0)

~/miniconda3/envs/ska3/lib/python3.8/site-packages/astropy/units/quantity.py in __array_ufunc__(self, function, method, *inputs, **kwargs)
    592         # consistent units between two inputs (e.g., in np.add) --
    593         # and the unit of the result (or tuple of units for nout > 1).
--> 594         converters, unit = converters_and_unit(function, method, *inputs)
    595 
    596         out = kwargs.get('out', None)

~/miniconda3/envs/ska3/lib/python3.8/site-packages/astropy/units/quantity_helper/converters.py in converters_and_unit(function, method, *args)
    190                         converters[i] = None
    191                     else:
--> 192                         raise UnitConversionError(
    193                             "Can only apply '{}' function to "
    194                             "dimensionless quantities when other "

UnitConversionError: Can only apply 'greater' function to dimensionless quantities when other argument is not a quantity (unless the latter is all zero/infinity/nan)