xarray: to_xarray() result is incorrect when one of the multi-index levels is not sorted
What happened: to_xarray() sorts the multi-index level values and returns the data in that sorted order, but it does not sort the levels themselves (or it assumes they are already sorted). As a result, the data is in the wrong order relative to the displayed coordinates.
df:
           C1  C2
lev1 lev2
b    foo    0   1
a    foo    2   3
df.to_xarray():
<xarray.Dataset>
Dimensions:  (lev1: 2, lev2: 1)
Coordinates:
  * lev1     (lev1) object 'b' 'a'
  * lev2     (lev2) object 'foo'
Data variables:
    C1       (lev1, lev2) int64 2 0
    C2       (lev1, lev2) int64 3 1
What you expected to happen: to_xarray() should account for the order of the levels in the original index, so that each value stays attached to its own coordinate label.
Minimal Complete Verifiable Example:
import pandas as pd
df = pd.concat(
    {
        'b': pd.DataFrame([[0, 1]], index=['foo'], columns=['C1', 'C2']),
        'a': pd.DataFrame([[2, 3]], index=['foo'], columns=['C1', 'C2']),
    }
).rename_axis(['lev1', 'lev2'])
print('df:\n', df, '\n')
print('df.to_xarray():\n', df.to_xarray(), '\n')
print('df.index.levels[0]:\n', df.index.levels[0])
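The condition described above can be checked with pandas alone. The following is a minimal sketch, not part of the original report: it builds the index explicitly with unsorted level values, so it does not depend on how pd.concat orders its keys, and the helper names has_unsorted_levels and with_sorted_levels are hypothetical. Rebuilding the index this way gives sorted levels; whether that avoids the wrong ordering in the affected xarray version is an assumption that has not been verified here.

```python
import pandas as pd

# Index whose first level stores its values unsorted ('b' before 'a').
index = pd.MultiIndex(
    levels=[["b", "a"], ["foo"]],
    codes=[[0, 1], [0, 0]],
    names=["lev1", "lev2"],
)
df = pd.DataFrame([[0, 1], [2, 3]], index=index, columns=["C1", "C2"])


def has_unsorted_levels(idx):
    """Return True if any level of the MultiIndex is not in increasing order."""
    return any(not level.is_monotonic_increasing for level in idx.levels)


def with_sorted_levels(frame):
    """Hypothetical helper: return a copy whose index levels are sorted.

    MultiIndex.from_arrays re-factorizes the labels, which yields sorted
    levels for sortable values; the row order and the data are unchanged.
    """
    idx = frame.index
    arrays = [idx.get_level_values(i) for i in range(idx.nlevels)]
    out = frame.copy()
    out.index = pd.MultiIndex.from_arrays(arrays, names=idx.names)
    return out


fixed = with_sorted_levels(df)
print(has_unsorted_levels(df.index))     # levels before rebuilding
print(has_unsorted_levels(fixed.index))  # levels after rebuilding
print(fixed)
```

Only the index metadata is rebuilt here; the rows keep their original order, so any difference in the to_xarray() output would come from the level ordering alone.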
Anything else we need to know?:
Environment:
Output of xr.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.8.2 | packaged by conda-forge | (default, Apr 24 2020, 08:20:52) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 5.4.7-100.fc30.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8
libhdf5: 1.10.4
libnetcdf: 4.6.3
xarray: 0.15.1
pandas: 1.0.5
numpy: 1.19.0
scipy: 1.5.0
netCDF4: 1.5.3
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.1.3
nc_time_axis: None
PseudoNetCDF: None
rasterio: None
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2.19.0
distributed: 2.19.0
matplotlib: 3.2.2
cartopy: None
seaborn: 0.10.1
numbagg: installed
setuptools: 46.3.0.post20200513
pip: 20.1
conda: None
pytest: 5.4.3
IPython: 7.15.0
sphinx: None
About this issue
- State: closed
- Created 4 years ago
- Comments: 15 (10 by maintainers)
The sorting seems to be a separate matter, caused by dataframe.set_index() inside our remove_unused_levels_categories function. I think we can remove that call, which will fix the sorting issue when removing unused levels. Then the result will be the desired: