xgcm: Add .sel and .isel equivalent that returns all vars on all grids related to the indexers?

I’m working on a project that aims at providing observation-like sampling for Ocean model data living on a C grid. What we aim at is keeping the C grid logic intact as long as possible. For this, something like

ds.xgcm.isel(X=3, Y=5)

returning a full dataset restricted to x_c=3, x_l=3, x_r=3, y_c=5, y_l=5, y_r=5and a correspondingds.xgcm.sel(…)` would be ideal.

A related question is: Which grid points are “related” to a given position?

If we consider the tracer grid of a finite volume model the basis of our selection, then for a complete description of the dynamics of the selection, we’d not only need the vars defined on the faces for one side (per dim), but for both sides of the box.

(Thanks @jbusecke for the chat yesterday. Definitely helped me getting a clearer idea on what I’m looking for.)

About this issue

  • Original URL
  • State: open
  • Created 4 years ago
  • Comments: 22 (20 by maintainers)

Most upvoted comments

Hey @jdldeauna, I believe that this is the expected behavior when passing a slice object, and I would be hesitant to change this. I think you could pass a list of indicies instead ...isel(xt=[1,2]) to achieve what you want here.

For the more general question So how can the xgcm standard positions ({"X": {"left": "outer"}}) be combined with the xarray isel and the slice functions when subsetting variables? We should consider all the possible inputs for .isel in xarray. From the xarray docs we get:

A dict with keys matching dimensions and values given by integers, slice objects or arrays. indexer can be a integer, slice, array-like or DataArray.

As a start, I would try to come up with some experimental logic to convert all of these inputs to the desired output. E.g. some function like this (pseudo-code):

def get_grid_indexer(grid, ds, axis_idx_dict, reference_position='center'):
    """ grid is just a placeholder for self later, ds is an input dataset, axis_idx_dict is something like you outlined above {'X': slice(1,2), 'Y':4}
    and reference_position is either a str or dict (per axis) that tells this function which dimension should be used for the initial selection (xt in your above example)"""
    for axis, indexer in axis_idx_dict.items():
         # check if axis in grid object
         # figure out all the dimensions and grid positions in `ds` that are associated with this axis
         dimensions, grid_positions = ....()
         # loop over all positions and apply appropriate selection
         for di, grid_position in zip(dimensions, grid_positions):
               if grid_position == reference position:
                     # simplest case, just apply the selection as is
                     ds = ds.isel({dim:indexer}
               else:
                     # Implement a 'translation' logic to convert `indexer` to other grid positions (needs to cover all possible combinations)
                     if reference_position == 'center' and grid_position == 'outer':
                          modified_indexer = ... # if you pass [1, 2] as the initial indexer you want to get something like [0,1,2] out here
                          # apply the selection
                          ds = ds.isel({dim: modified_indexer})

You will have to see how to make this work with all the inputs from above. Maybe xarrays internal logic has some things we could adapt. I am personally not very familiar with the indexing logic of xarray, but maybe @dcherian knows more and has some ideas how to make this easier (this is a pretty brute force approach but should work IMO).

As a sidenote, I think we might want to disallow indexing with Dataarrays for now, and keep that for later?

I think that is right. We just want to operate based on grid position. The variable can be whatever is located on that grid position, it should not matter for this feature.

I read it as saying: take any variable currently on x: left and make it x: outer on the returned grid and dataset.

I agree. let’s do isel first. sel can then call isel.

pos_mapping={‘center’:‘center’, ‘left’:‘left’}

I feel like this should be the default. i.e. preserve the positions. For your first case, shouldn’t this be {"X": {"left": "outer"}} i.e. we should have to specify this per-axis and positions that don’t change need not be specified.

There is an available test for using GCMs (test_gcm_dataset.py) but it’s marked as deprecated. Could it still be used, and is the GCMDataset still available? Alternatively, is it advisable to write tests which use intake to download GCM datasets for testing, or would that add complexity we wouldn’t want?

This might be worth raising another issue? just to keep the discussion here focussed?

Hey @jdldeauna,

  1. Yes I think this should live in the grid object as a method
  2. I think it would be even more powerful if we can implement this on a full dataset (see example below). Regarding the lat/lon selection: I am not sure how to even implement a general lon/lat selection without a mask. On a regular lon/lat grid this is super easy, and you can figure out the indicies before passing them to grid.isel() but once you have curvilinear coordinates there have to be ‘gaps’ in your resulting data.

I envision the API to be something like this ds_new, grid_new = grid.isel({'X':slice(a,b), 'Y':slice(c,d)}). I think the default should always apply the selection based on the center position and select all other positions to result in an outer:

Lets say we have a setup like this along a single axis X

Tracer(center)
 |------o-------|------o-------|------o-------|------o-------|
       [0]            [1]            [2]            [3]

Velocity(left)
 |------o-------|------o-------|------o-------|------o-------|
[0]            [1]            [2]            [3]

I think grid.isel({'X':slice(1,2)}) should return:

Tracer(center)
 |------o-------|------o-------|------o-------|------o-------|
                      [1]            [2]            

Velocity(outer)
 |------o-------|------o-------|------o-------|------o-------|
               [1]            [2]            [3]

This to me is the most intuitive default because it creates velocity cells ‘surrounding’ the tracer.

Of course this can be modified: grid.isel({'X':slice(1,2)}, pos_mapping={'center':'center', 'left':'left'})

Would results in:

Tracer(center)
 |------o-------|------o-------|------o-------|------o-------|
                      [1]            [2]            

Velocity(left)
 |------o-------|------o-------|------o-------|------o-------|
               [1]            [2]          

These choices are then automatically parsed into the new grid object.

I think this would be a huuuge first step to accomplish. The lon/lat selection could then maybe build on top of that, but this needs to work first IMO.

Yeah I do a lot of subsetting, so it would be great to integrate that with xgcm. 😃

If you have the axis attribute set, you can now do this with cf-xarray: https://cf-xarray.readthedocs.io/en/latest/examples/introduction.html#Slicing

EDIT: that’s not exactly right. It won’t take into account the grid layout. cf_xarray implements the grid-unaware version of this feature