intake-esm: Can't combine variables when attributes are different

I am trying to load both DIC (dissic) and cell thickness (thkcello) from every model that provides both. With the UK model, the two variables can't be merged, apparently because their attributes differ: the 'realm' is 'ocean' for thkcello but 'ocnBgchem' for dissic.

It seems like this could be solved by relaxing the requirement that the attributes be identical across variables.
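For reference, here is a minimal sketch of how differing dataset attributes behave in a plain xarray merge. It assumes an xarray version whose xr.merge accepts the combine_attrs argument. The toy datasets and the source_id value are made up for illustration, and the sketch does not show what intake-esm does internally.

```python
import numpy as np
import xarray as xr

# Toy stand-ins for the two variables; the values and source_id are hypothetical.
ds_thk = xr.Dataset(
    {"thkcello": ("lev", np.full(3, 10.0))},
    attrs={"realm": "ocean", "source_id": "EXAMPLE-MODEL"},
)
ds_dic = xr.Dataset(
    {"dissic": ("lev", np.ones(3))},
    attrs={"realm": "ocnBgchem", "source_id": "EXAMPLE-MODEL"},
)

# 'drop_conflicts' keeps the attributes that agree and drops the ones that differ.
merged = xr.merge([ds_thk, ds_dic], combine_attrs="drop_conflicts")
print(merged.attrs)
```

In this toy case the conflicting 'realm' attribute is dropped and the merge succeeds, which suggests the attributes alone may not be what triggers the error.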

The error that I get is:

MergeError: conflicting values for variable 'vertices_longitude' on objects to be combined. You can skip this check by specifying compat='override'.
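The message points at the values of 'vertices_longitude' rather than at the attributes. Below is a small sketch that reproduces the same kind of error with plain xarray and shows what compat='override' does. The numbers are hypothetical, and the real cause of the mismatch between the two variables is not known here.

```python
import numpy as np
import xarray as xr

# Two toy datasets that share 'vertices_longitude' but disagree on one value.
ds_dic = xr.Dataset({
    "dissic": ("x", np.ones(3)),
    "vertices_longitude": ("x", np.array([0.0, 1.0, 2.0])),
})
ds_thk = xr.Dataset({
    "thkcello": ("x", np.full(3, 10.0)),
    "vertices_longitude": ("x", np.array([0.0, 1.0, 2.5])),
})

# With compat='no_conflicts' the differing values raise a MergeError.
try:
    xr.merge([ds_dic, ds_thk], compat="no_conflicts")
    raised = False
except xr.MergeError as err:
    raised = True
    print(err)

# compat='override' skips the comparison and keeps the variable
# from the first dataset in the list.
merged = xr.merge([ds_dic, ds_thk], compat="override")
```

Because 'override' silently keeps the first copy, it is worth checking that the two versions of the variable differ only negligibly before relying on it.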

The code I am running:

# Note: this first search uses table_id='Oyr'; its result is overwritten below and never used.
cat = col.search(experiment_id=['historical'], table_id='Oyr', variable_id=['dissic','thkcello'], grid_label='gn')
uni_dict = col.unique(['source_id', 'experiment_id', 'table_id'])

models = set(uni_dict['source_id']['values']) # all the models

# Keep only the models that show up in the Omon search for each experiment
for experiment_id in ['historical']:
    query = dict(experiment_id=experiment_id, table_id='Omon',
                 variable_id=['dissic','thkcello'], grid_label='gn')
    cat = col.search(**query)
    models = models.intersection(cat.df.source_id.unique().tolist())

models = list(models)

# Final search, restricted to those models
cat = col.search(experiment_id=['historical'], table_id='Omon',
                 variable_id=['dissic','thkcello'], grid_label='gn', source_id=models)

# Load the matching assets into a dictionary of datasets
dset_dict = cat.to_dataset_dict(zarr_kwargs={'consolidated': True, 'decode_times': False},
                                cdf_kwargs={'chunks': {'time' : 20}, 'decode_times': False})
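One thing to watch in the snippet above: a search with a list for variable_id matches assets for any of the listed variables, so the resulting source_id values can include models that provide only one of them. Here is a minimal sketch of filtering for models that have both, using a toy DataFrame in place of cat.df (the model names are made up):

```python
import pandas as pd

wanted = {"dissic", "thkcello"}

# Toy stand-in for cat.df; only the two columns used here are included.
df = pd.DataFrame({
    "source_id":   ["MODEL-A", "MODEL-A", "MODEL-B", "MODEL-C", "MODEL-C"],
    "variable_id": ["dissic", "thkcello", "dissic", "thkcello", "dissic"],
})

# Collect the set of variables each model provides.
vars_per_model = df.groupby("source_id")["variable_id"].agg(set)

# Keep the models whose variable set covers everything requested.
models = sorted(m for m, found in vars_per_model.items() if wanted <= found)
print(models)
```

intake-esm's search also has a require_all_on argument intended for this kind of filtering, if your installed version provides it.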

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 20 (8 by maintainers)

Most upvoted comments

Just in case other people encounter a similar error, cmip6_preprocessing has new features which address the issue:

import intake_esm
import intake
from cmip6_preprocessing.preprocessing import combined_preprocessing

cat_url = "https://storage.googleapis.com/cmip6/pangeo-cmip6-noQC.json"
col = intake.open_esm_datastore(cat_url)

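# Options for to_dataset_dict: aggregate=False skips intake-esm's own merging,
# so every asset (one per variable and member) comes back as its own dataset.
kwargs = {
    'zarr_kwargs': {
        'consolidated': True,
        'use_cftime': True
    },
    'aggregate': False,
    'preprocess': combined_preprocessing
}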

cat = col.search(experiment_id='historical',
                 variable_id=['uo','vo','so'], 
                 grid_label='gn', 
                 member_id='r1i1p1f1',
                 source_id='MPI-ESM1-2-HR'
                )
ddict = cat.to_dataset_dict(**kwargs)
list(ddict.keys())

# Combine the per-variable datasets so that each key holds all variables for one member
from cmip6_preprocessing.postprocessing import merge_variables

ddict_merged = merge_variables(ddict)
list(ddict_merged.keys())

ddict_merged['MPI-ESM1-2-HR.gn.historical.Omon.r1i1p1f1']

# Concatenate the ensemble members, so the member id drops out of the key
from cmip6_preprocessing.postprocessing import concat_members

ddict_concat = concat_members(ddict_merged)
print(list(ddict_concat.keys()))
ddict_concat['MPI-ESM1-2-HR.gn.historical.Omon']

ds = ddict_concat['MPI-ESM1-2-HR.gn.historical.Omon']

ds.so