bambi: `plot_predictions` breaks with HSGP
The attached dataset will help reproduce the issue, but basically:
formula = "p(Campus_Living, N) ~ 1 + (1 | Age_Group) + (1 | Major) + hsgp(Fall_Year, by=Age_Group, m=10, c=2, scale=True, centered=True) + hsgp(Fall_Year, by=Major, m=10, c=2, scale=True, centered=True)"
m_bambi = bmb.Model(
formula=formula,
data=data,
family="binomial",
categorical=["Age_Group", "Major"],
)
m_bambi.build()
idata_bambi = m_bambi.fit()
## this line fails
bmb.interpret.plot_predictions(
m_bambi,
idata_bambi,
conditional={"Fall_Year": np.linspace(2012, 2025)},
);
yields (full traceback at end of post):
IndexError: index -1 is out of bounds for axis 1 with size 0
If you do formula = "p(Campus_Living, N) ~ 1 + hsgp(Fall_Year, by=Age_Group, m=10, c=2, scale=True, centered=True)", the error becomesKeyError: 'Age_Group'.
Interestingly, if you change formula to "p(Campus_Living, N) ~ 1 + (1 | Age_Group) + (1 | Major) + hsgp(Fall_Year, by=Age_Group, m=10, c=2, scale=True, centered=True)" or "p(Campus_Living, N) ~ 1 + (1 | Age_Group) + hsgp(Fall_Year, by=Age_Group, m=10, c=2, scale=True, centered=True)", the plot will display (although I’m not sure it’s right)!
It looks like some indexing is happening in by_values in bambi/model_components.py:262, which yields the error.
Thanks a lot for your help, and LMK if anything is unclear 🙏
Full error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Cell In[225], line 1
----> 1 bmb.interpret.plot_predictions(
2 m_bambi,
3 idata_bambi,
4 conditional={"Fall_Year": np.linspace(2012, 2025)},
5 );
File ~/mambaforge/envs/bayes-workshop/lib/python3.11/site-packages/bambi/interpret/plotting.py:181, in plot_predictions(model, idata, conditional, average_by, target, sample_new_groups, pps, use_hdi, prob, transforms, legend, ax, fig_kwargs, subplot_kwargs)
174 if average_by is True:
175 raise ValueError(
176 "Plotting when 'average_by = True' is not possible as 'True' marginalizes "
177 "over all covariates resulting in a single prediction estimate. "
178 "Please pass a covariate(s) to 'average_by'."
179 )
--> 181 cap_data = predictions(
182 model=model,
183 idata=idata,
184 conditional=conditional,
185 average_by=average_by,
186 target=target,
187 pps=pps,
188 use_hdi=use_hdi,
189 prob=prob,
190 transforms=transforms,
191 sample_new_groups=sample_new_groups,
192 )
194 conditional_info = ConditionalInfo(model, conditional)
195 transforms = transforms if transforms is not None else {}
File ~/mambaforge/envs/bayes-workshop/lib/python3.11/site-packages/bambi/interpret/effects.py:532, in predictions(model, idata, conditional, average_by, target, pps, use_hdi, prob, transforms, sample_new_groups)
530 y_hat_mean = y_hat.mean(("chain", "draw"))
531 else:
--> 532 idata = model.predict(
533 idata, data=cap_data, sample_new_groups=sample_new_groups, inplace=False
534 )
535 y_hat = response_transform(idata["posterior"][response.name_target])
536 y_hat_mean = y_hat.mean(("chain", "draw"))
File ~/mambaforge/envs/bayes-workshop/lib/python3.11/site-packages/bambi/models.py:832, in Model.predict(self, idata, kind, data, inplace, include_group_specific, sample_new_groups)
829 else:
830 var_name = f"{response_aliased_name}_{name}"
--> 832 means_dict[var_name] = component.predict(
833 idata, data, include_group_specific, hsgp_dict, sample_new_groups
834 )
836 # Drop var/dim if already present. Needed for out-of-sample predictions.
837 if var_name in idata.posterior.data_vars:
File ~/mambaforge/envs/bayes-workshop/lib/python3.11/site-packages/bambi/model_components.py:185, in DistributionalComponent.predict(self, idata, data, include_group_specific, hsgp_dict, sample_new_groups)
182 linear_predictor_dims = linear_predictor_dims + (response_levels_dim,)
184 if self.design.common:
--> 185 linear_predictor += self.predict_common(
186 posterior, data, in_sample, to_stack_dims, design_matrix_dims, hsgp_dict
187 )
189 if self.design.group and include_group_specific:
190 linear_predictor += self.predict_group_specific(
191 posterior, data, in_sample, to_stack_dims, design_matrix_dims, sample_new_groups
192 )
File ~/mambaforge/envs/bayes-workshop/lib/python3.11/site-packages/bambi/model_components.py:262, in DistributionalComponent.predict_common(self, posterior, data, in_sample, to_stack_dims, design_matrix_dims, hsgp_dict)
256 # NOTE:
257 # The approach here differs from the one in the PyMC implementation.
258 # Here we have a single dot product with many zeros, while there we have many
259 # smaller dot products.
260 # It is subject to change here, but I don't want to mess up dims and coords.
261 if term.by_levels is not None:
--> 262 by_values = x_slice[:, -1].astype(int)
263 x_slice = x_slice[:, :-1]
264 x_slice_centered = (x_slice.data - term.mean[by_values]) / maximum_distance
IndexError: index -1 is out of bounds for axis 1 with size 0
About this issue
- Original URL
- State: closed
- Created 5 months ago
- Reactions: 1
- Comments: 16
@AlexAndorra I think it should be fixed now.
For that:
I haven’t tested it on my end, but I think the changes I implemented were the required ones 😃
@GStechschulte
The
LazyCallinstance has several attributes, one isargs, which is what we are already using, and the other iskwargs, which we are not using. See for exampleAnd to access the name of the
byvariable we can docomponent.call.kwargs["by"].name.So, we need to check
kwargsin addition toargs. However, I’m realizing this won’t be that simple (for now) because:LazyVariables. That’s not hard, it’s simplykwarg.name. However…LazyVariablefor things that are not variable names, but values, such asTrue. This is not a problem for Formulae because that “name” is in the namespace where formulas are evaluated. But…kwarg.nameon theLazyVariableof, for example,centered, the program will think there’s a variable called"True", and that’s not the intended behavior.I think we can do the following:
LazyValues for things likeFalse,True, andNone(I need to check if there are more)kwargswhen these are instances ofLazyVariable.I’ll work on these things now
Yeah I can confirm. If you do:
Now you get:
Hope that helps!
Aaaaah, that makes sense, thanks a lot @GStechschulte ! Is that gonna work if I have more GPs though? Let’s say I have 3 different GPs, on 3 different categorical variables. So now, in
conditionalI’m gonna have 4 keys (Fall_Yearand the 3 covariates), so I have to specifyaverage_by, which I think is gonna have the same issue. Correct?Thanks a lot @AlexAndorra.
formulaeis raising the correct error.The error lies in
interpretand is happening because when creating the data forplot_predictions, we first return all covariates specified in the model. However, we have a bug when an arg. is passed tobyof the HSGP term. OnlyFall_Yearis being returned. BothFall_YearandAge_Groupshould be returned. This isn’t a problem with your first model formula becauseAge_Groupis a group-specific term.As a workaround until I fix the bug I suggest the following.
We are able to pass
Age_Grouptoconditionalbecause we know a priori that it should exist in the data grid. However, if you attempt to pass a variable toconditionalthat was not specified in the formula, then you will get aKeyError.This works great for
"p(Campus_Living, N) ~ 1 + (1 | Age_Group) + (1 | Major) + hsgp(Fall_Year, by=Age_Group, m=10, c=2, scale=True, centered=True) + hsgp(Fall_Year, by=Major, m=10, c=2, scale=True, centered=True)"@tomicapretto 🍾However, the error is still here when removing the group-specific effects:
"p(Campus_Living, N) ~ 1 + hsgp(Fall_Year, by=Age_Group, m=10, c=2, scale=True, centered=True)", The posterior predictions sample now, but theplot_predictionsstill fails:KeyError: 'Age_Group'. This is triggered by:So it still looks like a problem when updating the design matrix. LMK if that’s unclear 🙏
Full traceback:
Makes sense indeed.
Will do ASAP and of course keep you posted. Thanks for the lightning fast fix @tomicapretto 🙏
I’ve found where the bug is!
https://github.com/bambinos/bambi/blob/b5b9f093c623636ae3a8e2f0765b2f01ca26f4b0/bambi/model_components.py#L242-L246
In line 246 we’re deleting columns of the design matrix based on the slice
term_slice. This only happens for HSGP. The problem is that the slice for the second (and third, fourth, etc.) HSGP term does not reflect the shape of X at the time it’s going to be used to slice X.I think the fix is to delete the HSGP term data from the design matrix X only after having iterated over all the HSGP terms. I’ll test it soon.
Thanks for raising the issue and the detailed information!