bambi: `Model.predict()` generates unexpected out-of-sample predictions for a mixed effects model
Hello,
First of all, thanks for the great work on this project; I’ve been using bambi a lot and it has been super helpful!
I’m currently facing a (potential) issue when trying to make out-of-sample predictions for a logistic regression model built with the following formula:
`y ~ x1 + x2 + x3 + (0 + x2|x1) + (0 + x3|x1)`
where `x1` and `x2` are categorical variables with two levels each and `x3` is a continuous variable.
The out-of-sample data I’m trying to make predictions for looks like this (example):
| x1 | x2 | x3 |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 0 | 0.5 |
| 0 | 0 | 1 |
| 0 | 0 | 1.5 |
| 1 | 0 | 0 |
| 1 | 0 | 0.5 |
| 1 | 0 | 1 |
| 1 | 0 | 1.5 |
There was no error when running `model.predict(iData, data=out_of_sample_data, kind='mean')`; however, the spaghetti plot I generated from the posterior predictions looked off for `x1 == 1`: the variance was much larger than I expected. (I noticed this because I had separately made a plot of the 0.5 decision boundary for `x3`, i.e. the posterior mean and HDI of the `x3` value at which the probability of a positive outcome is 50%, and it didn’t match what I saw in the spaghetti plot.)
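For a logistic model, that decision boundary has a closed form: P(y = 1) = 0.5 exactly when the linear predictor is zero, so for one `(x1, x2)` cell with total intercept `a` and total `x3` slope `b`, the boundary is `x3* = -a / b`, evaluated per posterior draw. A minimal NumPy sketch of that manual check, assuming you already have matching draws of `a` and `b`; `decision_boundary` and `hdi` are hypothetical helpers (ArviZ's `az.hdi` does the interval in practice), and the draws below are made up, not taken from the fitted model.

```python
import numpy as np


def decision_boundary(intercept, slope):
    """x3 value where P(y = 1) = 0.5 in a logistic model.

    intercept: posterior draws of every non-x3 term of the linear predictor
    for one (x1, x2) cell; slope: matching draws of the total x3 slope
    (common slope plus the group-specific slope for that x1 level).
    sigmoid(intercept + slope * x3) = 0.5  <=>  intercept + slope * x3 = 0.
    """
    return -np.asarray(intercept, dtype=float) / np.asarray(slope, dtype=float)


def hdi(samples, prob=0.94):
    """Narrowest interval [lo, hi] containing about `prob` of the samples."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = len(s)
    k = int(np.floor(prob * n))  # each candidate interval runs from s[i] to s[i + k]
    widths = s[k:] - s[: n - k]
    i = int(np.argmin(widths))
    return s[i], s[i + k]


# Hypothetical posterior draws for a single (x1, x2) cell.
rng = np.random.default_rng(0)
intercept_draws = rng.normal(-1.0, 0.2, size=4000)
slope_draws = rng.normal(2.0, 0.1, size=4000)

boundary = decision_boundary(intercept_draws, slope_draws)
print(boundary.mean(), hdi(boundary))
```

If the `Z` used for out-of-sample predictions picks up the wrong group-specific coefficients, the predicted curves no longer agree with this boundary, which is how the mismatch became visible.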
I then had a look at the code and noticed that the `Z` matrix generated in the `predict` method in `models.py` looked different from what I expected.
Here’s the code I’m referring to (the last line):
```python
if self._design.group:
    if in_sample:
        Z = self._design.group.design_matrix
    else:
        Z = self._design.group._evaluate_new_data(data).design_matrix
```
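Why the column order of `Z` matters: in a generalized linear mixed model the linear predictor is `eta = X @ beta + Z @ u`, and each column of `Z` must line up with the group-specific coefficient in `u` that it multiplies. The NumPy sketch below is a generic illustration, not bambi's implementation; `predict_mean` and the toy numbers are hypothetical. Permuting the columns of `Z` silently gives each row another group's coefficients, which would be consistent with the unexpected predictions for `x1 == 1`.

```python
import numpy as np


def predict_mean(X, beta, Z, u):
    """P(y = 1) for one posterior draw of a logistic mixed model.

    X @ beta holds the common (population-level) effects and Z @ u the
    group-specific effects; u must be ordered exactly like Z's columns.
    """
    eta = X @ beta + Z @ u
    return 1.0 / (1.0 + np.exp(-eta))


# Toy numbers (hypothetical): two rows, one per x1 group, no common effects.
X = np.zeros((2, 1))
beta = np.zeros(1)
u = np.array([0.0, 2.0])      # group-specific effect for group 0 and group 1
Z = np.array([[1.0, 0.0],     # row belonging to group 0
              [0.0, 1.0]])    # row belonging to group 1

p_ok = predict_mean(X, beta, Z, u)
p_swapped = predict_mean(X, beta, Z[:, [1, 0]], u)  # columns out of order
print(p_ok, p_swapped)
```

No error is raised in the swapped case because the shapes still match; only the values are wrong.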
What I got for `Z` was the following:
| col 0 | col 1 | col 2 | col 3 | col 4 | col 5 |
|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0.5 | 0 |
| 1 | 0 | 0 | 0 | 1 | 0 |
| 1 | 0 | 0 | 0 | 1.5 | 0 |
| 0 | 0 | 1 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 0 | 0.5 |
| 0 | 0 | 1 | 0 | 0 | 1 |
| 0 | 0 | 1 | 0 | 0 | 1.5 |
…but what I expected (after trying to make sense of it) was this:
| col 0 | col 1 | col 2 | col 3 | col 4 | col 5 |
|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0.5 | 0 |
| 1 | 0 | 0 | 0 | 1 | 0 |
| 1 | 0 | 0 | 0 | 1.5 | 0 |
| 0 | 1 | 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 0 | 0 | 0.5 |
| 0 | 1 | 0 | 0 | 0 | 1 |
| 0 | 1 | 0 | 0 | 0 | 1.5 |
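The expected matrix can be reproduced by hand. This sketch assumes one indicator column per (`x2` level, `x1` level) pair for `(0 + x2|x1)`, with `x1` varying fastest, followed by one `x3` column per `x1` level for `(0 + x3|x1)`. That ordering is inferred from the expected matrix above, not taken from bambi or formulae, and `build_group_design` is a hypothetical helper.

```python
import numpy as np

# Out-of-sample rows from the first table (x1 and x2 coded as 0/1, x3 continuous).
x1 = np.array([0, 0, 0, 0, 1, 1, 1, 1])
x2 = np.array([0, 0, 0, 0, 0, 0, 0, 0])
x3 = np.array([0.0, 0.5, 1.0, 1.5, 0.0, 0.5, 1.0, 1.5])


def build_group_design(x1, x2, x3, n_levels=2):
    """Hypothetical helper: hand-built Z for (0 + x2|x1) + (0 + x3|x1).

    Assumes x1 and x2 both have `n_levels` levels coded 0..n_levels-1.
    Column order: [x2=0|x1=0, x2=0|x1=1, x2=1|x1=0, x2=1|x1=1, x3|x1=0, x3|x1=1]
    """
    n = len(x1)
    rows = np.arange(n)
    # (0 + x2|x1): a 1 in the column for this row's (x2 level, x1 level) pair.
    z_x2 = np.zeros((n, n_levels * n_levels))
    z_x2[rows, x2 * n_levels + x1] = 1.0
    # (0 + x3|x1): the x3 value in the column for this row's x1 level.
    z_x3 = np.zeros((n, n_levels))
    z_x3[rows, x1] = x3
    return np.hstack([z_x2, z_x3])


Z = build_group_design(x1, x2, x3)
print(Z)
```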
So basically, the second and third columns (indices 1 and 2) are swapped. I then added the following line to fix that:
```python
Z[:, [1, 2]] = Z[:, [2, 1]]
```
and the spaghetti plot I generated afterwards matched my expectations.
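The one-line workaround relies on NumPy semantics: indexing with a list on the right-hand side is advanced indexing, which returns a copy, so the assignment really swaps the two columns instead of overwriting one with the other. The sketch below applies it to the observed matrix copied from the first `Z` table and checks it against an explicit column permutation.

```python
import numpy as np

# Z returned for the out-of-sample data (first Z table above).
Z_observed = np.array([
    [1, 0, 0, 0, 0.0, 0.0],
    [1, 0, 0, 0, 0.5, 0.0],
    [1, 0, 0, 0, 1.0, 0.0],
    [1, 0, 0, 0, 1.5, 0.0],
    [0, 0, 1, 0, 0.0, 0.0],
    [0, 0, 1, 0, 0.0, 0.5],
    [0, 0, 1, 0, 0.0, 1.0],
    [0, 0, 1, 0, 0.0, 1.5],
])

Z = Z_observed.copy()
# Z[:, [2, 1]] is a copy, so this swaps columns 1 and 2 in place.
Z[:, [1, 2]] = Z[:, [2, 1]]

# Equivalent without mutating: reorder all columns with an explicit permutation.
Z_permuted = Z_observed[:, [0, 2, 1, 3, 4, 5]]
print(Z)
```

The hard-coded indices only fit this six-column layout; with more levels in `x1` or `x2` the columns that need reordering change, so this is a local workaround rather than a general fix.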
It would be great if someone could take a look at this and fix it properly (if it really is an issue and not a mistake on my part). I hope this was clear enough; if not, let me know!
About this issue
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 19
Yes, it seems to be working now 😃
Yes, I also made a plot for a model with basically the same specification, but with four categories in the grouping variable instead of two, and it worked in that case too 👍
This is great 😃 Thanks @tomicapretto!
Great, yes the plots look fine! Thanks again for the quick help and fix! 🥳
Thanks a lot for the quick reply! I simulated some data (`data.tsv`) and am also sending the out-of-sample data I used for making predictions (`dataNew.tsv`). The figures show the spaghetti plot when using the original bambi code (`prediction1.png`) vs. when using the code with the adaptation I mentioned above (`prediction2.png`). Attachment: data_and_plots.zip
Here’s how I built and fit the model:
Thanks again!