bambi: `Model.predict()` generates unexpected out-of-sample predictions for a mixed effects model

Hello,

First of all, thanks for the great work on this project! I’ve been using bambi a lot and it has been super helpful.

I’m currently facing a (potential) issue when trying to make out-of-sample predictions for a logistic regression model built with the following formula:

    y ~ x1 + x2 + x3 + (0 + x2|x1) + (0 + x3|x1)

where x1 and x2 are categorical variables with two levels each, and x3 is a continuous variable.
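As a rough reference, the linear predictor implied by this formula can be written out as below. This is a sketch under assumptions: treatment coding for the common x1 and x2 effects, and hypothetical symbols $u_{x_2}$ and $v_{x_3}$ for the group-specific x2 and x3 effects. The exact number of x2|x1 columns depends on how the categorical x2 is encoded inside the group-specific term.

```latex
\operatorname{logit}\Pr(y_i = 1) = \beta_0
  + \beta_{x_1}\,\mathbb{1}[x_{1i} = 1]
  + \beta_{x_2}\,\mathbb{1}[x_{2i} = 1]
  + \beta_{x_3}\,x_{3i}
  + u_{x_2}[x_{1i}, x_{2i}]
  + v_{x_3}[x_{1i}]\,x_{3i}
```

Each row of the group-specific design matrix Z therefore has to place x3 in the slope column that belongs to that row's x1 group.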

The out-of-sample data I’m trying to make predictions for looks like this (an excerpt):

x1	x2	x3
0	0	0
0	0	0.5
0	0	1
0	0	1.5
1	0	0
1	0	0.5
1	0	1
1	0	1.5
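If it helps to reproduce this, the example grid above can be rebuilt programmatically. A minimal sketch in plain Python; X1_LEVELS and X3_GRID are hypothetical names, and x2 is held at 0 as in the table:

```python
from itertools import product

# Hypothetical sketch: rebuild the example out-of-sample grid.
# x1 varies in the outer loop, x3 in the inner loop, x2 stays at level 0.
X1_LEVELS = [0, 1]
X3_GRID = [0.0, 0.5, 1.0, 1.5]

rows = [{"x1": x1, "x2": 0, "x3": x3} for x1, x3 in product(X1_LEVELS, X3_GRID)]

for row in rows:
    print(row["x1"], row["x2"], row["x3"], sep="\t")
```

Wrapping the rows in pandas.DataFrame(rows) gives something that can be passed as data= to predict; presumably x1 and x2 should then be categorical, as in the training data, since the model was built with categorical=['x1', 'x2'] (an assumption, not verified here).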

There was no error when running model.predict(iData, data=out_of_sample_data, kind='mean'), but the spaghetti plot I generated from the posterior predictions looked off for x1 == 1: the variance was much larger than I expected. I noticed this because I had separately plotted the 0.5 decision boundary for x3 (i.e. the mean and HDI of the x3 value at which the probability of a positive outcome is 50%), and it didn’t match what I saw in the spaghetti plot.
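For a logistic model, the 0.5 boundary is where the linear predictor equals zero. Within one x1 group at x2 = 0 the predictor reduces to intercept + slope * x3, so the boundary is x3* = -intercept / slope, computed once per posterior draw. A minimal NumPy sketch; the draws below are made up for illustration, and hdi is a hypothetical helper implementing the usual narrowest-interval estimate on sorted samples (ArviZ's az.hdi is the standard tool for this):

```python
import numpy as np


def hdi(samples, prob=0.94):
    """Narrowest interval containing a fraction `prob` of the samples (0 < prob < 1)."""
    s = np.sort(np.asarray(samples))
    n = len(s)
    k = int(np.floor(prob * n))          # index span of each candidate interval
    widths = s[k:] - s[: n - k]          # width of every candidate interval
    i = int(np.argmin(widths))           # pick the narrowest one
    return float(s[i]), float(s[i + k])


def decision_boundary(intercept, slope):
    """x3 value where the linear predictor is 0, i.e. Pr(y = 1) = 0.5."""
    return -intercept / slope


rng = np.random.default_rng(0)
# HYPOTHETICAL posterior draws for one x1 group at x2 = 0:
# intercept = b0 + b_x1[g] + u_x2[g], slope = b_x3 + v_x3[g]
intercept = rng.normal(-2.0, 0.2, size=4000)
slope = rng.normal(2.0, 0.1, size=4000)

boundary = decision_boundary(intercept, slope)
print(boundary.mean(), hdi(boundary))
```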

I then had a look at the code and noticed that the Z matrix generated in the predict method in models.py looked different from what I expected. Here’s the snippet I’m referring to (the last line):

        if self._design.group:
            if in_sample:
                Z = self._design.group.design_matrix
            else:
                Z = self._design.group._evaluate_new_data(data).design_matrix
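The group-specific part of the linear predictor is Z multiplied by the vector of group-specific coefficients, so Z's columns must follow the same order in which those coefficients were estimated. A small NumPy sketch of what goes wrong otherwise; the coefficient values are hypothetical, and the column layout [x2|x1 group 0, x2|x1 group 1, x3|x1 group 0, x3|x1 group 1] is an assumption read off the expected matrix in this issue:

```python
import numpy as np

# Hypothetical group-specific coefficients, in the assumed fitted column order:
# [x2|x1 group 0, x2|x1 group 1, x3|x1 group 0, x3|x1 group 1]
u = np.array([0.3, -0.4, 0.5, 2.0])

x1 = np.array([0, 0, 1, 1])
x3 = np.array([0.0, 1.0, 0.0, 1.0])


def group_design(x1, x3):
    """Z in the assumed fitted order: group columns for x2 (at level 0), then x3 per group."""
    n = len(x1)
    Z = np.zeros((n, 4))
    Z[np.arange(n), x1] = 1.0        # x2|x1 column of the row's group
    Z[np.arange(n), 2 + x1] = x3     # x3|x1 slope column of the row's group
    return Z


Z_ok = group_design(x1, x3)
Z_bad = Z_ok[:, [0, 2, 1, 3]]        # columns 1 and 2 swapped

print(Z_ok @ u)    # group-specific contribution with correctly ordered columns
print(Z_bad @ u)   # same coefficients, misaligned columns
```

With the swapped columns, rows with x1 == 0 multiply x3 by the group-1 x2|x1 coefficient, and rows with x1 == 1 use the group-0 x3 slope as an intercept, so both groups' predictions are built from the wrong parameters.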

What I got for Z was the following:


1	0	0	0
1	0.5	0	0
1	1	0	0
1	1.5	0	0
0	0	1	0
0	0	1	0.5
0	0	1	1
0	0	1	1.5

…but what I was expecting (after trying to make sense of it) was this:


1	0	0	0
1	0	0.5	0
1	0	1	0
1	0	1.5	0
0	1	0	0
0	1	0	0.5
0	1	0	1
0	1	0	1.5

so basically the second and third columns were swapped. I then added the following line:

    Z[:, [1, 2]] = Z[:, [2, 1]]

to swap them back, and the spaghetti plot I generated afterwards matched my expectation.
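The hard-coded swap fixes this particular case, but with more groups (e.g. the four-category case mentioned in the comments) the required permutation is different. A more general sketch reorders columns by name; reorder_columns and the column labels are hypothetical, since the real labels depend on how bambi/formulae name the group-specific columns:

```python
import numpy as np


def reorder_columns(Z_new, new_names, fitted_names):
    """Permute Z_new's columns so they follow fitted_names.

    new_names[j] labels column j of Z_new; fitted_names is the order the
    coefficients were estimated in. Raises KeyError if a name is missing.
    """
    position = {name: j for j, name in enumerate(new_names)}
    order = [position[name] for name in fitted_names]
    return Z_new[:, order]


# HYPOTHETICAL labels, for illustration only.
fitted = ["x2|x1[0]", "x2|x1[1]", "x3|x1[0]", "x3|x1[1]"]
new = ["x2|x1[0]", "x3|x1[0]", "x2|x1[1]", "x3|x1[1]"]

# A newly evaluated Z whose columns come out in the `new` order.
Z_got = np.array([
    [1, 0.0, 0, 0.0],
    [1, 0.5, 0, 0.0],
    [0, 0.0, 1, 0.0],
    [0, 0.0, 1, 0.5],
])
Z_fixed = reorder_columns(Z_got, new, fitted)
print(Z_fixed)
```

The in-place swap Z[:, [1, 2]] = Z[:, [2, 1]] works in NumPy because advanced indexing on the right-hand side returns a copy before the assignment happens.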

It would be great if someone could have a look at this and fix it properly (if it really is an issue and not a mistake on my part). I hope this was clear enough; if not, let me know!

About this issue

State: closed
Created 2 years ago
Reactions: 1
Comments: 19

Most upvoted comments

Yes, it seems to be working now 😃

LeonieMei on Apr 26, 2022

Yes, I also made a plot for a model with basically the same specification, but with four categories in the group variable instead of two, and it also worked in that case 👍

LeonieMei on Apr 4, 2022

This is great 😃 Thanks @tomicapretto!

terrycojones on Apr 4, 2022

Great, yes the plots look fine! Thanks again for the quick help and fix! 🥳

LeonieMei on Apr 4, 2022

Thanks a lot for the quick reply! I simulated some data (data.tsv) and am also attaching the out-of-sample data I used for making predictions (dataNew.tsv). The figures show the spaghetti plot obtained with the original bambi code (prediction1.png) and with the adaptation mentioned above (prediction2.png).

data_and_plots.zip
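A spaghetti plot of this kind draws one probability curve per posterior draw. A minimal NumPy sketch of computing those curves for one x1 group at x2 = 0; spaghetti_curves is a hypothetical helper and the draws are simulated, not taken from the attached data:

```python
import numpy as np


def spaghetti_curves(intercept, slope, x3_grid):
    """Pr(y = 1 | x3) per posterior draw: one row per draw, one column per x3 value."""
    eta = intercept[:, None] + slope[:, None] * x3_grid[None, :]  # linear predictor
    return 1.0 / (1.0 + np.exp(-eta))                             # inverse logit


rng = np.random.default_rng(0)
# Simulated draws for one x1 group: intercept = b0 + b_x1[g] + u_x2[g], slope = b_x3 + v_x3[g]
intercept = rng.normal(-2.0, 0.3, size=50)
slope = rng.normal(2.0, 0.2, size=50)
x3_grid = np.linspace(0.0, 1.5, 31)

curves = spaghetti_curves(intercept, slope, x3_grid)
print(curves.shape)  # (50, 31)
```

Plotting with matplotlib is then plt.plot(x3_grid, curves.T, color="C0", alpha=0.1), since plot draws one line per column of a 2-D array.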

Here’s how I built and fit the model:

    import bambi as bmb
    import pandas as pd

    df = pd.read_csv('data.tsv', sep='\t')  # simulated data attached above

    formula = 'y ~ x1 + x2 + x3 + (0 + x2|x1) + (0 + x3|x1)'
    # Hierarchical Normal priors for the group-specific effects; Lognormal keeps the x3 slope positive
    priors = {'x2|x1': bmb.Prior('Normal', mu=0, sigma=bmb.Prior('HalfNormal', sigma=1)),
              'x3|x1': bmb.Prior('Normal', mu=0, sigma=bmb.Prior('HalfNormal', sigma=1)),
              'x3': bmb.Prior('Lognormal', mu=1, sigma=1)}
    model = bmb.Model(data=df, formula=formula, family='bernoulli', categorical=['x1', 'x2'],
                      noncentered=True, priors=priors)
    model.build()
    iData = model.fit(target_accept=0.95)
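With noncentered=True, each group-specific effect is expressed as a standard-normal offset scaled by the group-level standard deviation, instead of being drawn directly from Normal(0, sigma). A plain NumPy prior simulation illustrating that idea (a sketch of the general technique, not bambi's internal code), using the HalfNormal(1) group-level prior from the priors dict above:

```python
import numpy as np

rng = np.random.default_rng(1)
n_draws, n_groups = 4000, 2

# Centered:     u[g] ~ Normal(0, sigma),                sigma ~ HalfNormal(1)
# Non-centered: u[g] = sigma * offset[g],  offset[g] ~ Normal(0, 1)
sigma = np.abs(rng.normal(0.0, 1.0, size=n_draws))        # HalfNormal(1) draws
offset = rng.normal(0.0, 1.0, size=(n_draws, n_groups))   # standard-normal offsets
u = sigma[:, None] * offset                                # implied group effects

print(u.shape, u.std())
```

Both forms imply the same prior on u; the non-centered one usually samples better when the data carry little information about sigma, which is a common reason to enable it.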

Thanks again!

LeonieMei on Mar 24, 2022