stan: generated quantities block returns zeroes for ALL quantities if there's a problem with one
Summary:
All quantities in the generated quantities block are set to zero if there is a problem with computing any one of them, even when they are unrelated.
Description:
If any generated quantity involves an impossible calculation (e.g. one that violates its declared constraints, or is otherwise bad enough to throw a warning), zeroes are returned for all generated quantities, even those whose calculation is unrelated (conditional on the model). This seems undesirable: zero is a meaningful value, and it seems odd to lose all output because of a mistake in a single calculation.
Reproducible Steps:
Obviously these are not sensible generated quantities, but when I run this model, theta2 and thetaimp are both returned as zeroes. thetaimp is impossible, but theta2 is a simple, unrelated calculation, and it is still returned as zeroes.
data {
  int<lower=0> N;
  int<lower=0,upper=1> y[N];
}
parameters {
  real<lower=0,upper=1> theta;
}
model {
  theta ~ beta(1,1); // uniform prior on interval 0,1
  y ~ bernoulli(theta);
}
generated quantities {
  real<lower=0,upper=1> theta2;
  real<lower=0,upper=1> thetaimp;
  theta2 = theta^2;    // simple calculation, always within [0, 1]
  thetaimp = theta/0;  // impossible: violates the declared constraints
}
Current Output:
Model summary:
  variable  mean   median  sd     mad    q5      q95    rhat   ess_bulk
  <chr>     <dbl>  <dbl>   <dbl>  <dbl>  <dbl>   <dbl>  <dbl>  <dbl>
1 lp__      -7.29  -7.01   0.745  0.358  -8.79   -6.75  1.00   1685.
2 theta     0.252  0.234   0.122  0.125  0.0824  0.478  1.00   1316.
3 theta2    0      0       0      0      0       0      NA     NA
4 thetaimp  0      0       0      0      0       0      NA     NA
Expected Output:
Model summary:
  variable  mean    median  sd      mad     q5       q95    rhat   ess_bulk
  <chr>     <dbl>   <dbl>   <dbl>   <dbl>   <dbl>    <dbl>  <dbl>  <dbl>
1 lp__      -7.29   -7.01   0.745   0.358   -8.79    -6.75  1.00   1685.
2 theta     0.252   0.234   0.122   0.125   0.0824   0.478  1.00   1316.
3 theta2    0.0784  0.0546  0.0739  0.0536  0.00678  0.228  1.00   1316
4 thetaimp  NA      NA      NA      NA      NA       NA     NA     NA
Additional Information:
I’ve tried this using cmdstanr on Mac and Ubuntu. I’d hoped to do some more exploration to be more informative but ran out of time. Incidentally, declaring a constrained variable in generated quantities but not using it also throws warnings, which can be frustrating when debugging (e.g. when trying to find out which line of your code is sending everything else to zero).
Current Version:
cmdstan version 2.28.2 (Mac) and (2.29.2) cmdstanr 0.3.0
About this issue
- State: closed
- Created 2 years ago
- Comments: 21 (15 by maintainers)
My opinion is that returning NaN is better than the current behaviour of returning zeros, which could be misleading, as you noted above.
I think your final request (having subsequent but unrelated variables not be NaN) would be nice to have, but my understanding is that it would be significantly more complicated. The way the checks and everything work (even with Niko’s changes?) means that the NaNs wouldn’t automatically propagate in the way you’d expect.
Just want to check that we are all on the same page.
We always want write_array to return the parameters, and if any of the transformed parameters or generated quantities fails, we want write_array to return NaN for all of those transformed parameters and generated quantities. Is that correct?