stan: generated quantities block returns zeroes for ALL quantities if there's a problem with one

Summary:

All generated quantities in the block are set to zero if there is a problem computing any one of them, even when the quantities are unrelated.

Description:

If any generated quantity involves an impossible calculation (e.g. one that violates its declared constraints, which is enough to throw a warning), zeroes are returned for all generated quantities, even those whose calculation is unrelated (conditional on the model). This is undesirable because zero is a meaningful value, and it seems odd to lose all output because of a mistake in a single calculation.
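To illustrate why zero-filling is harder to spot than a missing-value marker, here is a minimal Python sketch. The draws are made-up illustrative values, not output from the model in this report, and `mean` is a hypothetical helper written for this example only.

```python
import math

# Made-up draws of theta, for illustration only (not real sampler output).
theta_draws = [0.10, 0.25, 0.40]

# What theta2 = theta^2 should contain for those draws.
correct_theta2 = [t ** 2 for t in theta_draws]

# What the report describes: every draw replaced by zero.
zero_filled = [0.0 for _ in theta_draws]

# The alternative discussed: every draw replaced by NaN.
nan_filled = [math.nan for _ in theta_draws]


def mean(draws):
    """Plain arithmetic mean; NaN inputs propagate to the result."""
    return sum(draws) / len(draws)


correct_mean = mean(correct_theta2)
zero_mean = mean(zero_filled)  # a valid-looking value inside [0, 1]
nan_mean = mean(nan_filled)    # NaN, so the failure is visible in any summary
```

A zero mean satisfies the declared `<lower=0,upper=1>` bounds and so looks like a legitimate estimate, whereas a NaN mean cannot be mistaken for one.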

Reproducible Steps:

These are obviously not sensible generated quantities, but when I run this model both theta2 and thetaimp are returned as zeroes. thetaimp is impossible, but theta2 is a simple, unrelated calculation and is zeroed as well.

data {
  int<lower=0> N;
  int<lower=0,upper=1> y[N];
}
parameters {
  real<lower=0,upper=1> theta;
}
model {
  theta ~ beta(1,1);  // uniform prior on interval 0,1
  y ~ bernoulli(theta);
}
generated quantities {
  real<lower=0,upper=1> theta2;
  real<lower=0,upper=1> thetaimp;
  theta2 = theta^2;
  thetaimp = theta/0;
}

Current Output:

Model summary:

  variable   mean    median  sd     mad    q5      q95    rhat   ess_bulk
  <chr>      <dbl>   <dbl>   <dbl>  <dbl>  <dbl>   <dbl>  <dbl>  <dbl>
1 lp__       -7.29   -7.01   0.745  0.358  -8.79   -6.75  1.00   1685.
2 theta      0.252   0.234   0.122  0.125  0.0824  0.478  1.00   1316.
3 theta2     0       0       0      0      0       0      NA     NA
4 thetaimp   0       0       0      0      0       0      NA     NA

Expected Output:

Model summary:

  variable   mean    median  sd      mad     q5       q95    rhat   ess_bulk
  <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>    <dbl>  <dbl>  <dbl>
1 lp__       -7.29   -7.01   0.745   0.358   -8.79    -6.75  1.00   1685.
2 theta      0.252   0.234   0.122   0.125   0.0824   0.478  1.00   1316.
3 theta2     0.0784  0.0546  0.0739  0.0536  0.00678  0.228  1.00   1316
4 thetaimp   NA      NA      NA      NA      NA       NA     NA     NA

Additional Information:

I’ve tried this using cmdstanr on Mac and Ubuntu. I’d hoped to explore further so I could be more informative, but ran out of time. Incidentally, declaring a constrained variable in generated quantities without using it also throws warnings, which can be frustrating when debugging (e.g. when trying to find out which line of your code is sending everything else to zeroes).

Current Version:

cmdstan version 2.28.2 (Mac) and (2.29.2) cmdstanr 0.3.0

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 21 (15 by maintainers)

Most upvoted comments

My opinion is that returning NaN is better than the current behaviour of returning zeros, which could be misleading, as you noted above.

I think your final request (having subsequent but unrelated variables not be NaN) would be nice to have, but my understanding is that it would be significantly more complicated. The way the checks work (even with Niko’s changes?) means that the NaNs wouldn’t automatically propagate in the way you’d expect.

Just want to check that we are all on the same page.

We want write_array to always return the parameters, and then, if any of the transformed parameters or generated quantities fail, we want write_array to return NaN for all of those transformed parameters and generated quantities. Is that correct?
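The behaviour proposed above could be sketched roughly as follows. This is a Python illustration of the semantics only, not Stan's C++ implementation; `write_array_sketch`, `bounded`, and the dictionary-based interface are hypothetical names invented for this example. A Python exception stands in for a failed calculation or constraint check.

```python
import math


def bounded(fn, lower, upper):
    """Wrap fn so a result outside [lower, upper] (or NaN) raises ValueError."""
    def checked(params):
        value = fn(params)
        if math.isnan(value) or not (lower <= value <= upper):
            raise ValueError(f"value {value} violates [{lower}, {upper}]")
        return value
    return checked


def write_array_sketch(params, derived_fns):
    """Hypothetical stand-in for write_array.

    params: dict of parameter name -> value (always returned as-is).
    derived_fns: dict of transformed parameter / generated quantity
                 name -> function of params.
    """
    out = dict(params)  # parameters are always written
    try:
        values = {name: fn(params) for name, fn in derived_fns.items()}
    except (ArithmeticError, ValueError):
        # Any single failure marks ALL derived quantities as NaN.
        values = {name: math.nan for name in derived_fns}
    out.update(values)
    return out


# Mirrors the model in the report: theta2 is fine, thetaimp is impossible.
failing = write_array_sketch(
    {"theta": 0.25},
    {
        "theta2": bounded(lambda p: p["theta"] ** 2, 0.0, 1.0),
        "thetaimp": bounded(lambda p: p["theta"] / 0.0, 0.0, 1.0),
    },
)

# With only the valid quantity, the computed value is returned.
passing = write_array_sketch(
    {"theta": 0.25},
    {"theta2": bounded(lambda p: p["theta"] ** 2, 0.0, 1.0)},
)
```

Under this sketch `theta` is returned in both cases, while `theta2` is NaN whenever `thetaimp` fails. Keeping unrelated quantities such as `theta2` intact is the separate, more complicated request mentioned in the earlier comment.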