pyGAM: PoissonGAM fails with dimension mismatch warning depending on n_splines

When running a grid search over several values of n_splines, some of the fits fail with a dimension mismatch.

gam = PoissonGAM(dtype='numerical').gridsearch(X, y, n_splines=np.arange(4,10))

....

  return (mu**y) * np.exp(-mu) / sp.misc.factorial(y)
 50% (3 of 6) |#############              | Elapsed Time: 0:00:00 ETA:  0:00:00/usr/local/lib/python3.6/site-packages/pygam/pygam.py:1888: UserWarning: shapes (120,240) and (239,120) not aligned: 240 (dim 1) != 239 (dim 0)
on model:
PoissonGAM(callbacks=[Deviance(), Diffs(), Accuracy()], 
   constraints=None, dtype='numerical', fit_intercept=True, 
   fit_linear=False, fit_splines=True, lam=0.6, max_iter=100, 
   n_splines=7, penalties='auto', spline_order=3, tol=0.0001)
skipping...

  warnings.warn(msg)
 66% (4 of 6) |##################         | Elapsed Time: 0:00:00 ETA:  0:00:00/usr/local/lib/python3.6/site-packages/pygam/pygam.py:1888: UserWarning: shapes (137,260) and (259,123) not aligned: 260 (dim 1) != 259 (dim 0)
on model:
PoissonGAM(callbacks=[Deviance(), Diffs(), Accuracy()], 
   constraints=None, dtype='numerical', fit_intercept=True, 
   fit_linear=False, fit_splines=True, lam=0.6, max_iter=100, 
   n_splines=8, penalties='auto', spline_order=3, tol=0.0001)
skipping...

...

Training a LinearGAM model using the same dataset and grid search options does not give rise to the same error.

gam = LinearGAM(dtype='numerical').gridsearch(X, y, n_splines=np.arange(4,10))
100% (6 of 6) |###########################| Elapsed Time: 0:00:00 Time: 0:00:00

Does this occur because there are more coefficients than data? If so, a more informative warning would be helpful.

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 2
  • Comments: 18 (11 by maintainers)

Most upvoted comments

i've put the deprecation warning in a separate issue.

@maxpagels thanks for finding this bug.

i believe i’ve replicated the error locally, and it looks like it is occurring because of some poor book-keeping of the matrix shapes during the optimization loop.

(not because n_coef > n_samples)
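
for anyone reading the warning text: the "shapes ... not aligned" wording is the message NumPy raises when the inner dimensions of a matrix product disagree. the snippet below is only a rough sketch of that failure mode using zero-filled arrays with the shapes from the first warning in the log. it is not pyGAM's code, and `try_dot` is a made-up helper name used purely for illustration.

```python
import numpy as np


def try_dot(a_shape, b_shape):
    """Multiply two zero matrices; return the error text, or None on success.

    Hypothetical helper for illustration only, not part of pyGAM.
    """
    a = np.zeros(a_shape)
    b = np.zeros(b_shape)
    try:
        np.dot(a, b)
    except ValueError as err:
        return str(err)
    return None


# Shapes copied from the first warning in the log: 240 columns vs 239 rows.
msg = try_dot((120, 240), (239, 120))
print(msg)

# With matching inner dimensions the product goes through.
ok = try_dot((120, 240), (240, 120))
print(ok)
```

an off-by-one like 240 vs 239 is what you'd expect if one of the two matrices is rebuilt or trimmed inside the loop while the other is not, which fits the book-keeping explanation above.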

i want to get a fix out this week.

Versions tested: 0.4.0 and 0.3.0.