astropy: FittingWithOutlierRemoval doesn't work for model sets

Attempting to use FittingWithOutlierRemoval in place of a simple LinearLSQFitter instance produces a ValueError: Input argument u'x' does not have the correct dimensions in model_set_axis=0 for a model set with n_models=4176. This blocks the very common use case of fitting each row of an image with rejection, as in IRAF’s fit1d. One could, of course, fit each row with a separate model instance, but that would be much slower – and even with model sets, fitting already seems unpleasantly slow compared with IRAF, so it probably wouldn’t be viable.

Unfortunately, I think fixing this might be non-trivial: Within FittingWithOutlierRemoval, L557 (also L567-568) selects only good points from an input array of x co-ordinates, like so:

https://github.com/astropy/astropy/blob/667b7ab621dfa80290fbe36cd711ad0a427ca272/astropy/modeling/fitting.py#L556-L559

Obviously the rejected points will differ from one row/model to another, which means that FittingWithOutlierRemoval cannot simply pass the underlying Fitter a single array of x co-ordinates (in N-1 dimensions) to use for every model. Unlike model evaluation, I think fitting of model sets in general only works with a common x array for all the models, so LinearLSQFitter and friends may need substantial changes for this to work. Moreover, I believe the matrix equation solved by the underlying np.linalg routine assumes a 2D Vandermonde matrix, corresponding to a single set of points in x – I’m not even sure (off the top of my head) that the problem can be expressed in a form that the back-end solver can work with, which would be rather unfortunate.

Would you mind confirming that this is a correct analysis, @nden (assuming you know off hand)? I’ve always had a bit of a hard time discerning from the documentation exactly what dimensionality of inputs the models and fitters accept under different circumstances.

About this issue

Original URL
State: closed
Created 7 years ago
Comments: 17 (17 by maintainers)

Commits related to this issue

Re-write FittingWithOutlierRemoval to work for model sets (where the fitter supports masked input) & add a couple of tests [#6819]. — committed to jehturner/astropy by jehturner 6 years ago
Re-write FittingWithOutlierRemoval to work for model sets (where the fitter supports masked input) & add a few tests [#6819]. — committed to jehturner/astropy by jehturner 6 years ago
Re-write FittingWithOutlierRemoval to work for model sets (where the fitter supports masked input) & add a few tests [#6819]. — committed to jehturner/astropy by jehturner 6 years ago
Re-write FittingWithOutlierRemoval to work for model sets (where the fitter supports masked input) & add a few tests [#6819]. — committed to SaOgaz/astropy by jehturner 6 years ago
Re-write FittingWithOutlierRemoval to work for model sets (where the fitter supports masked input) & add a few tests [#6819]. — committed to astromancer/astropy by jehturner 6 years ago

Most upvoted comments

Not in the linear fitter. It solves a matrix equation where one of the terms is a Vandermonde matrix corresponding to a common set of points for all the models. If someone can point out an option I’m overlooking, great, but I don’t think so… (note that np.linalg complains if you try to give it a higher-dimensional matrix).

jehturner on Nov 23, 2017

I had just started having a go at this when my disk failed 🙁.

jehturner on Nov 17, 2017

Nearly half of the execution time is spent in np.lapack_lite.dgelsd, which looks quite a thin wrapper for the lapack routine, so there’s not much prospect for removing the order-of-magnitude overhead when looping over models without writing non-trivial Fortran extensions. I’m not sure why numpy.linalg.lstsq has to call dgelsd twice per invocation though (it doesn’t look accidental)… Update: ah, I see that’s just how DGELSD is documented to work; the first invocation doesn’t actually do the calculation.

jehturner on Nov 17, 2017