neuralforecast: [Core] Getting error when doing predict_insample

What happened + What you expected to happen

Hello, when I run an in-sample forecast with `predict_insample`, I get this error: Exception: `test_size - h` should be module `step_size`

Exception                                 Traceback (most recent call last)
File <command-32941315092027>, line 1
----> 1 Y_hat_insample = nf.predict_insample(step_size = horizon)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-f5388d12-693f-46a8-9203-d411f41d9a38/lib/python3.10/site-packages/neuralforecast/core.py:601, in NeuralForecast.predict_insample(self, step_size)
    597 # Generate dates
    598 len_series = np.diff(
    599     trimmed_dataset.indptr
    600 )  # Computes the length of each time series based on indptr
--> 601 fcsts_df = _insample_dates(
    602     uids=self.uids,
    603     last_dates=last_dates_train,
    604     freq=self.freq,
    605     h=self.h,
    606     len_series=len_series,
    607     step_size=step_size,
    608 )
    609 fcsts_df = fcsts_df.set_index("unique_id")
    611 col_idx = 0

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-f5388d12-693f-46a8-9203-d411f41d9a38/lib/python3.10/site-packages/neuralforecast/core.py:87, in _insample_dates(uids, last_dates, freq, h, len_series, step_size)
     81 """
     82 Generate insample dates for `predict_insample` function. Uses `_cv_dates`
     83 method with separate sizes and last dates for each series.
     84 """
     85 if (len(np.unique(last_dates)) == 1) and (len(np.unique(len_series)) == 1):
     86     # Dates can be generated simulatenously if ld and ls are the same for all series
---> 87     dates = _cv_dates(last_dates, freq, h, len_series[0], step_size)
     88     dates["unique_id"] = np.repeat(uids, len(dates) // len(uids))
     89 else:

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-f5388d12-693f-46a8-9203-d411f41d9a38/lib/python3.10/site-packages/neuralforecast/core.py:44, in _cv_dates(last_dates, freq, h, test_size, step_size)
     41 def _cv_dates(last_dates, freq, h, test_size, step_size=1):
     42     # assuming step_size = 1
     43     if (test_size - h) % step_size:
---> 44         raise Exception("`test_size - h` should be module `step_size`")
     45     n_windows = int((test_size - h) / step_size) + 1
     46     if len(np.unique(last_dates)) == 1:

Exception: `test_size - h` should be module `step_size`
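The failing check is the divisibility test in `_cv_dates` shown in the traceback. The sketch below re-implements that arithmetic in plain Python so it can be tried without a model. `insample_windows` is a hypothetical name. Based on the traceback, it assumes `_insample_dates` passes the series length (`len_series[0]`) in as `test_size`.

```python
def insample_windows(series_len: int, h: int, step_size: int = 1) -> int:
    """Mirror the check in neuralforecast's `_cv_dates` (v1.6.4 traceback).

    Hypothetical helper: assumes the series length is passed in as `test_size`,
    as `_insample_dates` does in the traceback.
    """
    test_size = series_len
    if (test_size - h) % step_size:
        raise ValueError(
            f"series length {series_len} - h {h} = {series_len - h} "
            f"is not divisible by step_size {step_size}"
        )
    # Number of h-step forecast windows generated over the training data
    return (test_size - h) // step_size + 1


h = 13
print(insample_windows(104, h, step_size=h))  # 104 - 13 = 91 = 7 * 13, so 8 windows
print(insample_windows(100, h, step_size=1))  # step_size=1 always passes, 88 windows
try:
    insample_windows(100, h, step_size=h)  # 87 % 13 == 9, fails like the report
except ValueError as err:
    print(err)
```

With `h = 13` and `step_size = 13`, only series whose length is a multiple of 13 pass. `step_size=1` always satisfies the modulo test, at the cost of many overlapping windows.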

Versions / Dependencies

neuralforecast 1.6.4

Reproduction script

from neuralforecast import NeuralForecast
from neuralforecast.losses.pytorch import HuberMQLoss, RMSE
from neuralforecast.models import GRU, NHITS

quantiles = [0.5]
horizon = 13

nf = NeuralForecast(
    models=[
        NHITS(
            h=horizon,
            input_size=2 * horizon,
            loss=HuberMQLoss(quantiles=quantiles),
            dropout_prob_theta=0.6,  # dropout to robustify vs outlier lag inputs
            n_freq_downsample=[2, 1, 1],
            scaler_type='robust',
            max_steps=200,
            inference_windows_batch_size=1,
            learning_rate=1e-3,
        ),
        GRU(
            h=horizon,
            input_size=-1,
            loss=RMSE(),
            scaler_type='robust',
            encoder_n_layers=2,
            encoder_hidden_size=128,
            context_size=10,
            decoder_hidden_size=128,
            decoder_layers=2,
            max_steps=200,
        ),
    ],
    freq='4W-SAT',
)

# train_df: long-format DataFrame with columns unique_id, ds, y (not included in the report)
nf.fit(df=train_df)
preds_nf_df = nf.predict()

# Up to here everything works fine

# This call raises the exception
Y_hat_insample = nf.predict_insample(step_size=horizon)
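To see which series in `train_df` would trip this check before calling `predict_insample`, each series' length can be tested against the same condition. This is a minimal pandas sketch. It assumes `train_df` uses neuralforecast's long format (`unique_id`, `ds`, `y`) and that no test set was held out during `fit`. When series lengths differ, it also assumes the per-series branch of `_insample_dates` (cut off in the traceback) applies the same test to each series. `series_failing_insample` and the toy data are hypothetical.

```python
import pandas as pd


def series_failing_insample(df: pd.DataFrame, h: int, step_size: int) -> pd.Series:
    """Return the lengths of series that would fail the `_cv_dates` check.

    Hypothetical helper. Assumes neuralforecast's long format and that no
    test set was held out during fit, so each series' full length is checked.
    """
    lengths = df.groupby("unique_id").size()
    return lengths[(lengths - h) % step_size != 0]


# Hypothetical toy data (ds omitted, it is not needed for the check):
# series "a" has 104 rows, series "b" has 100 rows
toy = pd.DataFrame({
    "unique_id": ["a"] * 104 + ["b"] * 100,
    "y": range(204),
})
print(series_failing_insample(toy, h=13, step_size=13))  # only "b" (100 rows) is listed
# With the report's data: series_failing_insample(train_df, h=horizon, step_size=horizon)
```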

Issue Severity

None

State: closed

Most upvoted comments

The issue is that `predict_insample` internally sets `test_size = series_length - true_test_size` (where `true_test_size` is the one you defined), because it is forecasting the training data. This internal `test_size` must satisfy the condition `(test_size - h) % step_size == 0`. Is that clear? You need to trim the df dataset so that it does. We will fix this soon because it is confusing.
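The suggestion to trim the data can be sketched as dropping the oldest rows of each series until `(length - h) % step_size == 0`, then refitting. `trim_for_insample` is a hypothetical helper, not part of neuralforecast. Keeping the most recent observations is a design choice. The sketch assumes every series has at least `h` rows and that no test set was held out during `fit`.

```python
import pandas as pd


def trim_for_insample(df: pd.DataFrame, h: int, step_size: int) -> pd.DataFrame:
    """Drop the oldest rows of each series so (length - h) % step_size == 0.

    Hypothetical helper, not part of neuralforecast. Expects neuralforecast's
    long format with `unique_id` and `ds` columns and assumes each series has
    at least `h` rows.
    """
    df = df.sort_values(["unique_id", "ds"])
    # Length of the series each row belongs to
    lengths = df["unique_id"].map(df["unique_id"].value_counts())
    # Number of oldest rows to drop so the remaining length passes the check
    excess = (lengths - h) % step_size
    # 0 for the oldest row of each series, 1 for the next, and so on
    position = df.groupby("unique_id").cumcount()
    return df[position >= excess].reset_index(drop=True)


# Hypothetical toy data: series "a" has 104 rows, series "b" has 100 rows
toy = pd.DataFrame({
    "unique_id": ["a"] * 104 + ["b"] * 100,
    "ds": list(pd.date_range("2000-01-01", periods=104, freq="4W-SAT"))
    + list(pd.date_range("2000-01-01", periods=100, freq="4W-SAT")),
    "y": range(204),
})
trimmed = trim_for_insample(toy, h=13, step_size=13)
print(trimmed.groupby("unique_id").size())  # a keeps 104 rows, b drops to 91
```

After trimming, the model would be refit on the trimmed frame (for example `nf.fit(df=trimmed_train_df)`, a hypothetical name) before calling `nf.predict_insample(step_size=horizon)` again.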

cchallu on Jan 21, 2024