sktime: [BUG] Incorrect calculation of the mean RMSSE metric

Describe the bug

I believe I have found an error in how the RMSSE metric (the `mean_squared_scaled_error()` function) is computed for multi-output input with `multioutput="uniform_average"`. As I understand it, in this approach each column is a separate time series to evaluate. Instead of computing the average RMSSE over all series, the function computes the average MSE of all the series and divides it by the average MSE of the naive forecasts of all the series over the training period. In other words, instead of taking the average of the per-series quotients, it takes the quotient of the averages. These are not the same thing and, in my opinion, the latter is wrong.
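The difference between the two aggregation orders is easy to demonstrate in isolation. Below is a minimal sketch with hypothetical per-series MSE values (the numbers are made up for illustration, not taken from sktime):

```python
import numpy as np

# Hypothetical per-series forecast MSEs and per-series naive
# (in-sample) MSEs for two time series.
mse_pred = np.array([0.1, 0.9])
mse_naive = np.array([0.4, 0.5])

# Average of the per-series ratios (RMSSE-style aggregation):
avg_of_ratios = np.mean(np.sqrt(mse_pred / mse_naive))

# Ratio of the averages (what the issue says sktime computes):
ratio_of_avgs = np.sqrt(np.mean(mse_pred) / np.mean(mse_naive))

print(avg_of_ratios)  # ≈ 0.9208
print(ratio_of_avgs)  # ≈ 1.0541
```

The two results clearly disagree, so the choice of aggregation order matters.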

To Reproduce

I will use the example from your docs.

import numpy as np
from sktime.performance_metrics.forecasting import mean_squared_scaled_error

y_train = np.array([[0.5, 1], [-1, 1], [7, -6]])
y_true = np.array([[0.5, 1], [-1, 1], [7, -6]])
y_pred = np.array([[0, 2], [-1, 2], [8, -5]])
mean_squared_scaled_error(y_true, y_pred, y_train=y_train, multioutput='raw_values', square_root=True)
# array([0.11215443, 0.20203051])
mean_squared_scaled_error(y_true, y_pred, y_train=y_train, square_root=True)
# 0.15679361328058636

Expected behavior

(0.11215443 + 0.20203051) / 2
# 0.15709246999999998
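The discrepancy can be reproduced without sktime at all. A sketch, assuming the RMSSE denominator is the mean squared error of the one-step-ahead naive forecast on the training data (as in the standard RMSSE definition): computing the per-series MSEs by hand and trying both aggregation orders yields exactly the two numbers above.

```python
import numpy as np

y_train = np.array([[0.5, 1], [-1, 1], [7, -6]])
y_true = np.array([[0.5, 1], [-1, 1], [7, -6]])
y_pred = np.array([[0, 2], [-1, 2], [8, -5]])

# Per-series forecast MSE and per-series naive MSE on the train period
mse = np.mean((y_true - y_pred) ** 2, axis=0)
mse_naive = np.mean(np.diff(y_train, axis=0) ** 2, axis=0)

# Average of per-series RMSSEs (the behavior this issue expects)
avg_of_rmsse = np.mean(np.sqrt(mse / mse_naive))
print(avg_of_rmsse)  # ≈ 0.15709247

# Square root of the quotient of averaged MSEs
# (reproduces the value sktime currently returns)
rmsse_of_avg = np.sqrt(np.mean(mse) / np.mean(mse_naive))
print(rmsse_of_avg)  # ≈ 0.15679361
```

This confirms that the returned value is the quotient of averages rather than the average of quotients.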

Additional context

I did not check whether supplying explicit weights via the multioutput parameter works correctly; it may be affected by the same problem.

Versions 0.21.0

System:
    python: 3.9.16 | packaged by conda-forge | (main, Feb 1 2023, 21:28:38) [MSC v.1929 64 bit (AMD64)]
executable: C:\Users.…\miniconda3\envs\ml_ts\python.exe
   machine: Windows-10-10.0.19045-SP0

Python dependencies:
       pip: 23.1.1
    sktime: 0.21.0
   sklearn: 1.2.2
    skbase: 0.4.6
     numpy: 1.23.5
     scipy: 1.10.1
    pandas: 2.0.0
matplotlib: 3.7.1
    joblib: 1.2.0
statsmodels: 0.13.5
     numba: 0.56.4
  pmdarima: 2.0.2
   tsfresh: 0.0.post0.dev58+ga42b9ea.dirty
tensorflow: None
tensorflow_probability: None

About this issue

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 17

Most upvoted comments

Honestly, I'm more interested in the logic. Having perfect documentation would also be nice, of course, but the logic of the calculation seems more important to me.

Below is another important point about this metric.

I want to make two points:

  1. The RMSSE metric gained particular prominence after the 2020 M5 competition, where it was chosen as the accuracy measure. The metric's description states quite clearly: first compute the RMSSE of each time series, then apply the weights. The "uniform_average" case can then be viewed as assigning equal weights to all time series.
  2. The option with the additional “fraction_averaging” argument looks good.
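The M5-style aggregation described in point 1 can be sketched as follows. This is an illustrative implementation, not sktime's code; the function names `rmsse_per_series` and `weighted_rmsse` are hypothetical:

```python
import numpy as np

def rmsse_per_series(y_true, y_pred, y_train):
    """Per-column RMSSE: scale each series' MSE by the MSE of the
    one-step-ahead naive forecast on that series' training data."""
    mse = np.mean((y_true - y_pred) ** 2, axis=0)
    mse_naive = np.mean(np.diff(y_train, axis=0) ** 2, axis=0)
    return np.sqrt(mse / mse_naive)

def weighted_rmsse(y_true, y_pred, y_train, weights=None):
    """M5-style aggregation: compute RMSSE per series first,
    then take the (weighted) average. weights=None corresponds
    to the 'uniform_average' case, i.e. equal weights."""
    per_series = rmsse_per_series(y_true, y_pred, y_train)
    return np.average(per_series, weights=weights)

# Data from the reproduction example above:
y_train = np.array([[0.5, 1], [-1, 1], [7, -6]])
y_true = np.array([[0.5, 1], [-1, 1], [7, -6]])
y_pred = np.array([[0, 2], [-1, 2], [8, -5]])

print(weighted_rmsse(y_true, y_pred, y_train))  # ≈ 0.15709247
```

With this ordering, "uniform_average" is simply the equal-weights special case of the weighted metric, which keeps the two options consistent with each other.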