sktime: [BUG] Incorrect calculation of the mean RMSSE metric
Describe the bug
I believe there is an error in how the RMSSE metric (the mean_squared_scaled_error() function with square_root=True) is calculated for multi-output results when multioutput="uniform_average". As I understand it, in your approach each column is a separate time series to evaluate. Instead of averaging the RMSSE over all the series, you compute the average MSE over all the series and divide it by the average MSE of the naive forecasts over all the series in the training period. In other words, instead of taking the average of the quotients, you take the quotient of the averages. These are not the same thing, and in my opinion the current result is wrong.
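To make the distinction concrete, the two aggregations can be written out as below. The notation is an assumption introduced here for illustration: m is the number of series (columns), MSE_j is the forecast MSE of series j, and MSE_j^naive is the in-sample MSE of the naive forecast on the training data of series j. The second expression is inferred from the description above, not taken from the sktime source.

```latex
% Per-series metric
\mathrm{RMSSE}_j = \sqrt{\frac{\mathrm{MSE}_j}{\mathrm{MSE}^{\mathrm{naive}}_j}},
\qquad j = 1, \dots, m

% Expected: average of the quotients
\mathrm{RMSSE}_{\mathrm{expected}} = \frac{1}{m} \sum_{j=1}^{m} \mathrm{RMSSE}_j

% Observed (as described): quotient of the averages
\mathrm{RMSSE}_{\mathrm{observed}} =
\sqrt{\frac{\frac{1}{m} \sum_{j=1}^{m} \mathrm{MSE}_j}
           {\frac{1}{m} \sum_{j=1}^{m} \mathrm{MSE}^{\mathrm{naive}}_j}}
```

The two expressions coincide only in special cases, for example when all series share the same naive MSE and the same forecast MSE.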
To Reproduce
I'll use the example from your docs.
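import numpy as np
from sktime.performance_metrics.forecasting import mean_squared_scaled_error

y_train = np.array([[0.5, 1], [-1, 1], [7, -6]])
y_true = np.array([[0.5, 1], [-1, 1], [7, -6]])
y_pred = np.array([[0, 2], [-1, 2], [8, -5]])
# RMSSE per series (one value per column)
mean_squared_scaled_error(y_true, y_pred, y_train=y_train, multioutput='raw_values', square_root=True)
# array([0.11215443, 0.20203051])
# Aggregated RMSSE with the default multioutput="uniform_average"
mean_squared_scaled_error(y_true, y_pred, y_train=y_train, square_root=True)
# 0.15679361328058636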
Expected behavior
(0.11215443 + 0.20203051) / 2
# 0.15709246999999998
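The difference between the two results can be reproduced without sktime. The following is a minimal NumPy-only sketch, not the library's implementation: the variable names are illustrative, and it assumes the scaling term is the in-sample MSE of the one-step naive forecast on y_train.

```python
import numpy as np

y_train = np.array([[0.5, 1], [-1, 1], [7, -6]])
y_true = np.array([[0.5, 1], [-1, 1], [7, -6]])
y_pred = np.array([[0, 2], [-1, 2], [8, -5]])

# Forecast MSE per series (one value per column).
mse = np.mean((y_true - y_pred) ** 2, axis=0)

# In-sample MSE of the one-step naive forecast on the training data
# (assumption: seasonal periodicity of 1, so the naive error is the first difference).
naive_mse = np.mean(np.diff(y_train, axis=0) ** 2, axis=0)

# RMSSE per series, matching the 'raw_values' output in the report.
rmsse_per_series = np.sqrt(mse / naive_mse)

# Average of the quotients: the value the report expects (approx. 0.157092).
mean_of_quotients = rmsse_per_series.mean()

# Quotient of the averages: matches the value the report observed (approx. 0.156794).
quotient_of_means = np.sqrt(mse.mean() / naive_mse.mean())

print(rmsse_per_series)
print(mean_of_quotients)
print(quotient_of_means)
```

On this data the quotient-of-averages value agrees with the 0.15679361328058636 shown in the reproduction, which is consistent with the explanation given in the bug description.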
Additional context
I did not check whether the variant that supplies weights through the parameters works correctly. It may be incorrect as well.
Versions 0.21.0
System:
  python: 3.9.16 | packaged by conda-forge | (main, Feb 1 2023, 21:28:38) [MSC v.1929 64 bit (AMD64)]
  executable: C:\Users.…\miniconda3\envs\ml_ts\python.exe
  machine: Windows-10-10.0.19045-SP0
Python dependencies:
  pip: 23.1.1
  sktime: 0.21.0
  sklearn: 1.2.2
  skbase: 0.4.6
  numpy: 1.23.5
  scipy: 1.10.1
  pandas: 2.0.0
  matplotlib: 3.7.1
  joblib: 1.2.0
  statsmodels: 0.13.5
  numba: 0.56.4
  pmdarima: 2.0.2
  tsfresh: 0.0.post0.dev58+ga42b9ea.dirty
  tensorflow: None
  tensorflow_probability: None
About this issue
- State: open
- Created a year ago
- Reactions: 1
- Comments: 17
Honestly, I'm more interested in the logic. Perfect documentation would be nice too, of course, but getting the calculation logic right seems more important to me.
Below I will raise another important point about this metric.
I want to make two points: