pyts: SAX-VSM, constant time series error

Description

When running SAX-VSM on my time series, I get the following error: At least one sample is constant.

I tried filtering out all the constant time series with X = X[np.where(~(np.var(X, axis=1) == 0))[0]], to no avail.

I also tried fitting the model on a single non-constant array and still got the error. I suspect the error is raised when the SAX approximation gives the same symbol across a whole window, which would mean the window is treated as constant. For example, with a word size of 3, a SAX word of ‘aaa’ would trigger it. Could that be the case?

Steps/Code to Reproduce

<< 
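import numpy as np
from pyts.classification import SAXVSM

# A single non-constant series: sixteen 1s followed by four 2s
X_train = np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2]])
y_train = np.array([1])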

clf = SAXVSM(window_size=0.5, word_size=0.5, n_bins=3, strategy='normal')
clf.fit(X_train, y_train)
>>
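The hypothesis about constant windows can be checked without pyts. The following is a minimal sketch in plain NumPy, not pyts's own implementation: `constant_window_mask` is a hypothetical helper, and the window length of 10 assumes that `window_size=0.5` is interpreted as a fraction of the 20 timestamps.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # requires NumPy >= 1.20


def constant_window_mask(x, window):
    """Return a boolean mask marking the sliding windows of `x` that are constant.

    Hypothetical helper for illustration; it is not part of pyts.
    """
    windows = sliding_window_view(np.asarray(x), window)
    # A window is constant when its maximum equals its minimum.
    return np.ptp(windows, axis=1) == 0


x = np.array([1] * 16 + [2] * 4)
window = 10  # assumption: window_size=0.5 means half of the 20 timestamps
mask = constant_window_mask(x, window)
print(f"{mask.sum()} of {mask.size} windows are constant")
```

For the series above, 7 of the 11 windows of length 10 contain only 1s, so the series as a whole is non-constant while most of its subsequences are constant. That would be consistent with the error coming from the windowed subsequences rather than from the input rows.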

Versions

  • NumPy 1.20.1
  • SciPy 1.6.1
  • Scikit-Learn 0.24.1
  • Numba 0.53.1
  • Pyts 0.11.0

Additionally, I ran into this error: 'n_bins' must be greater than or equal to 2 and lower than or equal to min(word_size, 26)

If n_bins represents the alphabet size used for the PAA approximation, why must it be lower than or equal to the word size? Doesn’t that make configurations such as alphabet = {a,b,c,d,e} with a word size of 3 impossible? (That shouldn’t be the case.)

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 46 (25 by maintainers)

Most upvoted comments

Thanks @johannfaouzi - I came across this error using 0.11 and installed from main (ff746ef) and it’s working for me!