keras: Input and target format for multidimentional time-series regression

I’m trying to solve a problem I had intended for tensorflow in Keras.

I’ve gotten a lot further using Keras, but I’m still unclear on how best to represent my sequence data. The following code works quite well using only one input sample and one target sample:

import numpy as np

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

# This does work by using only one sample:
data = [[0,0,0,0,0,0,0,0,0,2,1]]
data = np.array(data, dtype=float)
target = [0,0,0,0,0,0,0,0,2,1,0]
target = np.array(target, dtype=float)

data = data.reshape((1, 1, 11)) # Single batch, 1 time steps, 11 dimentions
target = target.reshape((-1, 11)) # Corresponds to shape (None, 11)


# Build Model
model = Sequential()  
model.add(LSTM(11, input_shape=(1, 11), unroll=True))
model.add(Dense(11))
model.compile(loss='mean_absolute_error', optimizer='adam')
model.fit(data, target, nb_epoch=1000, batch_size=1, verbose=2)

# Do the output values match the target values?
predict = model.predict(data)
print repr(data)
print repr(predict)

But I can’t get it to work with multiple samples:

import numpy as np

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

# Input sequence
wholeSequence = [[0,0,0,0,0,0,0,0,0,2,1],
                 [0,0,0,0,0,0,0,0,2,1,0],
                 [0,0,0,0,0,0,0,2,1,0,0],
                 [0,0,0,0,0,0,2,1,0,0,0],
                 [0,0,0,0,0,2,1,0,0,0,0],
                 [0,0,0,0,2,1,0,0,0,0,0],
                 [0,0,0,2,1,0,0,0,0,0,0],
                 [0,0,2,1,0,0,0,0,0,0,0],
                 [0,2,1,0,0,0,0,0,0,0,0],
                 [2,1,0,0,0,0,0,0,0,0,0]]

# Preprocess Data: (This does not work)
wholeSequence = np.array(wholeSequence, dtype=float) # Convert to NP array.
data = wholeSequence[:-1] # all but last
target = wholeSequence[1:] # all but first

# This does not work:
# Reshape training data for Keras LSTM model
# The training data needs to be (batchIndex, timeStepIndex, dimentionIndex)
data = data.reshape((1, 9, 11)) # Single batch, 9 time steps, 11 dimentions
target = target.reshape((-1, 11)) # Corresponds to shape (None, 11)


# Build Model
model = Sequential()  
model.add(LSTM(11, input_shape=(9, 11), unroll=True))
model.add(Dense(11))
model.compile(loss='mean_absolute_error', optimizer='adam')
model.fit(data, target, nb_epoch=1000, batch_size=1, verbose=2)

# Do the output values match the target values?
predict = model.predict(data)
print repr(data)
print repr(predict)

Due to this error: ValueError: Input arrays should have the same number of samples as target arrays. Found 1 input samples and 9 target samples.

What am I doing wrong with array shape?

It would be really nice if Keras facilitates this use case such that a single data structure holds the sequence and the fitter would know that for each input X_t, the target is X_(t+1). This would provide some benefits such as the following:

There would be no redundancy in storing the data and targets separately.
One would not have to be concerned with the shape of the input and targets separately.

About this issue

Original URL
State: closed
Created 8 years ago
Comments: 36 (12 by maintainers)

Most upvoted comments

The first dimension of your data is the batch dimension. It will show up as None. It can be any size, as long as it is the same for your inputs and targets. When you’re dealing with LSTMs, the batch dimension is the number of sequences, not the length of the sequence.

LSTMs in Keras are typically used on 3d data (batch dimension, timesteps, features). LSTM without return_sequences will output (batch dimension, output features) LSTM with return_sequences will output (batch dimension, timesteps, output features)

So, if your input shape is (None, 9, 11) and your actual input shape is (1, 9, 11) that means your batch dimension is 1. If your output shape is (None, 11), then your actual targets need to be (1,11).

The loop you’re describing isn’t the way to do things in Keras. Run a single vector with shape (number of sequences, steps, features) to calculate the entire series in one go. That way errors can backpropagate through time.

Are you trying to do time-series prediction? I’m not sure what you’re trying to build.

Basic timeseries data has an input shape (number of sequences, steps, features). Target is (number of sequences, steps, targets). Use an LSTM with return_sequences.

Probably skip the dense layer for now until you have the basics working.

Cheers, Ben

+12

bstriner on Dec 31, 2016

@ss32 dimensions are (samples, timesteps, features). “Samples” are data that goes together so it doesn’t make any sense when you say you have 2 input samples and 1 output sample. Maybe you are saying you have 1 input sample (with 44 features for each timestep). Maybe you are saying you have 2 input samples and 2 output samples, but the 2 output samples are the same.

The data you’re describing is impossible, so please explain what you are trying to do with 2 input samples and 1 output sample. What is your data supposed to represent?

Number of input and output samples have to be the same because a “sample” is an input and output combination.

You might also just be confusing samples and features. 2 input features and 1 output feature is just input (1,2000,2) output (1,2000,1).

On a related note, please do not try to pass a 2000 length sequence to your LSTM. It will give you junk (unless you’re just looking to make an art project or something). Best strategy is to slice out many subsequences of maybe length 40 (something bigger than your expected time lag) so your shapes are (None, 40, 2) and (None, 40, 1). If you have very many possible different subsequences you will learn a model that works marginalized over all possible positions in the sequence. LSTM will get slow after maybe 10 steps, so 2000 is kind of silly, especially if you don’t think you have relationships with 1999 lag. Also, if you only have 1 sample, nothing will generalize. You can break that 1 sample into 1960 subsequences of length 40 and you might learn something more meaningful.

Cheers

bstriner on Jul 9, 2017

@galfaroi Ideally you should break out overlapping windows. Instead of one sequence of 1870, you could have many sequences of let’s say 20. Your sequences should be overlapping windows [0-20], [1-21], [2-22], etc, so your final shape would be something like (1850, 20, 14).

Same process for your test data. Break into subsequences of the same length as training.

You will have to play around with finding what a good subsequence length is.

It is extremely important to have many different ways of slicing your data. If you train on just one super long sequence it will probably not learn anything interesting.

Also, depending on what you decide to do, it may be better to generate subsequences as part of a generator function instead of doing it ahead of time. Check the keras docs for how to write a generator.

bstriner on Jun 26, 2017

If you have 5 days of data and look at 3 day windows, there are 3 windows (5-3+1). If you have 365 days and 14 day windows there should be 352 possible sequences. ignore the 341.

You want to make a 1 month window prediction based on what? The previous 11 months? Then your input should be 365 days of data. If you have 10 years of data that is 36510 possible sequences. So (36510, 365, 12).

If days of the week or which month is part of your model, then you can change things up. The point is that the first dimension is how many sequences, and you want as many as possible so the thing will generalize meaningfully.

As an example of what not to do, make a model that is (1, 365*10, 12). That will learn a function that outputs all 10 years of data, once. The problem is, it will only work on that 10 years of data. If you give it just 9 years of data it might give you something else entirely, because it has only seen one sequence. It might work well on training data but might be junk for other data, because there is in effect only one piece of training data.

So, get creative, and however you can slice things up to make multiple sequences should make a better model.

bstriner on Sep 1, 2017

LSTM is probably not going to learn that much over 365 timesteps. Also, you may not be looking for patterns that are 365 timesteps away.

You can shape your data (1,365,12) and run it in one go. The problem is that will probably not generalize or be very meaningful. You are learning a single function that predicts a single batch of data.

Ideally, decide something like 2 weeks is a reasonable amount of data to make a prediction. Reshape your data as (341, 14, 12). That is each 2-week subsequence. There will be repetition. However, you are now learning a function that works on every 2-week subsequence, so the function is more likely to generalize to other data.

Since this involves repetition, it may be more efficient to generate data on the fly using fit_generator. Randomly sample a bunch of starting points, then select those subsequences.

bstriner on Sep 1, 2017

@ss32 LSTM is the right layer to use. The problem is that LSTM learns an initial state. If it only ever starts in the same position, it will learn something that might only work from the same starting position. If you train it on different subsequences from different starting positions then it will try to learn something that will work starting from anywhere. This doesn’t require any changes to your keras model, but changes to your input and output shaping and preprocessing.

You have a few choices. Easiest option is to flatten everything. If the inputs are (n,depth,3,5) and (n,depth,6), reshape and concatentate into (n,depth,21) then use an LSTM as usual.

I don’t see where you are running into problems. Please explain what your data is so we can decide what shape it is supposed to be instead of the other way around.

Standard usage:

input sequences are (n, steps, input features)
output sequences are (n, steps, output features)
Run LSTM with return_sequences=True

If you have only one long sequence instead of short sequences, try breaking them up into reasonably sized chunks.

Cheers

bstriner on Jul 9, 2017

Following is my working code (with very low loss). Thanks for the help. I still think a fit function for large sequences that does auto random shuffled batches would be a nice feature, rather than storing redundant information.

import numpy as np

from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM

# Input sequence
wholeSequence = [[0,0,0,0,0,0,0,0,0,2,1],
                 [0,0,0,0,0,0,0,0,2,1,0],
                 [0,0,0,0,0,0,0,2,1,0,0],
                 [0,0,0,0,0,0,2,1,0,0,0],
                 [0,0,0,0,0,2,1,0,0,0,0],
                 [0,0,0,0,2,1,0,0,0,0,0],
                 [0,0,0,2,1,0,0,0,0,0,0],
                 [0,0,2,1,0,0,0,0,0,0,0],
                 [0,2,1,0,0,0,0,0,0,0,0],
                 [2,1,0,0,0,0,0,0,0,0,0]]

# Preprocess Data:
wholeSequence = np.array(wholeSequence, dtype=float) # Convert to NP array.
data = wholeSequence[:-1] # all but last
target = wholeSequence[1:] # all but first

# Reshape training data for Keras LSTM model
# The training data needs to be (batchIndex, timeStepIndex, dimentionIndex)
# Single batch, 9 time steps, 11 dimentions
data = data.reshape((1, 9, 11))
target = target.reshape((1, 9, 11))

# Build Model
model = Sequential()  
model.add(LSTM(11, input_shape=(9, 11), unroll=True, return_sequences=True))
model.add(Dense(11))
model.compile(loss='mean_absolute_error', optimizer='adam')
model.fit(data, target, nb_epoch=2000, batch_size=1, verbose=2)

bbogart on Jan 3, 2017

In your description I think you meant input is (48577, 96 , 7).

24 units means 24 dimensions in the hidden vector and output vectors in each timestep. Units and steps are unrelated.
You are applying a single matrix over each time step. The output of your lstm will be (n,96,24). It will apply a 24x1 linear layer. As a side note, due to the broadcasting rules, you should be able to use dense without the timedistributed and it should still work.
basically Another side note: I think time-locked dropout is becoming standard now if you have sequential data.

bstriner on Oct 25, 2017

@Tchaikovic did you have a dataset or a problem in mind? All LSTM in Keras are (samples, timesteps, features), so any LSTM is an example. The important thing when using them is to understand your dataset so you know what is what and do any necessary preprocessing or reshaping to make the data the right shape.

So, for example, lets say you want to train a language model on 6-word sequences, and you have a vector of words of length n. There are k unique words. First, one-hot encode the words (n,) -> (n, k). Then, roll and concatenate the array to itself (n,k) -> (n-5, 6, k). This array has every 6-word sequence in the data. Each word is encoded as k features (one-hot).

bstriner on Aug 25, 2017

If you want a prediction for each timestep, set return_sequences=True when creating the LSTM.

If you want to run the same dense model after each timestep after the LSTM, use TimeDistributed(Dense(11)).

Run the below so you can see if your shapes all line up:

model.summary()
print "Inputs: {}".format(model.input_shape)
print "Outputs: {}".format(model.output_shape)
print "Actual input: {}".format(data.shape)
print "Actual output: {}".format(target.shape)

Cheers, Ben

bstriner on Dec 30, 2016