keras: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4
Here’s the code I’ve written:
model.add(LSTM(150, input_shape=(64, 7, 339), return_sequences=False))
model.add(Dropout(0.2))
model.add(LSTM(200, return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(150, return_sequences=True))
model.add(Dropout(0.2))
model.add(Dense(output_dim=1))
model.add(Activation('sigmoid'))
start = time.time()
model.compile(loss='mse', optimizer='rmsprop')
print('compilation time : ', time.time() - start)
model.fit(trainX, trainY_Buy, batch_size=64, nb_epoch=10, verbose=1, validation_split=0.05)
The error I'm getting is this: ValueError: Input 0 is incompatible with layer lstm_1: expected ndim=3, found ndim=4, on this line: model.add(LSTM(150, input_shape=(64, 7, 339), return_sequences=False))
My X shape is (492, 7, 339) and my Y shape is (492,).
Does anyone have any ideas on what I'm doing wrong?
@ajanaliz. You may need to turn on return_sequences=True in the first layer. Maybe that will solve it. I hope that works. Thanks.
@ajanaliz. I took a quick look, and I believe that you need to remove the leading "64" from the input shape of the LSTM layer --> input_shape=(64, 7, 339) --> input_shape=(7, 339). Keras' convention is that the batch dimension (the number of examples, which is not the same as timesteps) is typically omitted from the input_shape argument. The batching (number of examples per batch) is handled in the fit call. I hope that helps. Thanks.
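Putting the two suggestions above together, here is a rough sketch of what the corrected model could look like (using the trainX/trainY_Buy names and shapes from the question; nb_epoch is the Keras 1 spelling, newer versions use epochs):
from keras.models import Sequential
from keras.layers import LSTM, Dropout, Dense, Activation
model = Sequential()
# input_shape is (timesteps, features) only; the batch dimension is left out
model.add(LSTM(150, input_shape=(7, 339), return_sequences=True))
model.add(Dropout(0.2))
model.add(LSTM(200, return_sequences=True))
model.add(Dropout(0.2))
# the last LSTM returns only the final timestep, so it can feed a Dense layer
model.add(LSTM(150, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(1))
model.add(Activation('sigmoid'))
model.compile(loss='mse', optimizer='rmsprop')
# the batch size (64) belongs here, not in input_shape
model.fit(trainX, trainY_Buy, batch_size=64, nb_epoch=10, verbose=1, validation_split=0.05)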
@hadisaadat. Some suggestions that might help provide some direction for you. I would suggest breaking up your problem into several pieces to make sure you understand the input/output dimensions of each layer (reviewing this section will help: https://keras.io/layers/recurrent/). Also, you have "batch_size" as the first argument of the LSTM call:
model.add(LSTM(batch_size, input_shape=(Max_word_len, Char_embedding_Size), return_sequences=False))
The first argument (which is called "units" in the documentation) is the output dimensionality of that layer. In other words, you will have "units" number of LSTM cells at that layer. In Keras, the batch dimension is not typically specified in the model architecture.
It might be useful to just have a single LSTM layer that feeds into a single Dense layer (with number of units=1, so Dense(1), not Dense(62)) and look at the architecture you get from model.summary().
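For example, a minimal sketch (reusing the Max_word_len and Char_embedding_Size variables from your call above, and an arbitrary 100 units):
from keras.models import Sequential
from keras.layers import LSTM, Dense
model = Sequential()
# return_sequences defaults to False, so the LSTM outputs only the last timestep
model.add(LSTM(100, input_shape=(Max_word_len, Char_embedding_Size)))
model.add(Dense(1))
model.summary()  # prints the output shape and parameter count of each layer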
Also, return_sequences=True will, for each sample, generate output from each LSTM cell at every timestep. This is typically fed into a second RNN layer, not into a regular Dense layer. Also, Keras should automatically infer the input data shape for every layer except the first.
If you are trying to process characters -> words, and then words -> sentences, it might be easier to create two separate models (each one with a single LSTM layer) instead of directly trying to stack them, because your batch definition is changing between layers.
Also, as far as the final layer goes, that will depend on what your target is. In other words, what do you want your model to output? Are you trying to classify the type of sentence? From your architecture above, it seems you have 62 different outputs/classes you are trying to model. Depending on what your target is, this should help you decide on the final output layer.
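For example, two alternative endings, roughly sketched (pick one depending on the target):
# Option A: 62-way classification, with one-hot targets of shape (num_samples, 62)
model.add(Dense(62, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')
# Option B: a single binary target of shape (num_samples,)
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='rmsprop')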
I hope this gives you a few ideas to help. Thanks.
Before and after an LSTM layer you need an input and an output layer:
model = Sequential()
model.add(Dense(215, input_shape=(train_x.shape[1], train_x.shape[2])))
model.add(LSTM(100, return_sequences=True))
model.add(Dense(1, activation='softmax'))
I would try increasing the network:
@hadisaadat. Here is a bit more information which might help, based on your info above:
The basic layout is the following:
Batch_sample_1: timestep1, timestep2, ..., timestepN
Batch_sample_2: timestep1, timestep2, ..., timestepN
...
Batch_sample_BatchSize: timestep1, timestep2, ..., timestepN
When you set return_sequences=True, then for each batch_sample, you get outputs for each time step (And this will be one output for each LSTM cell in that layer).
When you set return_sequences=False, then for each batch_sample, you get the output for timestepN only.
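A small sketch of the shape difference (the 32 units and the N/F dimensions below are just placeholders):
from keras.models import Sequential
from keras.layers import LSTM
N, F = 10, 16  # placeholder: N timesteps, F features per timestep
m = Sequential()
m.add(LSTM(32, input_shape=(N, F), return_sequences=True))   # output shape: (batch_size, N, 32)
m.add(LSTM(32, return_sequences=False))                      # output shape: (batch_size, 32)
m.summary()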
In the second layer, since you are doing input and output for each word (which comes from layer 1), I think return_sequences=False is what you want (you only want one output per input word, not per character that forms the word; for each sequence of timestep1...timestepN, only one word is predicted).
Some more ideas based on using Masking and Merge Layers that might suggest some direction:
The idea is that if you can set a value (say 0.0) to be an "ignore" using the mask, then the second LSTM will only process the final output, I think. More details here: https://keras.io/layers/core/
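A rough sketch of that idea (the dimensions below are placeholders, assuming 0.0 is the padding value to ignore):
from keras.models import Sequential
from keras.layers import Masking, LSTM
timesteps, features = 20, 8  # placeholder dimensions
model = Sequential()
# timesteps whose features are all 0.0 are masked out for downstream layers
model.add(Masking(mask_value=0.0, input_shape=(timesteps, features)))
model.add(LSTM(64))  # only the unmasked timesteps contribute to the final output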
I hope this helps. Thanks.
Hi, I have the same bug as you. My X shape is (24443, 124, 30), y shape is (24443, 124). Maybe it's the shape of y that causes the error for me. May I know the type of your Y?
@td2014 nope, that way my error is: Input 0 is incompatible with layer lstm_2: expected ndim=3, found ndim=2
same thing happens when I wrote the following for the first LSTM layer: