keras: slow training of LSTM

Hi, I just started using Keras. Awesome work! I tried to use an LSTM with the following code:

model = Sequential()
model.add(LSTM(4096, 512, return_sequences=True))
model.add(TimeDistributedDense(512, 4096))
model.add(Activation('time_distributed_softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop')

I compared its efficiency with char-rnn and found that the Keras implementation is about 4 times slower than Karpathy's (with the same batch size). Am I doing something wrong? I've attached the Theano profiling result. Thank you!

About this issue

  • Original URL
  • State: closed
  • Created 9 years ago
  • Comments: 16 (9 by maintainers)

Most upvoted comments

Training time depends heavily on network size and batch size. Is it even the same network (size included) at all?
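To make the network-size point concrete, here is a rough parameter count for the model in the question (a sketch only; actual training cost also depends on sequence length and batch size, which the standard per-layer formulas below do not capture):

```python
# Rough parameter counts for the model in the question.
# An LSTM layer has 4 gates, each with input weights, recurrent weights,
# and a bias: 4 * (input_dim*units + units*units + units) parameters.
def lstm_params(input_dim, units):
    return 4 * (input_dim * units + units * units + units)

# A (time-distributed) dense layer has input_dim*units weights plus units biases.
def dense_params(input_dim, units):
    return input_dim * units + units

lstm = lstm_params(4096, 512)    # LSTM(4096, 512)
dense = dense_params(512, 4096)  # TimeDistributedDense(512, 4096)
print(lstm, dense, lstm + dense)
```

With an input dimension of 4096, the LSTM alone is over 9 million parameters, so a char-rnn setup with a smaller vocabulary or hidden size would naturally train much faster.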

Also, time_distributed_softmax is now deprecated; use softmax instead.
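For context, the old activation simply applied a softmax independently at every timestep of the sequence output, which is also what a plain softmax does on a (timesteps, features) output. A minimal stdlib-only sketch of that computation (the function names here are illustrative, not Keras API):

```python
import math

def softmax(logits):
    # Numerically stable softmax over one timestep's logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def time_distributed_softmax(sequence):
    # Apply softmax independently at each timestep, i.e. what the
    # deprecated 'time_distributed_softmax' activation computed.
    return [softmax(step) for step in sequence]

probs = time_distributed_softmax([[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]])
```

Each timestep's probabilities sum to 1 independently, so swapping the activation string to 'softmax' does not change the model's output.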