keras: Spring 2017 roadmap: Keras 2, PR freeze, TF integration
Hi all,
Some news.
PR freeze
We are preparing the release of Keras 2, as well as the integration of the Keras API directly into the TensorFlow repository. Consequently, we are declaring a PR freeze on Keras, to be lifted after the release of Keras 2. This means that no further PR to Keras 1 will be merged (or even reviewed). However, PRs to the Keras 2 branch (when it becomes available) are welcome.
Keras 2
We plan on making available a Keras 2 branch in the next few days, with a final release in the next few weeks.
Keras 2 will consist of some refactoring, a lot of API changes, and few functionality changes. There are many places in which the Keras 1 API was not optimal, differed from industry standards such as those set by TensorFlow or NumPy, or could otherwise be improved. We are bundling the API changes in a single release, so that users will only have to update their code once and for all.
- API changes between Keras 1 and Keras 2 will be made backwards compatible as much as possible, i.e. your Keras 1 code should still run with Keras 2. The Keras 1 API will be deprecated, and Keras 1 code running with Keras 2 will output deprecation warnings that will instruct users on how to update their code, line by line. Note that backwards compatibility will not be total, and advanced users (e.g. people who write their own layers) may see their code break.
- We will release complete notes covering all changes made and how to update a Keras 1 codebase to Keras 2.
- API changes after Keras 2 will be rare and limited in impact (the goal is to have almost none). Keras 2 is a “long-term support” API, the first for Keras. Codebases written in Keras 2 next month should still run many years from now, on up-to-date software.
- In the medium term, we will write down the Keras API as the “Keras spec”, and we will set up a “Keras committee” to oversee changes to the Keras spec. Indeed, Keras is no longer a library, but rather a spec with different available implementations. Changes to this spec need to be centralized (before being replicated across all implementations) and entrusted to an authority that will carefully review all proposed changes. This also ensures that there will be few changes and that all changes will have a strong rationale.
- New, bleeding-edge functionality should preferably go to Keras contrib.
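To make the deprecation-warning plan above concrete, here is a minimal Python sketch of how such a backwards-compatibility shim typically works. The `nb_epoch` → `epochs` rename is a real Keras 1 → Keras 2 change; the `fit` function itself is a hypothetical stand-in, not Keras's actual implementation:

```python
import warnings

def fit(x, y, epochs=10, **kwargs):
    """Hypothetical Keras-2-style function that still accepts the
    deprecated Keras-1-style keyword argument `nb_epoch`."""
    # Accept the old spelling, but warn and redirect to `epochs`.
    if "nb_epoch" in kwargs:
        warnings.warn("The `nb_epoch` argument is deprecated, "
                      "please use `epochs` instead.", DeprecationWarning)
        epochs = kwargs.pop("nb_epoch")
    if kwargs:
        raise TypeError("Unrecognized arguments: %s" % list(kwargs))
    return epochs  # stand-in for the real training loop

print(fit(None, None, nb_epoch=3))  # -> 3, plus a DeprecationWarning
```

Keras 1 code keeps running, and each warning tells the user exactly which line to change.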
TF integration
The Keras 2 API will become part of the TensorFlow repository, to serve as a high-level API for TensorFlow. Concretely:
- We are bringing a TF-only, independent implementation of the Keras spec into TF, first in `tf.contrib`, later in `tf.keras`.
- This implementation will increasingly be based off of core TF primitives (e.g. TF core layers and Keras layers will be the same objects), making code built using `tf.keras` deeply compatible with other TF functionality. You will be able to mix and match core TF and `tf.keras` functionality seamlessly (in effect, `tf.keras` is just a TF API, not a separate library). Likewise, you should be able to use Keras models with e.g. TF `Experiment`s, allowing you to easily train a Keras model in a distributed setting or on CloudML, or do distributed hyperparameter search. By using `tf.keras`, you will benefit from the full power of TensorFlow.
- This integration does not affect the repository `fchollet/keras`. It continues to be the “home” of Keras, and Theano support will continue indefinitely. We are not replacing what is already there; rather, we are simply adopting the Keras spec as a built-in high-level API for TF.
- Additionally, Microsoft is building a CNTK backend for Keras. In general, you should expect support for more backends in the future, not fewer. The goal is to have the Keras spec serve as a cross-platform front-end layer for deep learning, allowing compatibility of codebases and saved models across different backend engines. The more implementations the merrier.
About this issue
- State: closed
- Created 7 years ago
- Reactions: 119
- Comments: 46 (36 by maintainers)
No, there are no plans to support PyTorch. There is nothing to be gained in supporting every novelty framework that crops up every quarter. Our goal is to make deep learning accessible and useful to as many people as possible, and that goal is completely opposite to building up deep learning hipster cred.
@fchollet here is a list of the masking requests I can think of right now. I might add more later:

- The `Embedding` layer should work for higher-order inputs. Imagine a sentence represented as characters, for instance, and you want to embed each of the characters and then run a character-level encoder over each of the words. Your input would be `(batch_size, num_words, num_characters_per_word)`. `Embedding` doesn’t currently work correctly with this input. There are lots of similar situations where you have higher-order word or character input, and none of them work correctly without modifying `Embedding`.
- `TimeDistributed` needs to pass the mask through to the layer that it wraps. Additionally, there should be several subclasses available for handling `compute_mask` in different ways. For example, imagine the sentence representation from above. If I want to `TimeDistribute` a CNN encoder, applying it to each of the words, the mask I want to compute is basically `K.any` on the mask for each timestep, so that the output mask tells me which whole words were masked. If I then want to take those word representations and pass them through a `Highway` layer, I need to `TimeDistribute` the `Highway` layer over the number of words, because the tensor is `(batch_size, num_words, encoding_dim)`. In this case, I want `TimeDistributed` to just pass through the mask. In still other cases, I might want to pass the computation of `compute_mask` to the wrapped layer, and join them afterwards. It’s possible that you could capture all three of these use cases with just the last one, but it would probably take some complex logic to do so, in addition to modifying the behavior of `compute_mask` in wrapped layers (e.g., `LSTM` doesn’t currently return a mask at all in the `return_sequences=False` case, and it would need to return either a 0 or a 1 for this to work).
- A `K.softmax` that takes a mask as input is needed. Any time you want to compute a softmax over something that’s padded, you need this. The most obvious use case is attentions over word sequences, but there are others, too. You could solve this by adding another backend function, or just by adding a `Softmax` layer that handles masking (which will in the end also need another backend function, or just its own code that uses backend functions).
- The `Lambda` layer should support masking, as @braingineer said above.
- A masked `K.batch_dot`. If you want to implement bidirectional attention flow, you need to compute a similarity matrix that then gets passed through a couple of different softmaxes. As I already said above, the softmax needs to treat a mask correctly, so the operation that you did to compute the similarity matrix needs to propagate a correct mask (or you have to create one huge function, which prohibits re-using the similarity matrix in several downstream layers). So, we need a `K.batch_dot` that propagates a mask. Similar to what I said for `K.softmax`, you could either do this with another backend function, or just add a `BatchedDot` layer that handles the mask correctly. In general, it seems useful to have layers associated with most backend functions that do the correct masking computation (this may not be necessary for all of them, especially if the `Lambda` layer supports masking and passes through the mask by default).
- Better documentation of how masking works in the internals of `Model`, and in what situations you might want to use a mask.
- Being able to call `K.int_shape()` on masks. This is currently not the case in the Theano backend.

We have solutions for a lot of these problems in our codebase that we can contribute, though it’s all based on Keras 1.*, and I’m not sure how much will change in Keras 2. Either way, I’m happy to help contribute to fixing these issues. I would really like to see Keras succeed in being great for NLP.
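To make the `K.any`-style `compute_mask` behavior described above concrete, here is a NumPy sketch (NumPy standing in for the Keras backend `K`; the shapes follow the character-level example in the list) of reducing a character-level input mask to a word-level output mask:

```python
import numpy as np

# Character-level mask for a batch of 2 sentences, 3 words each,
# 4 characters per word: 1 = real character, 0 = padding.
char_mask = np.array([
    [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0]],  # 2 real words, 1 padded
    [[1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]],  # 1 real word, 2 padded
])

# "K.any"-style compute_mask for a TimeDistributed encoder: a word
# is unmasked if any of its characters is unmasked.
word_mask = char_mask.any(axis=-1).astype(int)
print(word_mask)
# [[1 1 0]
#  [1 0 0]]
```

A pass-through variant would instead return `char_mask` unchanged, which is the other behavior the comment asks `TimeDistributed` subclasses to support.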
@fchollet it’s just a plea to please take masking very seriously when thinking about the Keras 2.0 spec. It’s crucial for complex NLP, and some pretty basic building blocks of NLP models in Keras don’t support masking correctly (e.g., the `Embedding` layer and the `TimeDistributed` layer, as pointed out in PRs I’ve already linked to). Additionally, almost none of the backend operations deal with masks. This is fine in some cases, but if you want to compute a softmax with a mask, for instance, you have to write your own code. This makes doing attentions over padded word sequences hard, and probably most implementations of attention in Keras are wrong because of this: if you apply the mask after a softmax, as done in this re-implementation of a popular paper, it’s wrong, because your distribution wasn’t normalized correctly, and it’s not obvious that it’s wrong from looking at the code.

There’s also very little documentation about masking. It’s in the background and easy to forget about. But you can’t forget about it when doing NLP, or you’re doing it wrong. It really needs to be treated as a fundamental component of any static computation graph applied to NLP tasks. The difficulty here is why people choose DyNet over Keras for NLP. There’s a whole lot to like about Keras; it’d be nice if it were also really good for NLP.
Any chance that masking can get first-class support in the Keras 2.0 spec? Building complex NLP models with Keras is difficult and bug-prone, because masking is not supported very well. We’ve had to write a lot of custom layers and override default Keras layers in order to get them to handle masking correctly.
make graph visualization in TensorBoard great again, please! This is a feature request I honestly don’t know how to solve myself. I find that Keras makes the graph tab in TensorBoard hard to read
Yes, we’ve submitted some:
https://github.com/fchollet/keras/pull/3218 https://github.com/fchollet/keras/pull/4253 https://github.com/fchollet/keras/pull/4258
But getting no response after trying to submit improvements is pretty demoralizing for submitting future PRs, so we started just overriding Keras layers in our code (e.g., here, a really trivial fix to `Highway` that makes it work with masking, which wasn’t included because masking is an afterthought in the current Keras API).

Will this PR freeze affect docstring improvements?
Also with the release of Keras 2 would it be a good idea to greatly reduce the number of tickets and implement a system/process that prevents or redirects general debugging questions to gitter, slack channel, or stackoverflow? From what I’ve seen most of the issues on this repo are implementation clarifications, general deep learning questions, and debugging help.
As for the keras spec when it is released, will there be a list of TODOs where the community can contribute? I’m very excited!
I have another general plea. If Keras 2 will become part of TF, can we please have a replication of the TF layers as keras 2 ones.
For instance, it took months before any attention was given to #4457 (Deconvolution3D/Conv3DTranspose), despite it being part of the layers supported by TF for a while (and used by anyone doing any 3D networks). Or somehow feature-replicate any of the layers that exist in 1D and 2D (which, by looking at the documentation, is effectively the only such layer lacking).
@farizrahman4u Is Keras 2 ready now?
I am mainly looking forward to the `fit_distributed()` which could automatically use multiple GPUs, promised by @fchollet months ago : )

Exciting! Do you need any help with the porting?
It would be nice to have rough criteria and perhaps a few examples of what should go to contrib and what should go to Keras proper.
Yes, it will be moved to Keras organization in the future. If any of the code breaks when Keras 2 is launched, it will be fixed by the maintainers. Else, each of the source files will be converted to the latest API passively.
Hi there. Couple of questions:
@fchollet is there any guide describing the API changes, namely “what’s changed”, “what will be deprecated”, “what will stay unchanged”… and so on? If not, is there any plan to do so? I would be happy to help/contribute to the documentation about this - imho it would really help the transition.
Is the `keras-contrib` repo mentioned and referenced in this conversation (by @farizrahman4u) the official one? Is there any plan to integrate it as a Keras branch/module once Keras 2.0 is released? I’m asking this also because I probably spotted a couple of cases in which the Keras 1.x API has been used…

Cheers
Hi @fchollet, I’ve just written a prototype for Keras using a Deeplearning4j backend. After completing this experiment, I’ve learned a lot about the design of Keras and pluggability of the framework.
Since a rewrite is already on the table, I am wondering if there are plans to make the backend more modular? In other words, do you have plans for a backend to handle more of the actual execution and give more granular control?
For example, Deeplearning4j runs in the JVM and bridges with the ND4J binary. In some cases, it is more advantageous and performant for DL4J to directly handle most of what happens for a `fit()` or `evaluate()` operation. This is partly to avoid creating dual references in Python and the JVM (using py4j to bridge the two environments).

The idea is that Keras is a more advanced “string argument” generator that creates a standard for model config and instructs the backend on what to execute. The DL4J experiment has already done this at a core level, and I believe there are some performance gains to be made.
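The “config generator” idea above can be sketched in a few lines of plain Python: the front end only *describes* the model, and each backend (TF, Theano, DL4J, …) decides how to execute that description natively. The dict schema here is purely illustrative, not the actual Keras config format:

```python
import json

# A front end that only describes the model; execution is delegated
# to whichever backend consumes the config. Schema is illustrative.
model_config = {
    "layers": [
        {"class_name": "Dense", "config": {"units": 64, "activation": "relu"}},
        {"class_name": "Dense", "config": {"units": 1, "activation": "sigmoid"}},
    ],
}

serialized = json.dumps(model_config)

# A backend would deserialize this and build its own native graph:
restored = json.loads(serialized)
print([layer["class_name"] for layer in restored["layers"]])
# ['Dense', 'Dense']
```

Because the config is just data, the same serialized model can cross process and language boundaries (e.g. Python to the JVM) without sharing live object references.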
Exciting! This is really big news. I hope I can make contributions to Keras 2! BTW, here are some possible adjustments for Keras 2.
Looking forward to the age of Keras 2!
Concretely, if I want to use the TensorFlow backend, is it better to `import keras` (“Keras is no longer a library, but rather a spec”), `import tensorflow.contrib.keras` (marked as deprecated but still in the docs) or `import tensorflow.keras` (not documented)? Confused 😕

In addition to the comments of @ParthaEth, the same is true for reinforcement learning problems, loading images via TensorFlow tensors #5356, and semantic segmentation, which seems to be second-class. One example is keras-rl.
I don’t expect Keras to handle every possible design and problem, but I think it is important to point out areas of weakness before LTS API scoping decisions are settled so the appropriate choices can be made explicitly.
I would also like to draw attention to the fact that building custom RNN architectures is next to impossible in Keras without nasty hacks. Please have a look at this discussion for details. For this reason people are forced to create repositories like RecurrentShop. It would be nice to have some official attention on making life easier for RNN researchers.
I have personally felt that Keras leans more towards image stuff rather than NLP. I can’t pinpoint why exactly I “feel” so; limited support for masking is definitely one of the factors…