keras: Spring 2017 roadmap: Keras 2, PR freeze, TF integration

Hi all,

Some news.

PR freeze

We are preparing the release of Keras 2, as well as the integration of the Keras API directly into the TensorFlow repository. Consequently, we are declaring a PR freeze on Keras, to be lifted after the release of Keras 2. This means that no further PR to Keras 1 will be merged (or even reviewed). However, PRs to the Keras 2 branch (when it becomes available) are welcome.

Keras 2

We plan on making available a Keras 2 branch in the next few days, with a final release in the next few weeks.

Keras 2 will consist of some refactoring, a lot of API changes, and a few functionality changes. There are many places in which the Keras 1 API was not optimal, differed from industry standards such as those set by TensorFlow or Numpy, or could otherwise be improved. We bundle API changes in a single release, so that users will only have to update their code once and for all.

  • API changes between Keras 1 and Keras 2 will be made backwards compatible as much as possible, i.e. your Keras 1 code should still run with Keras 2. The Keras 1 API will be deprecated, and Keras 1 code running with Keras 2 will output deprecation warnings that will instruct users on how to update their code, line by line. Note that backwards compatibility will not be total, and advanced users (e.g. people who write their own layers) may see their code break.
  • We will release complete notes covering all changes made and how to update a Keras 1 codebase to Keras 2.
  • API changes after Keras 2 will be rare and limited in impact (the goal is to have almost none). Keras 2 is a “long-term support” API, the first in Keras. Codebases written in Keras 2 next month should still run many years from now, on up-to-date software.
  • In the medium term, we will write down the Keras API as the “Keras spec”, and we will set up a “Keras committee” to oversee changes to the Keras spec. Indeed, Keras is no longer a library, but rather a spec with different available implementations. Changes to this spec need to be centralized (before being replicated across all implementations) and entrusted to an authority that will carefully review all proposed changes. This also ensures that there will be few changes and that all changes will have a strong rationale.
  • New, bleeding-edge functionality should preferably go to Keras contrib.

TF integration

The Keras 2 API will become part of the TensorFlow repository, to serve as a high-level API for TensorFlow. Concretely:

  • We are bringing a TF-only, independent implementation of the Keras spec into TF, first in tf.contrib, later in tf.keras.
  • This implementation will increasingly be based on core TF primitives (e.g. TF core layers and Keras layers will be the same objects), making code built using tf.keras deeply compatible with other TF functionality. You will be able to mix and match core TF and tf.keras functionality seamlessly (in effect, tf.keras is just a TF API, not a separate library). Likewise, you should be able to use Keras models with e.g. TF Experiments, allowing you to easily train a Keras model in a distributed setting or on CloudML, or do distributed hyperparameter search. By using tf.keras, you will benefit from the full power of TensorFlow.
  • This integration does not affect the repository fchollet/keras. It continues to be the “home” of Keras, and Theano support will continue indefinitely. We are not replacing what is already there; rather, we are simply adopting the Keras spec as a built-in high-level API for TF.
  • Additionally, Microsoft is building a CNTK backend for Keras. In general, you should expect support for more backends in the future, not less. The goal is to have the Keras spec serve as a cross-platform front-end layer for deep learning, allowing compatibility of codebases and saved models across different backend engines. The more implementations the merrier.

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Reactions: 119
  • Comments: 46 (36 by maintainers)

Most upvoted comments

Will Keras 2 support PyTorch as a backend, in the future?

No, there are no plans to support PyTorch. There is nothing to be gained in supporting every novelty framework that crops up every quarter. Our goal is to make deep learning accessible and useful to as many people as possible, and that goal is completely opposite to building up deep learning hipster cred.

@fchollet here is a list of the masking requests I can think of right now. I might add more later:

  • The Embedding layer should work for higher-order inputs. Imagine a sentence represented as characters, for instance, and you want to embed each of the characters and then run a character-level encoder over each of the words. Your input would be (batch_size, num_words, num_characters_per_word). Embedding doesn’t currently work correctly with this input. There are lots of similar situations where you have higher-order word or character input, and none of them work correctly without modifying Embedding.
  • TimeDistributed needs to pass the mask through to the layer that it wraps. Additionally, there should be several subclasses available for handling compute_mask in different ways. For example, imagine the sentence representation from above. If I want to TimeDistribute a CNN encoder, applying it to each of the words, the mask I want to compute is basically K.any on the mask for each timestep, so that the output mask tells me which whole words were masked. If I then want to take those word representations and pass them through a Highway layer, I need to TimeDistribute the Highway layer over the number of words, because the tensor is (batch_size, num_words, encoding_dim). In this case, I want TimeDistributed to just pass through the mask. In still other cases, I might want to pass the computation of compute_mask to the wrapped layer, and join them afterwards. It’s possible that you could capture all three of these use cases with just the last one, but it would probably take some complex logic to do so, in addition to modifying the behavior of compute_mask in wrapped layers (e.g., LSTM doesn’t currently return a mask at all in the return_sequences=False case, and it would need to return either a 0 or a 1 for this to work).
  • An equivalent to K.softmax that takes a mask as input is needed. Any time you want to compute a softmax over something that’s padded, you need this. The most obvious use case is attentions over word sequences, but there are others, too. You could solve this by adding another backend function, or just by adding a Softmax layer that handles masking (which will in the end also need another backend function, or just its own code that uses backend functions).
  • The Lambda layer should support masking, as @braingineer said above.
  • Backend functions need to handle masks. For example, computing an attention is often done with something like K.batch_dot. If you want to implement bidirectional attention flow, you need to compute a similarity matrix that then gets passed through a couple of different softmaxes. As I already said above, the softmax needs to treat a mask correctly, so the operation that you did to compute the similarity matrix needs to propagate a correct mask (or you have to create one huge function, which prohibits re-using the similarity matrix in several downstream layers). So, we need a K.batch_dot that propagates a mask. Similar to what I said for K.softmax, you could either do this with another backend function, or you just add a BatchedDot layer that handles the mask correctly. In general, it seems useful to have layers associated with most backend functions that do the correct masking computation (this may not be necessary for all of them, especially if the Lambda layer supports masking and passes through the mask by default).
  • All layers should document their masking behavior (expected input shape and output shape, etc.), just like they document their input/output behavior.
  • Some high-level documentation about masking would be really nice (e.g., an “About masking” page), specifying how masking works in Keras, what a mask’s dtype should be, how to get masks into your Model, and in what situations you might want to use a mask.
  • EDITED TO ADD: It’d be nice if you could consistently call K.int_shape() on masks. This is not the case in the theano backend.
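The masked-softmax request in the list above can be sketched in plain numpy; `masked_softmax` here is a hypothetical helper illustrating the requested behavior, not an existing Keras backend function:

```python
import numpy as np

def masked_softmax(logits, mask):
    """Softmax over the last axis that ignores masked (mask == 0) positions.

    Masked logits are pushed to -inf before the exp, so the resulting
    distribution is normalized over the unmasked entries only.
    """
    logits = np.where(mask.astype(bool), logits, -np.inf)
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    exps = np.exp(logits)
    return exps / exps.sum(axis=-1, keepdims=True)

# Attention scores over a padded word sequence: last two positions are padding.
scores = np.array([2.0, 1.0, 3.0, 0.5, 0.5])
mask = np.array([1, 1, 1, 0, 0])
probs = masked_softmax(scores, mask)
```

The key point is that the mask is applied to the logits, before exponentiation, so padded positions get exactly zero probability and the rest still sums to one.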

We have solutions for a lot of these problems in our codebase that we can contribute, though it’s all based on Keras 1.*, and I’m not sure how much will change in Keras 2. Either way, I’m happy to help contribute to fixing these issues. I would really like to see Keras succeed in being great for NLP.

@fchollet it’s just a plea to please take masking very seriously when thinking about the Keras 2.0 spec. It’s crucial for complex NLP, and some pretty basic building blocks of NLP models in Keras don’t support masking correctly (e.g., the Embedding layer, and the TimeDistributed layer, as pointed out in PRs I’ve already linked to). Additionally, almost none of the backend operations deal with masks. This is fine in some cases, but if you want to compute a softmax with a mask, for instance, you have to write your own code. This makes doing attentions over padded word sequences hard, and probably most implementations of attention in Keras are wrong because of this - if you apply the mask after a softmax, as done in this re-implementation of a popular paper, it’s wrong, because your distribution wasn’t normalized correctly, and it’s not obvious that it’s wrong from looking at the code.

There’s also very little documentation about masking. It’s in the background and easy to forget about. But you can’t forget about it when doing NLP, or you’re doing it wrong. It really needs to be treated as a fundamental component to any static computation graph applied to NLP tasks. The difficulty here is why people choose DyNet over Keras for NLP. There’s a whole lot to like about Keras - it’d be nice if were also really good for NLP.
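The claim above about masking after the softmax can be checked numerically (numpy sketch, toy values): zeroing out a padded position after the softmax leaves a distribution that no longer sums to one, while excluding it from the softmax itself keeps the result normalized.

```python
import numpy as np

scores = np.array([2.0, 1.0, 3.0])  # last position is padding
mask = np.array([1.0, 1.0, 0.0])

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Wrong: softmax over everything, then zero out the padded position.
# The remaining weights are not renormalized, so they sum to less than 1.
wrong = softmax(scores) * mask

# Right: exclude padded positions from the softmax itself.
right = softmax(np.where(mask.astype(bool), scores, -np.inf))
```

Both results look superficially plausible, which is exactly why the bug is hard to spot from the code alone.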

Any chance that masking can get first-class support in the Keras 2.0 spec? Building complex NLP models with Keras is difficult and bug-prone, because masking is not supported very well. We’ve had to write a lot of custom layers and override default Keras layers in order to get them to handle masking correctly.

Please make graph visualization in TensorBoard great again! This is a feature request I honestly don’t know how to solve myself. I find that Keras makes the graph tab in TensorBoard hard to read.

Yes, we’ve submitted some:

https://github.com/fchollet/keras/pull/3218 https://github.com/fchollet/keras/pull/4253 https://github.com/fchollet/keras/pull/4258

But getting no response after trying to submit improvements is pretty demoralizing for submitting future PRs, so we started just overriding Keras layers in our code (e.g., here, a really trivial fix to Highway that makes it work with masking, that wasn’t included because masking is an afterthought in the current Keras API).

Will this PR freeze affect docstring improvements?

Also, with the release of Keras 2, would it be a good idea to greatly reduce the number of tickets and implement a system/process that prevents or redirects general debugging questions to Gitter, the Slack channel, or StackOverflow? From what I’ve seen, most of the issues on this repo are implementation clarifications, general deep learning questions, and debugging help.

As for the keras spec when it is released, will there be a list of TODOs where the community can contribute? I’m very excited!

I have another general plea. If Keras 2 will become part of TF, can we please have the TF layers replicated as Keras 2 ones?

For instance, it took months before any attention was given to #4457 (Deconvolution3D/Conv3DTranspose), despite it having been part of the layers supported by TF for a while (and used by anyone doing 3D networks). Alternatively, feature parity could be kept with the existing 1D and 2D layers (judging by the documentation, this is effectively the only such layer missing).

@farizrahman4u Is Keras 2 ready now?

I am mainly looking forward to fit_distributed(), which could automatically use multiple GPUs, as promised by @fchollet months ago : )

Exciting! Do you need any help with the porting?

New, bleeding-edge functionality should preferably go to Keras contrib.

It would be nice to have a rough set of criteria, and perhaps a few examples, on what should go to contrib and what should go to Keras proper.

Is the keras-contrib repo mentioned and referenced in this conversation (by @farizrahman4u) the official one?

Yes, it will be moved to the Keras organization in the future. If any of the code breaks when Keras 2 is launched, it will be fixed by the maintainers. Otherwise, each of the source files will be gradually converted to the latest API.

Hi there. Couple of questions:

  1. @fchollet is there any guide describing the API changes, namely “what’s changed”, “what will be deprecated”, “what will stay unchanged”… and so on? If not, is there any plan to do so? I would be happy to help/contribute to the documentation about this - imho it would really help the transition.

  2. Is the keras-contrib repo mentioned and referenced in this conversation (by @farizrahman4u) the official one? Is there any plan to integrate it as a Keras branch/module once Keras 2.0 is released? I’m asking this also because I probably spotted a couple of cases in which the Keras 1.X API has been used…

Cheers

Hi @fchollet, I’ve just written a prototype for Keras using a Deeplearning4j backend. After completing this experiment, I’ve learned a lot about the design of Keras and pluggability of the framework.

Since a rewrite is already on the table, I am wondering if there are plans to make the backend more modular? In other words, do you have plans for a backend to handle more of the actual execution and give more granular control?

For example, Deeplearning4j runs in the JVM and bridges with the ND4J binary. In some cases, it is more advantageous and performant for DL4J to directly handle most of what happens for a fit() or evaluate() operation. This is partly to avoid creating dual references in Python and the JVM (using py4j to bridge the two environments).

The idea is that Keras is a more advanced “string argument” generator that creates a standard for model config and instructing the backend on what to execute. The DL4J experiment has already done this at a core level, and I believe there are some performance gains to be made.
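The “string argument generator” idea above amounts to the frontend emitting a backend-neutral, serializable model description that any backend (TF, Theano, DL4J, …) decides how to execute. A hypothetical config for a small model might look like this (sketch; the field names are illustrative, not a defined Keras spec schema):

```python
import json

# Hypothetical backend-neutral model description: the frontend emits this,
# and the backend interprets it to build and run the actual computation.
model_config = {
    "class_name": "Sequential",
    "layers": [
        {"class_name": "Dense", "config": {"units": 64, "activation": "relu"}},
        {"class_name": "Dense", "config": {"units": 10, "activation": "softmax"}},
    ],
}

# Serializing to JSON is what would let the config cross a boundary such as
# the Python/JVM bridge described above, without sharing object references.
serialized = json.dumps(model_config)
restored = json.loads(serialized)
```

Because the description is plain data, the backend is free to fuse or reorder the execution of fit()/evaluate() however is most performant on its side.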

Exciting! This is really big news. I hope I can make contributions to Keras 2! BTW, here are some possible adjustments for Keras 2.

  • merge and Merge: the distinction confuses users.
  • metrics: as I said in another issue, metrics don’t need to be part of the computation graph. Writing metrics with “tensors” is not an easy job.
  • validation_split and sample shuffling: the separation of training and validation data should happen after the dataset gets shuffled.
  • I also hope more details of the training process become accessible in callbacks. That would be very powerful.
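The validation_split point can be sketched as: shuffle indices first, then carve off the validation fraction (numpy sketch of the requested behavior, not the actual Keras implementation, which takes the unshuffled tail of the data):

```python
import numpy as np

def shuffled_split(x, y, validation_split=0.2, seed=0):
    """Shuffle the samples, then split off the last fraction for validation.

    Splitting without shuffling is a problem when the dataset is ordered
    (e.g. sorted by class): the validation set would not be representative.
    """
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(x))
    split = int(len(x) * (1.0 - validation_split))
    train_idx, val_idx = idx[:split], idx[split:]
    return x[train_idx], y[train_idx], x[val_idx], y[val_idx]

x = np.arange(10).reshape(10, 1)
y = np.arange(10)
x_tr, y_tr, x_val, y_val = shuffled_split(x, y)
```

Because only indices are permuted, x and y stay aligned and every sample lands in exactly one of the two sets.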

Looking forward to the age of Keras 2!

Concretely, if I want to use the Tensorflow backend, is it better to import keras (“Keras is no longer a library, but rather a spec”), import tensorflow.contrib.keras (marked as deprecated but still in the docs) or import tensorflow.keras (not documented)? Confused 😕

In addition to the comments of @ParthaEth, the same is true for reinforcement learning problems and for loading images via TensorFlow tensors (#5356), and semantic segmentation seems to be second-class. One example is keras-rl.

I don’t expect Keras to handle every possible design and problem, but I think it is important to point out areas of weakness before LTS API scoping decisions are settled so the appropriate choices can be made explicitly.

I would also like to draw attention to the fact that building custom RNN architectures is next to impossible in Keras without nasty hacks. Please have a look at this discussion for details. For this reason, people are forced to create repositories like RecurrentShop. It would be nice to have some official attention on making life easier for RNN researchers.

I have personally felt that Keras leans more towards image stuff rather than NLP. I can’t pinpoint exactly why I “feel” so; limited support for masking is definitely one of the factors…