tensorflow: TF 2.3 breaks hierarchical functional model loading (e.g. HAN) [ValueError: Unknown layer: Functional]

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • TensorFlow installed from (source or binary): binary
  • TensorFlow version (use command below): v2.3
  • Python version: 3.6.9
  • CUDA/cuDNN version: v10.2
  • GPU model and memory: GeForce GTX 1070 - 8117MiB

Describe the current behavior: I cannot load a model trained with TF 2.3 in TF 2.2, i.e. a breaking change. In TF 2.3 the release notes mention the following: "Functional models now get constructed if any tensor in a layer call's arguments/keyword arguments comes from a keras input. Previously the functional api would only work if all of the elements in the first argument to the layer came from a keras input."
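The effect of that change can be demonstrated with a minimal nested model (a generic sketch under my assumptions, not my proprietary code; shapes and sizes are arbitrary):

    import tensorflow as tf

    # Step 1 -- run under TF 2.3: any nested functional model now serializes
    # with class_name "Functional" instead of "Model".
    inner_in = tf.keras.Input(shape=(None,), dtype="int32")
    inner_out = tf.keras.layers.Embedding(100, 8)(inner_in)
    inner = tf.keras.Model(inner_in, inner_out, name="inner")

    outer_in = tf.keras.Input(shape=(5, None), dtype="int32")
    outer_out = tf.keras.layers.TimeDistributed(inner)(outer_in)
    outer = tf.keras.Model(outer_in, outer_out, name="outer")
    outer.save("nested_model.h5")

    # Step 2 -- run under TF 2.2: deserialization fails on the unknown class name.
    # tf.keras.models.load_model("nested_model.h5")
    # -> ValueError: Unknown layer: Functional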

I have a hierarchical attention model, trained with TF 2.2, with the following underlying config:

{'name': 'HAN_DocSent', 'layers': [{'class_name': 'InputLayer', 'config': {'batch_input_shape': (None, 150, None), 'dtype': 'int32', 'sparse': False, 'ragged': False, 'name': 'input_1'}, 'name': 'input_1', 'inbound_nodes': []}, {'class_name': 'TimeDistributed', 'config': {'name': 'time_distributed', 'trainable': True, 'dtype': 'float32', 'layer': {'class_name': 'Model', 'config': {'name': 'HAN_SentWord', 'layers': [{'class_name': 'InputLayer', 'config': {'batch_input_shape': (None, None), 'dtype': 'int32', 'sparse': False, 'ragged': False, 'name': 'word/sentence_input'}, 'name': 'word/sentence_input', 'inbound_nodes': []}, {'class_name': 'Embedding', 'config': {'name': 'word_embedding', 'trainable': True, 'batch_input_shape': (None, None), 'dtype': 'float32', 'input_dim': 20002, 'output_dim': 300, 'embeddings_initializer': {'class_name': 'RandomUniform', 'config': {'minval': -0.05, 'maxval': 0.05, 'seed': None}}, 'embeddings_regularizer': None, 'activity_regularizer': None, 'embeddings_constraint': None, 'mask_zero': True, 'input_length': None}, 'name': 'word_embedding', 'inbound_nodes': [[['word/sentence_input', 0, 0, {}]]]}, {'class_name': 'Bidirectional', 'config': {'name': 'bidirectional', 'trainable': True, 'dtype': 'float32', 'layer': {'class_name': 'GRU', 'config': {'name': 'gru', 'trainable': True, 'dtype': 'float32', 'return_sequences': True, 'return_state': False, 'go_backwards': False, 'stateful': False, 'unroll': False, 'time_major': False, 'units': 100, 'activation': 'tanh', 'recurrent_activation': 'sigmoid', 'use_bias': True, 'kernel_initializer': {'class_name': 'GlorotUniform', 'config': {'seed': None}}, 'recurrent_initializer': {'class_name': 'Orthogonal', 'config': {'gain': 1.0, 'seed': None}}, 'bias_initializer': {'class_name': 'Zeros', 'config': {}}, 'kernel_regularizer': None, 'recurrent_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'recurrent_constraint': None, 'bias_constraint': None, 'dropout': 0.25, 'recurrent_dropout': 0.25, 'implementation': 1, 'reset_after': True}}, 'merge_mode': 'concat'}, 'name': 'bidirectional', 'inbound_nodes': [[['word_embedding', 0, 0, {}]]]}, {'class_name': 'HierarchicalAttention', 'config': {'name': 'word_attention', 'trainable': True, 'dtype': 'float32'}, 'name': 'word_attention', 'inbound_nodes': [[['bidirectional', 0, 0, {}]]]}], 'input_layers': [['word/sentence_input', 0, 0]], 'output_layers': [['word_attention', 0, 0]]}}}, 'name': 'time_distributed', 'inbound_nodes': [[['input_1', 0, 0, {}]]]}, {'class_name': 'Bidirectional', 'config': {'name': 'bidirectional_1', 'trainable': True, 'dtype': 'float32', 'layer': {'class_name': 'GRU', 'config': {'name': 'gru_1', 'trainable': True, 'dtype': 'float32', 'return_sequences': True, 'return_state': False, 'go_backwards': False, 'stateful': False, 'unroll': False, 'time_major': False, 'units': 100, 'activation': 'tanh', 'recurrent_activation': 'sigmoid', 'use_bias': True, 'kernel_initializer': {'class_name': 'GlorotUniform', 'config': {'seed': None}}, 'recurrent_initializer': {'class_name': 'Orthogonal', 'config': {'gain': 1.0, 'seed': None}}, 'bias_initializer': {'class_name': 'Zeros', 'config': {}}, 'kernel_regularizer': None, 'recurrent_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'recurrent_constraint': None, 'bias_constraint': None, 'dropout': 0.25, 'recurrent_dropout': 0.25, 'implementation': 1, 'reset_after': True}}, 'merge_mode': 'concat'}, 'name': 'bidirectional_1', 
'inbound_nodes': [[['time_distributed', 0, 0, {}]]]}, {'class_name': 'HierarchicalAttention', 'config': {'name': 'sentence_attention', 'trainable': True, 'dtype': 'float32'}, 'name': 'sentence_attention', 'inbound_nodes': [[['bidirectional_1', 0, 0, {}]]]}, {'class_name': 'Dense', 'config': {'name': 'dense_2', 'trainable': True, 'dtype': 'float32', 'units': 31, 'activation': 'softmax', 'use_bias': True, 'kernel_initializer': {'class_name': 'GlorotUniform', 'config': {'seed': None}}, 'bias_initializer': {'class_name': 'Zeros', 'config': {}}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}, 'name': 'dense_2', 'inbound_nodes': [[['sentence_attention', 0, 0, {}]]]}], 'input_layers': [['input_1', 0, 0]], 'output_layers': [['dense_2', 0, 0]]}
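For reference, a minimal sketch of the architecture this config describes; HierarchicalAttention is my custom layer, stubbed here with a simple mean-pool since its implementation is not part of this report:

    import tensorflow as tf
    from tensorflow.keras import layers

    class HierarchicalAttention(layers.Layer):
        """Stub for the custom attention layer; the real one computes an
        attention-weighted sum over timesteps."""
        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            self.supports_masking = True  # consume the mask from mask_zero=True

        def call(self, inputs, mask=None):
            return tf.reduce_mean(inputs, axis=1)  # placeholder for attention pooling

        def compute_mask(self, inputs, mask=None):
            return None  # the time axis is reduced away

    # Inner model "HAN_SentWord": words -> sentence vector.
    word_in = tf.keras.Input(shape=(None,), dtype="int32", name="word/sentence_input")
    x = layers.Embedding(20002, 300, mask_zero=True, name="word_embedding")(word_in)
    x = layers.Bidirectional(layers.GRU(100, return_sequences=True,
                                        dropout=0.25, recurrent_dropout=0.25))(x)
    sent_vec = HierarchicalAttention(name="word_attention")(x)
    sent_encoder = tf.keras.Model(word_in, sent_vec, name="HAN_SentWord")

    # Outer model "HAN_DocSent": sentences -> document label.
    doc_in = tf.keras.Input(shape=(150, None), dtype="int32")
    x = layers.TimeDistributed(sent_encoder, name="time_distributed")(doc_in)
    x = layers.Bidirectional(layers.GRU(100, return_sequences=True,
                                        dropout=0.25, recurrent_dropout=0.25))(x)
    x = HierarchicalAttention(name="sentence_attention")(x)
    doc_out = layers.Dense(31, activation="softmax", name="dense_2")(x)
    model = tf.keras.Model(doc_in, doc_out, name="HAN_DocSent")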

Yet in TF 2.3 the saved config is wrapped differently, with the class name "Functional":

{"class_name": "Functional", "config": {"name": "HAN_DocSent", "layers": [{"class_name": "InputLayer", "config": {"batch_input_shape": [null, 30, null], "dtype": "int32", "sparse": false, "ragged": false, "name": "input_1"}, "name": "input_1", "inbound_nodes": []}, {"class_name": "TimeDistributed", "config": {"name": "time_distributed", "trainable": true, "dtype": "float32", "layer": {"class_name": "Functional", "config": {"name": "HAN_SentWord", "layers": [{"class_name": "InputLayer", "config": {"batch_input_shape": [null, null], "dtype": "int32", "sparse": false, "ragged": false, "name": "word/sentence_input"}, "name": "word/sentence_input", "inbound_nodes": []}, {"class_name": "Embedding", "config": {"name": "word_embedding", "trainable": true, "batch_input_shape": [null, null], "dtype": "float32", "input_dim": 20000, "output_dim": 50, "embeddings_initializer": {"class_name": "RandomUniform", "config": {"minval": -0.05, "maxval": 0.05, "seed": null}}, "embeddings_regularizer": null, "activity_regularizer": null, "embeddings_constraint": null, "mask_zero": true, "input_length": null}, "name": "word_embedding", "inbound_nodes": [[["word/sentence_input", 0, 0, {}]]]}, {"class_name": "Bidirectional", "config": {"name": "bidirectional", "trainable": true, "dtype": "float32", "layer": {"class_name": "GRU", "config": {"name": "gru", "trainable": true, "dtype": "float32", "return_sequences": true, "return_state": false, "go_backwards": false, "stateful": false, "unroll": false, "time_major": false, "units": 100, "activation": "tanh", "recurrent_activation": "sigmoid", "use_bias": true, "kernel_initializer": {"class_name": "GlorotUniform", "config": {"seed": null}}, "recurrent_initializer": {"class_name": "Orthogonal", "config": {"gain": 1.0, "seed": null}}, "bias_initializer": {"class_name": "Zeros", "config": {}}, "kernel_regularizer": null, "recurrent_regularizer": null, "bias_regularizer": null, "activity_regularizer": null, "kernel_constraint": null, "recurrent_constraint": null, "bias_constraint": null, "dropout": 0.5, "recurrent_dropout": 0.25, "implementation": 1, "reset_after": true}}, "merge_mode": "concat"}, "name": "bidirectional", "inbound_nodes": [[["word_embedding", 0, 0, {}]]]}, {"class_name": "HierarchicalAttention", "config": {"name": "word_attention", "trainable": true, "dtype": "float32"}, "name": "word_attention", "inbound_nodes": [[["bidirectional", 0, 0, {}]]]}], "input_layers": [["word/sentence_input", 0, 0]], "output_layers": [["word_attention", 0, 0]]}}}, "name": "time_distributed", "inbound_nodes": [[["input_1", 0, 0, {}]]]}, {"class_name": "Bidirectional", "config": {"name": "bidirectional_1", "trainable": true, "dtype": "float32", "layer": {"class_name": "GRU", "config": {"name": "gru_1", "trainable": true, "dtype": "float32", "return_sequences": true, "return_state": false, "go_backwards": false, "stateful": false, "unroll": false, "time_major": false, "units": 100, "activation": "tanh", "recurrent_activation": "sigmoid", "use_bias": true, "kernel_initializer": {"class_name": "GlorotUniform", "config": {"seed": null}}, "recurrent_initializer": {"class_name": "Orthogonal", "config": {"gain": 1.0, "seed": null}}, "bias_initializer": {"class_name": "Zeros", "config": {}}, "kernel_regularizer": null, "recurrent_regularizer": null, "bias_regularizer": null, "activity_regularizer": null, "kernel_constraint": null, "recurrent_constraint": null, "bias_constraint": null, "dropout": 0.5, "recurrent_dropout": 0.25, "implementation": 1, "reset_after": true}}, "merge_mode": 
"concat"}, "name": "bidirectional_1", "inbound_nodes": [[["time_distributed", 0, 0, {}]]]}, {"class_name": "HierarchicalAttention", "config": {"name": "sentence_attention", "trainable": true, "dtype": "float32"}, "name": "sentence_attention", "inbound_nodes": [[["bidirectional_1", 0, 0, {}]]]}, {"class_name": "Dense", "config": {"name": "dense_2", "trainable": true, "dtype": "float32", "units": 5, "activation": "softmax", "use_bias": true, "kernel_initializer": {"class_name": "GlorotUniform", "config": {"seed": null}}, "bias_initializer": {"class_name": "Zeros", "config": {}}, "kernel_regularizer": null, "bias_regularizer": null, "activity_regularizer": null, "kernel_constraint": null, "bias_constraint": null}, "name": "dense_2", "inbound_nodes": [[["sentence_attention", 0, 0, {}]]]}], "input_layers": [["input_1", 0, 0]], "output_layers": [["dense_2", 0, 0]]}}'

The error, of course, relates to the internal model, which is now serialized as a "Functional" model: ValueError: Unknown layer: Functional. The error occurs in this line:

-> model = model_config_lib.model_from_config(model_config, ...)
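A commonly suggested workaround for this error (an assumption on my part, not an official fix) is to map the unknown class name back to Model via custom_objects when loading under TF 2.2:

    import tensorflow as tf

    # Treat the TF 2.3 class name "Functional" as a plain Model during
    # deserialization; the path is hypothetical and HierarchicalAttention
    # is the custom layer from my code base.
    model = tf.keras.models.load_model(
        "han_docsent.h5",
        custom_objects={
            "Functional": tf.keras.models.Model,
            "HierarchicalAttention": HierarchicalAttention,
        },
    )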

Describe the expected behavior: Model loading should stay stable between releases…

Standalone code to reproduce the issue: I cannot share the code due to proprietary rights… However, since the change is documented in the release notes, it should be straightforward to find a solution. For example, should I change the way I load my model?
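If changing how the model is loaded is acceptable, another option is to patch the serialized config inside the saved file before loading. A sketch, assuming the model was saved in HDF5 format (the path and helper name are hypothetical):

    import json
    import h5py

    def downgrade_model_config(path):
        """Rewrite class_name "Functional" -> "Model" in a Keras HDF5 file
        so TF 2.2 can deserialize it (a sketch, not an official API)."""
        with h5py.File(path, "r+") as f:
            # model_config is stored as a JSON string attribute on the file.
            config = json.loads(f.attrs["model_config"])

            def fix(node):
                if isinstance(node, dict):
                    if node.get("class_name") == "Functional":
                        node["class_name"] = "Model"
                    for value in node.values():
                        fix(value)
                elif isinstance(node, list):
                    for item in node:
                        fix(item)

            fix(config)
            f.attrs["model_config"] = json.dumps(config)

    downgrade_model_config("han_docsent.h5")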

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 21 (7 by maintainers)

Most upvoted comments

I do not think making the training TF version the same as the prediction TF version is a good idea. I ran into this issue because we had deployed the prediction model to our customers' sites with TF 2.2, but our training system was recently upgraded to TF 2.4. As a result, our newly trained models cannot be used on our clients' sites. Upgrading the TF libraries of all our clients is infeasible, and pinning our GPU training system to TF 2.2 is not a good idea either, I suppose.

Could we add an argument to model.save() to export the model in the old format?
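Until such an option exists, one possible training-side workaround (a sketch under my assumptions, with `model` being the trained model in scope) is to export the architecture and weights separately and rewrite the class name in the JSON:

    import tensorflow as tf  # run under the newer TF (2.3/2.4)

    # Export architecture and weights separately, downgrading the class name
    # so older TF versions can parse the config.
    json_config = model.to_json().replace('"class_name": "Functional"',
                                          '"class_name": "Model"')
    with open("model_arch.json", "w") as f:
        f.write(json_config)
    model.save_weights("model_weights.h5")

    # On the TF 2.2 client:
    # with open("model_arch.json") as f:
    #     model = tf.keras.models.model_from_json(f.read(), custom_objects={...})
    # model.load_weights("model_weights.h5")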