transformers: model.save() does not save Keras model that includes DistilBert layer
🐛 Bug
Information
I am trying to build a Keras Sequential model in which I use DistilBERT as a non-trainable embedding layer. The model compiles and fits well, and even the predict method works. But when I try to save it using model.save('model.h5'), it fails and shows the following error:
> ---------------------------------------------------------------------------
> NotImplementedError Traceback (most recent call last)
> <ipython-input-269-557c9cec7497> in <module>
> ----> 1 model.get_config()
>
> /usr/local/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py in get_config(self)
> 966 if not self._is_graph_network:
> 967 raise NotImplementedError
> --> 968 return copy.deepcopy(get_network_config(self))
> 969
> 970 @classmethod
>
> /usr/local/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py in get_network_config(network, serialize_layer_fn)
> 2117 filtered_inbound_nodes.append(node_data)
> 2118
> -> 2119 layer_config = serialize_layer_fn(layer)
> 2120 layer_config['name'] = layer.name
> 2121 layer_config['inbound_nodes'] = filtered_inbound_nodes
>
> /usr/local/lib/python3.7/site-packages/tensorflow/python/keras/utils/generic_utils.py in serialize_keras_object(instance)
> 273 return serialize_keras_class_and_config(
> 274 name, {_LAYER_UNDEFINED_CONFIG_KEY: True})
> --> 275 raise e
> 276 serialization_config = {}
> 277 for key, item in config.items():
>
> /usr/local/lib/python3.7/site-packages/tensorflow/python/keras/utils/generic_utils.py in serialize_keras_object(instance)
> 268 name = get_registered_name(instance.__class__)
> 269 try:
> --> 270 config = instance.get_config()
> 271 except NotImplementedError as e:
> 272 if _SKIP_FAILED_SERIALIZATION:
>
> /usr/local/lib/python3.7/site-packages/tensorflow/python/keras/engine/network.py in get_config(self)
> 965 def get_config(self):
> 966 if not self._is_graph_network:
> --> 967 raise NotImplementedError
> 968 return copy.deepcopy(get_network_config(self))
> 969
>
> NotImplementedError:
The language I am using the model on is English.
The problem arises when using my own modified scripts: (give details below)
import tensorflow as tf
from transformers import DistilBertConfig, TFDistilBertModel, DistilBertTokenizer
max_len = 8
distil_bert = 'distilbert-base-uncased'
config = DistilBertConfig(dropout=0.2, attention_dropout=0.2)
config.output_hidden_states = False
transformer_model = TFDistilBertModel.from_pretrained(distil_bert, config = config)
input_word_ids = tf.keras.layers.Input(shape=(max_len,), dtype = tf.int32, name = "input_word_ids")
distill_output = transformer_model(input_word_ids)[0]
cls_out = tf.keras.layers.Lambda(lambda seq: seq[:, 0, :])(distill_output)
X = tf.keras.layers.BatchNormalization()(cls_out)
X = tf.keras.layers.Dense(256, activation='relu')(X)
X = tf.keras.layers.Dropout(0.2)(X)
X = tf.keras.layers.BatchNormalization()(X)
X = tf.keras.layers.Dense(128, activation='relu')(X)
X = tf.keras.layers.Dropout(0.2)(X)
X = tf.keras.layers.BatchNormalization()(X)
X = tf.keras.layers.Dense(64, activation='relu')(X)
X = tf.keras.layers.Dropout(0.2)(X)
X = tf.keras.layers.Dense(2)(X)
model = tf.keras.Model(inputs=input_word_ids, outputs=X)
for layer in model.layers[:3]:
    layer.trainable = False
The task I am working on uses my own dataset.
To reproduce
Steps to reproduce the behavior:
- Run the above code
- You will get the error when saving the model as
model.save('model.h5')
You can get the same error if you try:
model.get_config()
An interesting observation: if you save the model without specifying ".h5", like
model.save('./model')
it saves the model in the TensorFlow SavedModel format and creates folders (assets (empty), variables, and some index files). But if you try to load the model, it produces different errors related to DistilBert/Bert. It may be due to some naming inconsistency (input_ids vs. inputs, see below) inside the DistilBert model.
new_model = tf.keras.models.load_model('./model')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/usr/local/lib/python3.7/site-packages/tensorflow/python/util/nest.py in assert_same_structure(nest1, nest2, check_types, expand_composites)
377 _pywrap_utils.AssertSameStructure(nest1, nest2, check_types,
--> 378 expand_composites)
379 except (ValueError, TypeError) as e:
ValueError: The two structures don't have the same nested structure.
First structure: type=dict str={'input_ids': TensorSpec(shape=(None, 5), dtype=tf.int32, name='input_ids')}
Second structure: type=TensorSpec str=TensorSpec(shape=(None, 8), dtype=tf.int32, name='inputs')
More specifically: Substructure "type=dict str={'input_ids': TensorSpec(shape=(None, 5), dtype=tf.int32, name='input_ids')}" is a sequence, while substructure "type=TensorSpec str=TensorSpec(shape=(None, 8), dtype=tf.int32, name='inputs')" is not
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-229-b46ed71fd9ad> in <module>
----> 1 new_model = tf.keras.models.load_model(keras_model_path)
/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/saving/save.py in load_model(filepath, custom_objects, compile)
188 if isinstance(filepath, six.string_types):
189 loader_impl.parse_saved_model(filepath)
--> 190 return saved_model_load.load(filepath, compile)
191
192 raise IOError(
/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/saving/saved_model/load.py in load(path, compile)
114 # TODO(kathywu): Add saving/loading of optimizer, compiled losses and metrics.
115 # TODO(kathywu): Add code to load from objects that contain all endpoints
--> 116 model = tf_load.load_internal(path, loader_cls=KerasObjectLoader)
117
118 # pylint: disable=protected-access
/usr/local/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py in load_internal(export_dir, tags, loader_cls)
602 loader = loader_cls(object_graph_proto,
603 saved_model_proto,
--> 604 export_dir)
605 root = loader.get(0)
606 if isinstance(loader, Loader):
/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/saving/saved_model/load.py in __init__(self, *args, **kwargs)
186 self._models_to_reconstruct = []
187
--> 188 super(KerasObjectLoader, self).__init__(*args, **kwargs)
189
190 # Now that the node object has been fully loaded, and the checkpoint has
/usr/local/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py in __init__(self, object_graph_proto, saved_model_proto, export_dir)
121 self._concrete_functions[name] = _WrapperFunction(concrete_function)
122
--> 123 self._load_all()
124 self._restore_checkpoint()
125
/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/saving/saved_model/load.py in _load_all(self)
213
214 # Finish setting up layers and models. See function docstring for more info.
--> 215 self._finalize_objects()
216
217 @property
/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/saving/saved_model/load.py in _finalize_objects(self)
504 layers_revived_from_saved_model.append(node)
505
--> 506 _finalize_saved_model_layers(layers_revived_from_saved_model)
507 _finalize_config_layers(layers_revived_from_config)
508
/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/saving/saved_model/load.py in _finalize_saved_model_layers(layers)
675 call_fn = _get_keras_attr(layer).call_and_return_conditional_losses
676 if call_fn.input_signature is None:
--> 677 inputs = infer_inputs_from_restored_call_function(call_fn)
678 else:
679 inputs = call_fn.input_signature[0]
/usr/local/lib/python3.7/site-packages/tensorflow/python/keras/saving/saved_model/load.py in infer_inputs_from_restored_call_function(fn)
919 for concrete in fn.concrete_functions[1:]:
920 spec2 = concrete.structured_input_signature[0][0]
--> 921 spec = nest.map_structure(common_spec, spec, spec2)
922 return spec
923
/usr/local/lib/python3.7/site-packages/tensorflow/python/util/nest.py in map_structure(func, *structure, **kwargs)
609 for other in structure[1:]:
610 assert_same_structure(structure[0], other, check_types=check_types,
--> 611 expand_composites=expand_composites)
612
613 flat_structure = [flatten(s, expand_composites) for s in structure]
/usr/local/lib/python3.7/site-packages/tensorflow/python/util/nest.py in assert_same_structure(nest1, nest2, check_types, expand_composites)
383 "Entire first structure:\n%s\n"
384 "Entire second structure:\n%s"
--> 385 % (str(e), str1, str2))
386
387
ValueError: The two structures don't have the same nested structure.
First structure: type=dict str={'input_ids': TensorSpec(shape=(None, 5), dtype=tf.int32, name='input_ids')}
Second structure: type=TensorSpec str=TensorSpec(shape=(None, 8), dtype=tf.int32, name='inputs')
More specifically: Substructure "type=dict str={'input_ids': TensorSpec(shape=(None, 5), dtype=tf.int32, name='input_ids')}" is a sequence, while substructure "type=TensorSpec str=TensorSpec(shape=(None, 8), dtype=tf.int32, name='inputs')" is not
Entire first structure:
{'input_ids': .}
Entire second structure:
.
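The final ValueError can be reproduced in isolation. The sketch below assumes only that `tf.nest.assert_same_structure` is the check being hit (as the traceback shows) and copies the two signatures from the error message: one call was traced with a dict keyed by `input_ids`, the other with a bare tensor. The variable names are illustrative and not taken from the library.

```python
import tensorflow as tf

# Signature recorded when the layer was traced with a dict input
# (values copied from the error message above).
dict_signature = {
    "input_ids": tf.TensorSpec(shape=(None, 5), dtype=tf.int32, name="input_ids")
}

# Signature recorded when the layer was called with a plain tensor.
tensor_signature = tf.TensorSpec(shape=(None, 8), dtype=tf.int32, name="inputs")

try:
    tf.nest.assert_same_structure(dict_signature, tensor_signature)
    structures_match = True
except (ValueError, TypeError) as err:
    # A dict and a single TensorSpec are different nest structures.
    structures_match = False
    mismatch_message = str(err)

print(structures_match)
```

This only shows why the two traced signatures cannot be merged at load time; it does not by itself fix the loading error.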
Expected behavior
I expect to have a normal saving and loading of the model.
Environment info
- transformers version: 2.9.1
- Platform:
- Python version: 3.7.6
- Tensorflow version (CPU): 2.2.0
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
About this issue
- State: closed
- Created 4 years ago
- Reactions: 13
- Comments: 21 (11 by maintainers)
I had this exact error. I got around it by saving the weights and the code that creates the model. After training your model, run model.save_weights('path/savefile'). Note that there is no .h5 on it. When you want to reuse the model later, run your code up to model.compile(), then call model.load_weights('path/savefile').
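A minimal sketch of this save-weights-and-rebuild workaround, under some loud assumptions: `build_model` is a hypothetical helper, an `Embedding` layer stands in for the DistilBERT encoder so the example runs offline, and the weights file uses a `.weights.h5` suffix because newer Keras releases require it (the TF 2.x versions discussed here also accept a suffix-less path, as in the comment above).

```python
import os
import tempfile

import numpy as np
import tensorflow as tf


def build_model(max_len=8, vocab_size=100):
    """Hypothetical helper: recreates the architecture from code.

    The Embedding layer is only a stand-in for the DistilBERT encoder.
    """
    input_word_ids = tf.keras.layers.Input(
        shape=(max_len,), dtype=tf.int32, name="input_word_ids"
    )
    x = tf.keras.layers.Embedding(vocab_size, 16)(input_word_ids)
    x = tf.keras.layers.GlobalAveragePooling1D()(x)
    x = tf.keras.layers.Dense(64, activation="relu")(x)
    outputs = tf.keras.layers.Dense(2)(x)
    model = tf.keras.Model(inputs=input_word_ids, outputs=outputs)
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )
    return model


# After training: save only the weights, not the whole model.
model = build_model()
weights_path = os.path.join(tempfile.mkdtemp(), "savefile.weights.h5")
model.save_weights(weights_path)

# Later: rerun the same model-building code, then load the weights.
restored = build_model()
restored.load_weights(weights_path)

# Both models should now produce identical outputs on the same batch.
sample = np.random.randint(0, 100, size=(4, 8)).astype("int32")
same_outputs = np.allclose(
    model.predict(sample, verbose=0), restored.predict(sample, verbose=0)
)
print(same_outputs)
```

The cost of this approach is that the model-building code has to be kept alongside the weights file, since the file alone does not describe the architecture.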
Same issue
This still occurs, not only with distilbert but also many others. I don't see why this issue was closed. The described workaround is quite cumbersome and error-prone, and I don't see why this cannot be implemented inside the library, given that the configuration should already be in place to allow overriding get_config / from_config methods?
The issue still occurs on TF 2.6.0 which is very disappointing. I tried training on Colab's TPU and on GPU.
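For context, the Keras mechanism mentioned here looks roughly like the sketch below. `PretrainedEncoderStub` is a hypothetical layer invented for illustration, not how Transformers implements its models. It overrides `get_config` to record its constructor arguments and relies on the default `from_config` inherited from `tf.keras.layers.Layer`, which passes the config back to the constructor.

```python
import tensorflow as tf


class PretrainedEncoderStub(tf.keras.layers.Layer):
    """Hypothetical stand-in for a layer that wraps a pretrained encoder."""

    def __init__(self, model_name, hidden_size=16, **kwargs):
        super().__init__(**kwargs)
        self.model_name = model_name
        self.hidden_size = hidden_size
        self.projection = tf.keras.layers.Dense(hidden_size)

    def call(self, inputs):
        return self.projection(inputs)

    def get_config(self):
        # Start from the base config (name, trainable, dtype) and add
        # the constructor arguments needed to rebuild this layer.
        config = super().get_config()
        config.update(
            {"model_name": self.model_name, "hidden_size": self.hidden_size}
        )
        return config


layer = PretrainedEncoderStub("distilbert-base-uncased", hidden_size=16, name="encoder")
config = layer.get_config()

# The inherited from_config rebuilds the layer from the config dict.
rebuilt = PretrainedEncoderStub.from_config(config)
print(rebuilt.model_name, rebuilt.hidden_size)
```

Only the configuration round-trips this way; the weights still have to be saved and restored separately.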
model.save_weights; model.load_weights
@skbaur It seems like one of the relevant PRs didn't make it into the release. In that case, please use the master version for now, and hopefully once 4.13 is released you can just use that instead!
@skbaur Although that patch was reverted, we quickly followed up with a fixed one at https://github.com/huggingface/transformers/pull/14415 , so the issue should now be resolved. If you're still encountering this issue after updating to the most recent version of Transformers, please let me know!
The patch has now been merged. It'll be in the next release, or if anyone else is encountering this issue before then, you can install from master with
pip install git+https://github.com/huggingface/transformers.git