transformers: ModuleAttributeError when converting a TensorFlow checkpoint (BERT)

Environment info

  • transformers version: 4.1.1
  • Platform: Linux-4.15.0-129-generic-x86_64-with-glibc2.10
  • Python version: 3.8.3
  • PyTorch version (GPU?): 1.7.0 (True)
  • Tensorflow version (GPU?): 2.3.1 (True)
  • Using GPU in script?: <fill in>
  • Using distributed or parallel set-up in script?: <fill in>

Who can help

albert, bert, GPT2, XLM: @LysandreJik

Information

Model I am using (Bert, XLNet …): Bert

The problem arises when using:

  • the official example scripts: (give details below)
  • my own modified scripts: (give details below)

The task I am working on is:

  • Convert TF v1 ckpt to PyTorch

To reproduce

I tried to convert a TensorFlow v1 checkpoint, but a ModuleAttributeError occurred.

What I ran:

****@**** $ transformers-cli convert --model_type bert \
>   --tf_checkpoint $MODEL_DIR/model.ckpt \
>   --config ****/bert_config.json \
>   --pytorch_dump_output $MODEL_DIR/pytorch_model.bin

(Here, bert_config.json lives in a separate folder, but it corresponds to the checkpoint.)
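For reference, the CLI command above is roughly equivalent to the following Python snippet (a minimal sketch with placeholder paths, assuming load_tf_weights_in_bert is importable from the top-level transformers namespace):

    import torch
    from transformers import BertConfig, BertForPreTraining, load_tf_weights_in_bert

    # Placeholder paths -- substitute the values passed to transformers-cli.
    config = BertConfig.from_json_file("bert_config.json")
    model = BertForPreTraining(config)

    # Load the TF v1 checkpoint weights into the PyTorch model; this is the
    # call that raises the ModuleAttributeError in the traceback below.
    load_tf_weights_in_bert(model, config, "model.ckpt")

    torch.save(model.state_dict(), "pytorch_model.bin")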

Output is:

Traceback (most recent call last):
  File "/****/.pyenv/versions/anaconda3-2020.07/bin/transformers-cli", line 8, in <module>
    sys.exit(main())
  File "/****/.pyenv/versions/anaconda3-2020.07/lib/python3.8/site-packages/transformers/commands/transformers_cli.py", line 51, in main
    service.run()
  File "/****/.pyenv/versions/anaconda3-2020.07/lib/python3.8/site-packages/transformers/commands/convert.py", line 105, in run
    convert_tf_checkpoint_to_pytorch(self._tf_checkpoint, self._config, self._pytorch_dump_output)
  File "/****/.pyenv/versions/anaconda3-2020.07/lib/python3.8/site-packages/transformers/models/bert/convert_bert_original_tf_checkpoint_to_pytorch.py", line 36, in convert_tf_checkpoint_to_pytorch
    load_tf_weights_in_bert(model, config, tf_checkpoint_path)
  File "/****/.pyenv/versions/anaconda3-2020.07/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 155, in load_tf_weights_in_bert
    pointer.shape == array.shape
  File "/****/.pyenv/versions/anaconda3-2020.07/lib/python3.8/site-packages/torch/nn/modules/module.py", line 778, in __getattr__
    raise ModuleAttributeError("'{}' object has no attribute '{}'".format(
torch.nn.modules.module.ModuleAttributeError: 'BertEmbeddings' object has no attribute 'shape'

Expected behavior

I understand that BertEmbeddings (an nn.Module) does not have a shape attribute, so the error message itself is not surprising.

Can this kind of error occur depending on the original TensorFlow checkpoint? If so, are there any tips for dealing with it?

I really appreciate any help you can provide.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 21 (20 by maintainers)

Most upvoted comments

Fantastic! Great job, thank you for sharing your progress!

Hmmm, indeed it seems that the model doesn’t map one-to-one onto our architecture. You might need to slightly tweak the architecture and the conversion script to load it, but you’re probably the expert on that. If you want me to take a deeper look, feel free to send me the weights/config so I can take a look locally.

Hmmm I understand.

I don’t think it’s the global_step, as this gets skipped here:

https://github.com/huggingface/transformers/blob/b020a736c374460af1b34267283f957988350630/src/transformers/models/bert/modeling_bert.py#L120-L125

As a way to debug what’s happening here, could you add the following log statement:

logger.info(f"Trying to assign {name}")

right after the following line: https://github.com/huggingface/transformers/blob/b020a736c374460af1b34267283f957988350630/src/transformers/models/bert/modeling_bert.py#L116

It would then look like:

    for name, array in zip(names, arrays):
        logger.info(f"Trying to assign {name}")
        name = name.split("/")
        # adam_v and adam_m are variables used in AdamWeightDecayOptimizer to calculated m and v
        # which are not required for using pretrained model
        if any(
            n in ["adam_v", "adam_m", "AdamWeightDecayOptimizer", "AdamWeightDecayOptimizer_1", "global_step"]
            for n in name
        ):

We can then try to identify what’s happening with the checkpoint.
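To complement the logging above, it can also help to dump the raw variable names stored in the checkpoint and compare them with the names load_tf_weights_in_bert expects. A minimal sketch (the checkpoint path is a placeholder; use the same value passed as --tf_checkpoint):

    import tensorflow as tf

    # Placeholder path to the TF v1 checkpoint prefix.
    ckpt_path = "model.ckpt"

    # Print every variable name and shape stored in the checkpoint.
    for name, shape in tf.train.list_variables(ckpt_path):
        print(name, shape)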