transformers: AttributeError: 'BertForPreTraining' object has no attribute 'shape'

Does anyone have a suggestion for fixing the following? I was using "convert_tf_checkpoint_to_pytorch.py" to convert a model trained from scratch, but the conversion failed:

Skipping cls/seq_relationship/output_weights/adam_v
Traceback (most recent call last):
  File "pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py", line 66, in <module>
    args.pytorch_dump_path)
  File "pytorch_pretrained_bert/convert_tf_checkpoint_to_pytorch.py", line 37, in convert_tf_checkpoint_to_pytorch
    load_tf_weights_in_bert(model, tf_checkpoint_path)
  File "/content/my_pytorch-pretrained-BERT/pytorch_pretrained_bert/modeling.py", line 117, in load_tf_weights_in_bert
    assert pointer.shape == array.shape
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 535, in __getattr__
    type(self).__name__, name))
AttributeError: 'BertForPreTraining' object has no attribute 'shape'

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 16 (2 by maintainers)

Most upvoted comments

I’m getting a similar error when trying to convert the newer BERT models released at tensorflow/models/tree/master/official/nlp/.

These models are either BERT models trained with Keras or checkpoints converted from the original google-research/bert repository. I also get the same error when I convert the TF1 checkpoints to TF2 myself using the tf2_encoder_checkpoint_converter.py script.

What I have tried:

First, I downloaded a model:

wget https://storage.googleapis.com/cloud-tpu-checkpoints/bert/keras_bert/cased_L-12_H-768_A-12.tar.gz
# or
wget https://storage.googleapis.com/cloud-tpu-checkpoints/bert/tf_20/cased_L-12_H-768_A-12.tar.gz 
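The archive can then be unpacked, e.g.:

tar -xzf cased_L-12_H-768_A-12.tar.gz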

After unpacking:

export BERT_BASE_DIR=cased_L-12_H-768_A-12

transformers-cli convert --model_type bert \
    --tf_checkpoint $BERT_BASE_DIR/bert_model.ckpt \
    --config $BERT_BASE_DIR/bert_config.json \
    --pytorch_dump_output $BERT_BASE_DIR/pytorch_model.bin

The command prints the configuration but throws the following error:

INFO:transformers.modeling_bert:Loading TF weight model/layer_with_weights-9/_attention_layer/_value_dense/kernel/.ATTRIBUTES/VARIABLE_VALUE with shape [768, 12, 64]
INFO:transformers.modeling_bert:Loading TF weight model/layer_with_weights-9/_attention_layer_norm/beta/.ATTRIBUTES/VARIABLE_VALUE with shape [768]
INFO:transformers.modeling_bert:Loading TF weight model/layer_with_weights-9/_attention_layer_norm/gamma/.ATTRIBUTES/VARIABLE_VALUE with shape [768]
INFO:transformers.modeling_bert:Loading TF weight model/layer_with_weights-9/_attention_output_dense/bias/.ATTRIBUTES/VARIABLE_VALUE with shape [768]
INFO:transformers.modeling_bert:Loading TF weight model/layer_with_weights-9/_attention_output_dense/kernel/.ATTRIBUTES/VARIABLE_VALUE with shape [12, 64, 768]
INFO:transformers.modeling_bert:Loading TF weight model/layer_with_weights-9/_intermediate_dense/bias/.ATTRIBUTES/VARIABLE_VALUE with shape [3072]
INFO:transformers.modeling_bert:Loading TF weight model/layer_with_weights-9/_intermediate_dense/kernel/.ATTRIBUTES/VARIABLE_VALUE with shape [768, 3072]
INFO:transformers.modeling_bert:Loading TF weight model/layer_with_weights-9/_output_dense/bias/.ATTRIBUTES/VARIABLE_VALUE with shape [768]
INFO:transformers.modeling_bert:Loading TF weight model/layer_with_weights-9/_output_dense/kernel/.ATTRIBUTES/VARIABLE_VALUE with shape [3072, 768]
INFO:transformers.modeling_bert:Loading TF weight model/layer_with_weights-9/_output_layer_norm/beta/.ATTRIBUTES/VARIABLE_VALUE with shape [768]
INFO:transformers.modeling_bert:Loading TF weight model/layer_with_weights-9/_output_layer_norm/gamma/.ATTRIBUTES/VARIABLE_VALUE with shape [768]
INFO:transformers.modeling_bert:Loading TF weight save_counter/.ATTRIBUTES/VARIABLE_VALUE with shape []
INFO:transformers.modeling_bert:Skipping _CHECKPOINTABLE_OBJECT_GRAPH
Traceback (most recent call last):
  File "/home/jbarry/anaconda3/envs/transformers/bin/transformers-cli", line 30, in <module>
    service.run()
  File "/home/jbarry/anaconda3/envs/transformers/lib/python3.6/site-packages/transformers/commands/convert.py", line 62, in run
    convert_tf_checkpoint_to_pytorch(self._tf_checkpoint, self._config, self._pytorch_dump_output)
  File "/home/jbarry/anaconda3/envs/transformers/lib/python3.6/site-packages/transformers/convert_bert_original_tf_checkpoint_to_pytorch.py", line 36, in convert_tf_checkpoint_to_pytorch
    load_tf_weights_in_bert(model, config, tf_checkpoint_path)
  File "/home/jbarry/anaconda3/envs/transformers/lib/python3.6/site-packages/transformers/modeling_bert.py", line 118, in load_tf_weights_in_bert
    assert pointer.shape == array.shape
  File "/home/jbarry/anaconda3/envs/transformers/lib/python3.6/site-packages/torch/nn/modules/module.py", line 585, in __getattr__
    type(self).__name__, name))
AttributeError: 'BertForPreTraining' object has no attribute 'shape'

This is happening in a fresh Anaconda environment on Linux with PyTorch 1.3 installed, plus tf-nightly and transformers (2.3.0) installed via pip.
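For reference, the variable names stored in the checkpoint can be listed directly; a minimal sketch, assuming TensorFlow is installed and the checkpoint path from above:

import tensorflow as tf

# prints the TF2-style names such as
# "model/layer_with_weights-9/_output_dense/kernel/.ATTRIBUTES/VARIABLE_VALUE",
# which load_tf_weights_in_bert then has to map onto the PyTorch modules
for name, shape in tf.train.list_variables("cased_L-12_H-768_A-12/bert_model.ckpt"):
    print(name, shape)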

Has anyone else been able to successfully convert the TF 2.0 models to PyTorch, or does anyone see where I'm going wrong? Thanks!

@thomwolf It would be great if the fix above could be merged into the master branch: https://github.com/smartshark/transformers/pull/1

I managed to get it working by stepping through the pointers in debug mode and checking which variable name corresponded to which module. This is the function I ended up using.

import logging
import os
import re

import numpy as np
import tensorflow as tf
import torch

from transformers import BertConfig, BertForSequenceClassification

logger = logging.getLogger(__name__)


def convert_tf_checkpoint_to_pytorch(tf_checkpoint_path, bert_config_file, pytorch_dump_path):
    config_path = os.path.abspath(bert_config_file)
    tf_path = os.path.abspath(tf_checkpoint_path)
    print("Converting TensorFlow checkpoint from {} with config at {}".format(tf_path, config_path))
    # Load weights from TF model
    init_vars = tf.train.list_variables(tf_path)
    excluded = ["BERTAdam", "_power", "global_step", "_CHECKPOINTABLE_OBJECT_GRAPH"]
    init_vars = list(filter(lambda x: all(e not in x[0] for e in excluded), init_vars))
    names = []
    arrays = []
    for name, shape in init_vars:
        print("Loading TF weight {} with shape {}".format(name, shape))
        array = tf.train.load_variable(tf_path, name)
        names.append(name)
        arrays.append(array)

    config = BertConfig.from_json_file(bert_config_file)
    print("Building PyTorch model from configuration: {}".format(str(config)))
    # Initialise PyTorch model
    model = BertForSequenceClassification(config)

    for name, array in zip(names, arrays):
        name = name.split("/")
        # adam_v and adam_m are variables used in AdamWeightDecayOptimizer to calculate m and v,
        # which are not needed when using the pretrained model
        if any(n in ["adam_v", "adam_m", "global_step", "bad_steps", "good_steps", "loss_scale",
                     "AdamWeightDecayOptimizer", "AdamWeightDecayOptimizer_1", "save_counter", ".OPTIMIZER_SLOT"] for n in name) or \
                name[0] == "optimizer":
            print("Skipping {}".format("/".join(name)))
            continue
        if ".OPTIMIZER_SLOT" in name:
            idx = name.index(".OPTIMIZER_SLOT")
            name = name[:idx]
        elif ".ATTRIBUTES" in name:
            idx = name.index(".ATTRIBUTES")
            name = name[:idx]
        print(name)
        pointer = model
        for m_name in name:
            if re.fullmatch(r"[A-Za-z]+_\d+", m_name):
                scope_names = re.split(r"_(\d+)", m_name)
            else:
                scope_names = [m_name]
            if scope_names[0] == "kernel" or scope_names[0] == "gamma":
                pointer = getattr(pointer, "weight")
            elif scope_names[0] == "output_bias" or scope_names[0] == "beta":
                pointer = getattr(pointer, "bias")
            elif scope_names[0] == "output_weights":
                pointer = getattr(pointer, "weight")
            elif scope_names[0] == "squad":
                pointer = getattr(pointer, "classifier")
            elif scope_names[0] == "dense_output" or scope_names[0] == "bert_output":
                pointer = getattr(pointer, "output")
            elif scope_names[0] == "self_attention":
                pointer = getattr(pointer, "self")
            else:
                try:
                    pointer = getattr(pointer, scope_names[0])
                except AttributeError:
                    logger.info("Skipping {}".format("/".join(name)))
                    continue
            if len(scope_names) >= 2:
                num = int(scope_names[1])
                pointer = pointer[num]
        if m_name[-11:] == "_embeddings":
            pointer = getattr(pointer, "weight")
        elif m_name == "kernel" or m_name == "gamma" or m_name == "output_weights":
            array = np.transpose(array)
        # print("Initialize PyTorch weight {}".format(name))
        pointer.data = torch.from_numpy(array)

    # Save pytorch-model
    print("Save PyTorch model to {}".format(pytorch_dump_path))
    torch.save(model.state_dict(), pytorch_dump_path)


# example call; tf_path, config_path and pytorch_dump_path are placeholders for the
# TF checkpoint, bert_config.json and output file paths
convert_tf_checkpoint_to_pytorch(tf_path, config_path, pytorch_dump_path)

I revised modeling_bert.py following @lrizzello's code and was able to convert a TF1 checkpoint I trained myself to PyTorch. I first converted the TF1 checkpoint to TF2, and then used the code below. Here is the code I revised in modeling_bert.py:

def load_tf_weights_in_bert(model, config, tf_checkpoint_path):
    try:
        import re
        import numpy as np
        import tensorflow as tf
    except ImportError:
        logger.error(
            "Loading a TensorFlow model in PyTorch, requires TensorFlow to be installed. Please see "
            "https://www.tensorflow.org/install/ for installation instructions."
        )
        raise
    tf_path = os.path.abspath(tf_checkpoint_path)
    logger.info(f"Converting TensorFlow checkpoint from {tf_path}")
    # Load weights from TF model
    init_vars = tf.train.list_variables(tf_path)
    names = []
    arrays = []
    for name, shape in init_vars:
        logger.info(f"Loading TF weight {name} with shape {shape}")
        array = tf.train.load_variable(tf_path, name)
        names.append(name)
        arrays.append(array)
    for name, array in zip(names, arrays):
        name = name.split("/")
        # adam_v and adam_m are variables used in AdamWeightDecayOptimizer to calculate m and v,
        # which are not needed when using the pretrained model
        if any(
            n in ["adam_v", "adam_m", "global_step", "bad_steps", "good_steps", "loss_scale",
                  "AdamWeightDecayOptimizer", "AdamWeightDecayOptimizer_1", "save_counter", ".OPTIMIZER_SLOT"]
            for n in name
        ) or name[0] == "optimizer":
            logger.info(f"Skipping {'/'.join(name)}")
            continue
        if ".OPTIMIZER_SLOT" in name:
            idx = name.index(".OPTIMIZER_SLOT")
            name = name[:idx]
        elif ".ATTRIBUTES" in name:
            idx = name.index(".ATTRIBUTES")
            name = name[:idx]
        print(name)
        pointer = model
        for m_name in name:
            if re.fullmatch(r"[A-Za-z]+_\d+", m_name):
                scope_names = re.split(r"_(\d+)", m_name)
            else:
                scope_names = [m_name]
            if scope_names[0] == "kernel" or scope_names[0] == "gamma":
                pointer = getattr(pointer, "weight")
            elif scope_names[0] == "output_bias" or scope_names[0] == "beta":
                pointer = getattr(pointer, "bias")
            elif scope_names[0] == "output_weights":
                pointer = getattr(pointer, "weight")
            elif scope_names[0] == "squad":
                pointer = getattr(pointer, "classifier")
            elif scope_names[0] == "dense_output" or scope_names[0] == "bert_output":
                pointer = getattr(pointer, "output")
            elif scope_names[0] == "self_attention":
                pointer = getattr(pointer, "self")
            else:
                try:
                    pointer = getattr(pointer, scope_names[0])
                except AttributeError:
                    logger.info("Skipping {}".format("/".join(name)))
                    continue
            if len(scope_names) >= 2:
                num = int(scope_names[1])
                pointer = pointer[num]
        if m_name[-11:] == "_embeddings":
            pointer = getattr(pointer, "weight")
        elif m_name == "kernel" or m_name == "gamma" or m_name == "output_weights":
            array = np.transpose(array)
        # try:
        #     if pointer.shape != array.shape:
        #         raise ValueError(f"Pointer shape {pointer.shape} and array shape {array.shape} mismatched")
        # except AssertionError as e:
        #     e.args += (pointer.shape, array.shape)
        #     raise
        logger.info(f"Initialize PyTorch weight {name}")
        pointer.data = torch.from_numpy(array)
    return model

For the convert_tf_checkpoint_to_pytorch function, I used the following:

import argparse
import os
import torch

from transformers import BertConfig, BertForPreTraining, load_tf_weights_in_bert
from transformers.utils import logging

logging.set_verbosity_info()
def convert_tf_checkpoint_to_pytorch(tf_checkpoint_path, bert_config_file, pytorch_dump_path):
    # Initialise PyTorch model
    config = BertConfig.from_json_file(bert_config_file)
    print(f"Building PyTorch model from configuration: {config}")
    model = BertForPreTraining(config)
    # Load weights from tf checkpoint
    load_tf_weights_in_bert(model, config, tf_checkpoint_path)
    # Save pytorch-model
    os.makedirs(pytorch_dump_path, exist_ok=True)
    pytorch_dump_path = os.path.join(pytorch_dump_path, '0')
    print(f"Save PyTorch model to {pytorch_dump_path}")
    torch.save(model.state_dict(), pytorch_dump_path)
if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # Required parameters
    parser.add_argument(
        "--tf_checkpoint_path", default=None, type=str, required=True, help="Path to the TensorFlow checkpoint path."
    )
    parser.add_argument(
        "--bert_config_file",
        default=None,
        type=str,
        required=True,
        help="The config json file corresponding to the pre-trained BERT model. \n"
        "This specifies the model architecture.",
    )
    parser.add_argument(
        "--pytorch_dump_path", default=None, type=str, required=True, help="Path to the output PyTorch model."
    )
    args = parser.parse_args()
    convert_tf_checkpoint_to_pytorch(args.tf_checkpoint_path, args.bert_config_file, args.pytorch_dump_path)
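Example invocation, assuming the script above is saved as convert_tf_checkpoint_to_pytorch.py (paths are placeholders; note that --pytorch_dump_path is a directory here, since the script creates it and writes the model file inside it):

python convert_tf_checkpoint_to_pytorch.py \
    --tf_checkpoint_path $BERT_BASE_DIR/bert_model.ckpt \
    --bert_config_file $BERT_BASE_DIR/bert_config.json \
    --pytorch_dump_path $BERT_BASE_DIR/pytorch_dump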

Hope this helps!

Hi, this is an actual programming error in modeling_bert.py. If you look at line 145, it's clear that the code should continue to the next iteration of the outer loop (over name, array) rather than the inner one (over the path components of name); otherwise, why would the error message say "Skipping {name}"?

https://github.com/huggingface/transformers/blob/master/src/transformers/models/bert/modeling_bert.py#L145

To fix this, simply extract the try/except block so that it wraps the entire inner loop (lines 127-148); a sketch of that restructuring is below. I would supply a patch, but I have to stay on transformers 3.5.1 for the moment, since I'm using sentence-transformers, which hasn't been updated to the latest version.
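A minimal sketch of the restructuring (not the upstream code; the per-component name mapping is collapsed into a plain getattr):

# inside load_tf_weights_in_bert, after names/arrays have been read from the checkpoint
for name, array in zip(names, arrays):
    pointer = model
    try:
        for m_name in name.split("/"):
            # walk down to the sub-module / parameter for this path component
            # (the real code also handles kernel/gamma/beta, layer indices, etc.)
            pointer = getattr(pointer, m_name)
    except AttributeError:
        logger.info("Skipping {}".format(name))
        continue  # now skips the whole variable, not just one path component
    pointer.data = torch.from_numpy(array)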

I solved this problem by skipping some variables in the model, such as "bad_steps", "global_step", "good_steps", and "loss_scale". They don't have a 'shape' attribute, and I don't need them when fine-tuning the model.

In modeling.py, line 121, replace the condition with

if any(n in ["adam_v", "adam_m", "global_step", "bad_steps", "good_steps", "loss_scale"] for n in name):

and delete lines 151-156.
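To illustrate what that condition does, a tiny self-contained sketch (using the variable name from the log at the top of this issue):

def should_skip(tf_var_name):
    # optimizer / bookkeeping variables have no matching PyTorch parameter
    skip = ["adam_v", "adam_m", "global_step", "bad_steps", "good_steps", "loss_scale"]
    return any(part in skip for part in tf_var_name.split("/"))

print(should_skip("cls/seq_relationship/output_weights/adam_v"))  # True  -> skipped
print(should_skip("cls/seq_relationship/output_weights"))         # False -> converted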