transformers: Cannot train Roberta: 2 different errors

Environment info

  • transformers version: 4.4.0.dev0
  • Platform: Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
  • Python version: 3.6.9
  • PyTorch version (GPU?): 1.7.0+cu101 (True)
  • Tensorflow version (GPU?): 2.4.1 (True)
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No (Single GPU) --> COLAB

Who can help

I am not sure, since the error is very vague and untraceable


Model I am using (Bert, XLNet …): Roberta

The problem arises when using:

  • the official example scripts: (give details below)

The tasks I am working on is:

  • my own task or dataset: (give details below)

It is a private dataset, so I am not at liberty to share it. However, I can give an idea of what the CSV looks like:

,ID,Text,Label …

I do not think there can be anything wrong with the DataFrame as I am taking data from specific columns and converting them to numpy arrays for the rest of the steps in the HF “Fine-tuning” guide.

To reproduce

Steps to reproduce the behavior:

!git clone
!cd transformers
!pip install -e .

train_text = list(train['Text'].values)
train_label = list(train['Label'].values)

val_text = list(val['Text'].values)
val_label = list(val['Label'].values)

from transformers import RobertaTokenizer, TFRobertaForSequenceClassification
import tensorflow as tf

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = TFRobertaForSequenceClassification.from_pretrained('roberta-base')

train_encodings = tokenizer(train_text, truncation=True, padding=True)
val_encodings = tokenizer(val_text, truncation=True, padding=True)

train_dataset =
val_dataset =

All of this code is common to both approaches. However, the error differs depending on the training method.

###Training using trainer


from transformers import TFTrainingArguments, TFTrainer

training_args = TFTrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # total number of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=64,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
)

with training_args.strategy.scope():
    model = TFRobertaForSequenceClassification.from_pretrained("roberta-base")

trainer = TFTrainer(
    model=model,                         # the instantiated Transformers model to be trained
    args=training_args,                  # training arguments, defined above
    train_dataset=train_dataset,         # training dataset
    eval_dataset=val_dataset             # evaluation dataset
)

trainer.train()


All model checkpoint layers were used when initializing TFRobertaForSequenceClassification.

Some layers of TFRobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


TypeError                                 Traceback (most recent call last)

<ipython-input-52-f86f69d7497b> in <module>()
     22 )
---> 24 trainer.train()

10 frames

/usr/local/lib/python3.6/dist-packages/transformers/ in train(self)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/ in __call__(self, *args, **kwds)
    826     tracing_count = self.experimental_get_tracing_count()
    827     with trace.Trace(self._name) as tm:
--> 828       result = self._call(*args, **kwds)
    829       compiler = "xla" if self._experimental_compile else "nonXla"
    830       new_tracing_count = self.experimental_get_tracing_count()

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/ in _call(self, *args, **kwds)
    869       # This is the first call of __call__, so we have to initialize.
    870       initializers = []
--> 871       self._initialize(args, kwds, add_initializers_to=initializers)
    872     finally:
    873       # At this point we know that the initialization is complete (or less

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/ in _initialize(self, args, kwds, add_initializers_to)
    724     self._concrete_stateful_fn = (
    725         self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
--> 726             *args, **kwds))
    728     def invalid_creator_scope(*unused_args, **unused_kwds):

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/ in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
   2967       args, kwargs = None, None
   2968     with self._lock:
-> 2969       graph_function, _ = self._maybe_define_function(args, kwargs)
   2970     return graph_function

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/ in _maybe_define_function(self, args, kwargs)
   3360           self._function_cache.missed.add(call_context_key)
-> 3361           graph_function = self._create_graph_function(args, kwargs)
   3362           self._function_cache.primary[cache_key] = graph_function

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/ in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
   3204             arg_names=arg_names,
   3205             override_flat_arg_shapes=override_flat_arg_shapes,
-> 3206             capture_by_value=self._capture_by_value),
   3207         self._function_attributes,
   3208         function_spec=self.function_spec,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
    988         _, original_func = tf_decorator.unwrap(python_func)
--> 990       func_outputs = python_func(*func_args, **func_kwargs)
    992       # invariant: `func_outputs` contains only Tensors, CompositeTensors,

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/ in wrapped_fn(*args, **kwds)
    632             xla_context.Exit()
    633         else:
--> 634           out = weak_wrapped_fn().__wrapped__(*args, **kwds)
    635         return out

/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/ in bound_method_wrapper(*args, **kwargs)
   3885     # However, the replacer is still responsible for attaching self properly.
   3886     # TODO(mdan): Is it possible to do it here instead?
-> 3887     return wrapped_fn(*args, **kwargs)
   3888   weak_bound_method_wrapper = weakref.ref(bound_method_wrapper)

/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ in wrapper(*args, **kwargs)
    975           except Exception as e:  # pylint:disable=broad-except
    976             if hasattr(e, "ag_error_metadata"):
--> 977               raise e.ag_error_metadata.to_exception(e)
    978             else:
    979               raise

TypeError: in user code:

    /usr/local/lib/python3.6/dist-packages/transformers/ distributed_training_steps  *
        nb_instances_in_batch = self._compute_nb_instances(batch)
    /usr/local/lib/python3.6/dist-packages/transformers/ _compute_nb_instances  *
        nb_instances = tf.reduce_sum(tf.cast(labels != -100, dtype=tf.int32))
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/util/ wrapper
        return target(*args, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/ tensor_not_equals
        return gen_math_ops.not_equal(self, other, incompatible_shape_error=False)
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/ not_equal
    /usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/ _apply_op_helper
        repr(values), type(values).__name__, err))

    TypeError: Expected string passed to parameter 'y' of op 'NotEqual', got -100 of type 'int' instead. Error: Expected string, got -100 of type 'int' instead.

###Using Native Tensorflow code (from official example) CODE:

from transformers import TFRobertaForSequenceClassification

model = TFRobertaForSequenceClassification.from_pretrained('roberta-base')

optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=model.compute_loss) # can also use any keras loss fn
model.fit(…, validation_data=val.shuffle(1000).batch(16), epochs=3, batch_size=16)


All model checkpoint layers were used when initializing TFRobertaForSequenceClassification.

Some layers of TFRobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


AttributeError                            Traceback (most recent call last)

<ipython-input-51-a13d177c752e> in <module>()
      5 optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
      6 model.compile(optimizer=optimizer, loss=model.compute_loss) # can also use any keras loss fn
----> 7, validation_data=val.shuffle(1000).batch(16), epochs=3, batch_size=16)

/usr/local/lib/python3.6/dist-packages/pandas/core/ in __getattr__(self, name)
   5139             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5140                 return self[name]
-> 5141             return object.__getattribute__(self, name)
   5143     def __setattr__(self, name: str, value) -> None:

AttributeError: 'DataFrame' object has no attribute 'shuffle'

This is very surprising, since the two errors are quite different and I can’t find many fixes online. I checked the data types of the input data and they seem correct.

Expected behavior

The model should start training on this sequence classification task and achieve good accuracy.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 16 (4 by maintainers)

Most upvoted comments

I am talking about using Trainer(). I can’t use it - the cell executes successfully but it never starts training

Please follow the instructions in the template and do not tag more than three people. In this case you are sending notifications to seven different persons for a problem no one can help you solve since you did not give enough information. Let’s see why:

The first error seems to indicate your labels are strings, which cannot be known for sure since you did not provide an example of what your data look like. Just saying “My data is private so I can’t share it with you” is not helpful. You could give us the first line of the dataset, potentially masking some private content.

If your labels are indeed strings, you need to convert them to integer IDs (going from 0 to your number of labels minus one) before trying to train your model with them. Your model should also be instantiated with the correct number of labels by passing along num_labels=xxx (otherwise you will get other errors down the line).
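As a rough illustration of that conversion, here is a minimal sketch that maps string labels to integer IDs. The label values and the helper name build_label_mapping are made up for the example; the real dataset was not shared, so this is only an assumption about what it might contain.

```python
def build_label_mapping(labels):
    """Map each distinct label to an integer ID in the range 0..num_labels-1."""
    unique_labels = sorted(set(labels))
    return {label: idx for idx, label in enumerate(unique_labels)}


# Hypothetical string labels standing in for train['Label'] and val['Label'].
train_label = ["positive", "negative", "neutral", "positive"]
val_label = ["neutral", "negative"]

# Build the mapping over both splits so they share the same IDs.
label2id = build_label_mapping(train_label + val_label)

# Integer labels to use in place of the original strings.
train_ids = [label2id[label] for label in train_label]
val_ids = [label2id[label] for label in val_label]

# Number of classes to pass to the model.
num_labels = len(label2id)
```

The integer lists would then replace the string lists when building the datasets, and the model would be created with something like TFRobertaForSequenceClassification.from_pretrained('roberta-base', num_labels=num_labels).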

The second error has nothing to do with transformers: you are passing val.shuffle as validation data, where val is a pandas DataFrame and therefore has no shuffle method.
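For the second error, one possible fix is to pass a tf.data.Dataset built from the tokenized validation data rather than the DataFrame itself. The sketch below uses small made-up encodings and labels in place of the real val_encodings and val_label, so the values are assumptions for illustration only.

```python
import tensorflow as tf

# Toy stand-ins for the tokenizer output and integer labels (hypothetical values).
val_encodings = {
    "input_ids": [[0, 100, 2], [0, 200, 2], [0, 300, 2], [0, 400, 2]],
    "attention_mask": [[1, 1, 1], [1, 1, 1], [1, 1, 1], [1, 1, 1]],
}
val_label = [0, 1, 1, 0]

# Build the dataset from the encodings and labels, not from the DataFrame.
val_dataset = tf.data.Dataset.from_tensor_slices((dict(val_encodings), val_label))

# shuffle() and batch() are methods of tf.data.Dataset, so this chain is valid.
val_batches = val_dataset.shuffle(1000).batch(2)

# Pull one batch to confirm that features and labels have the expected structure.
features, labels = next(iter(val_batches))
print(features["input_ids"].shape, labels.shape)
```

A dataset built this way (for example val_dataset.batch(16)) is what validation_data in model.fit expects, instead of val.shuffle(1000).batch(16) on the DataFrame.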