transformers: Cannot train Roberta: 2 different errors
Environment info
- transformers version: 4.4.0.dev0
- Platform: Linux-4.19.112+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.6.9
- PyTorch version (GPU?): 1.7.0+cu101 (True)
- Tensorflow version (GPU?): 2.4.1 (True)
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No (Single GPU) --> COLAB
Who can help
I am not sure, since the error is very vague and untraceable
Information
Model I am using (Bert, XLNet …): Roberta
The problem arises when using:
- the official example scripts: (give details below)
The tasks I am working on is:
- my own task or dataset: (give details below)
It is a private dataset, so I am not at liberty to share it. However, I can give a clue as to what the CSV looks like:
,ID,Text,Label …
I do not think there is anything wrong with the DataFrame, as I take data from specific columns and convert it to numpy arrays for the remaining steps of the HF "Fine-tuning" guide.
To reproduce
Steps to reproduce the behavior:
!git clone https://github.com/huggingface/transformers.git
!cd transformers
!pip install -e .
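(For context, a hypothetical loading step consistent with the column layout described above; the file names and the index column are assumptions, not from the original report:)
import pandas as pd

# Assumed file names; index_col=0 accounts for the leading unnamed
# column in ",ID,Text,Label …".
train = pd.read_csv("train.csv", index_col=0)
val = pd.read_csv("val.csv", index_col=0)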
train_text = list(train['Text'].values)
train_label = list(train['Label'].values)
val_text = list(val['Text'].values)
val_label = list(val['Label'].values)
from transformers import RobertaTokenizer, TFRobertaForSequenceClassification
import tensorflow as tf
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')
model = TFRobertaForSequenceClassification.from_pretrained('roberta-base')
train_encodings = tokenizer(train_text, truncation=True, padding=True)
val_encodings = tokenizer(val_text, truncation=True, padding=True)
train_dataset = tf.data.Dataset.from_tensor_slices((
    dict(train_encodings),
    train_label
))
val_dataset = tf.data.Dataset.from_tensor_slices((
    dict(val_encodings),
    val_label
))
All of this code is shared between the two approaches. However, the error differs depending on the training method.
### Training using TFTrainer
Code:
from transformers import TFTrainingArguments, TFTrainer
training_args = TFTrainingArguments(
    output_dir='./results',          # output directory
    num_train_epochs=3,              # total number of training epochs
    per_device_train_batch_size=16,  # batch size per device during training
    per_device_eval_batch_size=64,   # batch size for evaluation
    warmup_steps=500,                # number of warmup steps for learning rate scheduler
    weight_decay=0.01,               # strength of weight decay
    logging_dir='./logs',            # directory for storing logs
    logging_steps=10,
)
with training_args.strategy.scope():
    model = TFRobertaForSequenceClassification.from_pretrained("roberta-base")

trainer = TFTrainer(
    model=model,                  # the instantiated Transformers model to be trained
    args=training_args,           # training arguments, defined above
    train_dataset=train_dataset,  # training dataset
    eval_dataset=val_dataset      # evaluation dataset
)
trainer.train()
ERROR:
All model checkpoint layers were used when initializing TFRobertaForSequenceClassification.
Some layers of TFRobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-52-f86f69d7497b> in <module>()
22 )
23
---> 24 trainer.train()
10 frames
/usr/local/lib/python3.6/dist-packages/transformers/trainer_tf.py in train(self)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
826 tracing_count = self.experimental_get_tracing_count()
827 with trace.Trace(self._name) as tm:
--> 828 result = self._call(*args, **kwds)
829 compiler = "xla" if self._experimental_compile else "nonXla"
830 new_tracing_count = self.experimental_get_tracing_count()
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
869 # This is the first call of __call__, so we have to initialize.
870 initializers = []
--> 871 self._initialize(args, kwds, add_initializers_to=initializers)
872 finally:
873 # At this point we know that the initialization is complete (or less
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in _initialize(self, args, kwds, add_initializers_to)
724 self._concrete_stateful_fn = (
725 self._stateful_fn._get_concrete_function_internal_garbage_collected( # pylint: disable=protected-access
--> 726 *args, **kwds))
727
728 def invalid_creator_scope(*unused_args, **unused_kwds):
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in _get_concrete_function_internal_garbage_collected(self, *args, **kwargs)
2967 args, kwargs = None, None
2968 with self._lock:
-> 2969 graph_function, _ = self._maybe_define_function(args, kwargs)
2970 return graph_function
2971
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in _maybe_define_function(self, args, kwargs)
3359
3360 self._function_cache.missed.add(call_context_key)
-> 3361 graph_function = self._create_graph_function(args, kwargs)
3362 self._function_cache.primary[cache_key] = graph_function
3363
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
3204 arg_names=arg_names,
3205 override_flat_arg_shapes=override_flat_arg_shapes,
-> 3206 capture_by_value=self._capture_by_value),
3207 self._function_attributes,
3208 function_spec=self.function_spec,
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
988 _, original_func = tf_decorator.unwrap(python_func)
989
--> 990 func_outputs = python_func(*func_args, **func_kwargs)
991
992 # invariant: `func_outputs` contains only Tensors, CompositeTensors,
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/def_function.py in wrapped_fn(*args, **kwds)
632 xla_context.Exit()
633 else:
--> 634 out = weak_wrapped_fn().__wrapped__(*args, **kwds)
635 return out
636
/usr/local/lib/python3.6/dist-packages/tensorflow/python/eager/function.py in bound_method_wrapper(*args, **kwargs)
3885 # However, the replacer is still responsible for attaching self properly.
3886 # TODO(mdan): Is it possible to do it here instead?
-> 3887 return wrapped_fn(*args, **kwargs)
3888 weak_bound_method_wrapper = weakref.ref(bound_method_wrapper)
3889
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/func_graph.py in wrapper(*args, **kwargs)
975 except Exception as e: # pylint:disable=broad-except
976 if hasattr(e, "ag_error_metadata"):
--> 977 raise e.ag_error_metadata.to_exception(e)
978 else:
979 raise
TypeError: in user code:
/usr/local/lib/python3.6/dist-packages/transformers/trainer_tf.py:669 distributed_training_steps *
nb_instances_in_batch = self._compute_nb_instances(batch)
/usr/local/lib/python3.6/dist-packages/transformers/trainer_tf.py:681 _compute_nb_instances *
nb_instances = tf.reduce_sum(tf.cast(labels != -100, dtype=tf.int32))
/usr/local/lib/python3.6/dist-packages/tensorflow/python/util/dispatch.py:201 wrapper
return target(*args, **kwargs)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_ops.py:1786 tensor_not_equals
return gen_math_ops.not_equal(self, other, incompatible_shape_error=False)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/gen_math_ops.py:6412 not_equal
name=name)
/usr/local/lib/python3.6/dist-packages/tensorflow/python/framework/op_def_library.py:531 _apply_op_helper
repr(values), type(values).__name__, err))
TypeError: Expected string passed to parameter 'y' of op 'NotEqual', got -100 of type 'int' instead. Error: Expected string, got -100 of type 'int' instead.
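For reference, this failure can be reproduced in isolation: TFTrainer's _compute_nb_instances compares the batch labels against the integer padding value -100, which raises exactly this TypeError when the labels are strings. A minimal sketch, assuming string labels (which the report does not confirm):
import tensorflow as tf

# String labels, as the traceback suggests; the concrete values are made up.
labels = tf.constant(["positive", "negative"])
try:
    # Same comparison as in transformers/trainer_tf.py:681.
    tf.reduce_sum(tf.cast(labels != -100, dtype=tf.int32))
except TypeError as e:
    print(e)  # Expected string passed to parameter 'y' of op 'NotEqual' ...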
### Using native TensorFlow
Code (from the official example):
from transformers import TFRobertaForSequenceClassification
model = TFRobertaForSequenceClassification.from_pretrained('roberta-base')
optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
model.compile(optimizer=optimizer, loss=model.compute_loss) # can also use any keras loss fn
model.fit(train_dataset.shuffle(1000).batch(16), validation_data=val.shuffle(1000).batch(16), epochs=3, batch_size=16)
ERROR:
All model checkpoint layers were used when initializing TFRobertaForSequenceClassification.
Some layers of TFRobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-51-a13d177c752e> in <module>()
5 optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)
6 model.compile(optimizer=optimizer, loss=model.compute_loss) # can also use any keras loss fn
----> 7 model.fit(train_dataset.shuffle(1000).batch(16), validation_data=val.shuffle(1000).batch(16), epochs=3, batch_size=16)
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __getattr__(self, name)
5139 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5140 return self[name]
-> 5141 return object.__getattribute__(self, name)
5142
5143 def __setattr__(self, name: str, value) -> None:
AttributeError: 'DataFrame' object has no attribute 'shuffle'
This is very surprising, since the two errors are quite different and I can't find many fixes online. I tested the datatypes of the input data and they seem to check out.
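(For what it's worth, a concrete way to verify the label dtype end to end is to inspect the dataset's element spec, e.g.:)
# If the second element of the spec reports dtype=tf.string, the labels
# are still raw strings rather than integer IDs.
print(train_dataset.element_spec)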
Expected behavior
The model should start training on this SequenceClassification task and achieve good accuracy on it.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 16 (4 by maintainers)
I am talking about using Trainer(). I can't use it - the cell executes successfully but it never starts training.

Please follow the instructions in the template and do not tag more than three people. In this case you are sending notifications to seven different people for a problem no one can help you solve, since you did not give enough information. Let's see why:
The first error seems to indicate your labels are strings, which cannot be known for sure since you did not provide an example of what your data looks like. Just saying "my data is private so I can't share it with you" is not helpful. You could give us the first line of the dataset, potentially masking some private content.
If your labels indeed are strings, you need to convert them to some IDs (going from 0 to your number of labels) before trying to train your model with them. Your model should also be instantiated with the correct number of labels by passing along num_labels=xxx (otherwise you will get other errors down the line). A minimal sketch of that conversion is shown below.
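A possible sketch of that conversion, building the label-to-ID mapping from the training labels (the mapping is an assumption for illustration; num_labels=xxx above is the comment's placeholder):
from transformers import TFRobertaForSequenceClassification

# Deterministic string-label -> integer-ID mapping, built from the training
# labels. Assumes every label in val_label also appears in train_label.
label2id = {label: i for i, label in enumerate(sorted(set(train_label)))}
train_label = [label2id[l] for l in train_label]
val_label = [label2id[l] for l in val_label]

# Instantiate the model with the matching number of labels.
model = TFRobertaForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=len(label2id)
)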
The second error has nothing to do with transformers: you are passing val.shuffle as validation data, but val is a pandas DataFrame and therefore has no shuffle method.
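With that fixed, the fit call would presumably use the tf.data.Dataset built earlier, along these lines (batch_size is dropped because Keras rejects it when the input is an already-batched dataset):
model.fit(
    train_dataset.shuffle(1000).batch(16),
    validation_data=val_dataset.shuffle(1000).batch(16),  # val_dataset, not the DataFrame val
    epochs=3,
)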