transformers: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
System Info
- transformers version: 4.30.2
- Platform: Linux-5.15.120+-x86_64-with-glibc2.31
- Python version: 3.10.12
- Huggingface_hub version: 0.16.4
- Safetensors version: 0.3.1
- PyTorch version (GPU?): 2.0.0+cpu (False)
- Tensorflow version (GPU?): 2.12.0 (False)
- Flax version (CPU?/GPU?/TPU?): 0.7.0 (cpu)
- Jax version: 0.4.13
- JaxLib version: 0.4.13
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>
Who can help?
@ArthurZucker and @younesbelkada
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
I’m trying to build a sarcasm detector with Lightning in this Kaggle notebook.
When I start the training, I get this error:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
This is my LightningModule:
```python
# Imports assumed by the snippet (not shown in the original post)
import pytorch_lightning as pl
from torch import nn
from transformers import AdamW, BertModel, get_linear_schedule_with_warmup


class SarcasmTagger(pl.LightningModule):
    def __init__(
        self,
        model_name: str,
        n_classes: int,
        n_training_steps=None,
        n_warmup_steps=None
    ):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name, return_dict=True)
        # self.bert = BertForSequenceClassification.from_pretrained(model_name, return_dict=True)
        self.classifier = nn.Linear(self.bert.config.hidden_size, n_classes)
        self.n_training_steps = n_training_steps
        self.n_warmup_steps = n_warmup_steps

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # Classify from the pooled [CLS] representation
        logits = self.classifier(outputs.pooler_output)
        return logits

    def shared_step(self, batch, batch_idx):
        input_ids = batch["input_ids"]
        attention_mask = batch["attention_mask"]
        label = batch["label"].view(-1, 1)
        logits = self(input_ids=input_ids, attention_mask=attention_mask)
        loss = nn.functional.cross_entropy(logits, label)
        return logits, loss, label

    def training_step(self, batch, batch_idx):
        logits, loss, label = self.shared_step(batch, batch_idx)
        self.log("train_loss", loss, prog_bar=True, logger=True)
        return {"loss": loss, "predictions": logits, "label": label}

    def validation_step(self, batch, batch_idx):
        logits, loss, label = self.shared_step(batch, batch_idx)
        self.log("val_loss", loss, prog_bar=True, logger=True)
        return loss

    def test_step(self, batch, batch_idx):
        logits, loss, label = self.shared_step(batch, batch_idx)
        self.log("test_loss", loss, prog_bar=True, logger=True)
        return loss

    def configure_optimizers(self):
        optimizer = AdamW(self.parameters(), lr=2e-5)
        scheduler = get_linear_schedule_with_warmup(
            optimizer,
            num_warmup_steps=self.n_warmup_steps,
            num_training_steps=self.n_training_steps
        )
        return dict(
            optimizer=optimizer,
            lr_scheduler=dict(
                scheduler=scheduler,
                interval='step')
        )
```
What is the problem here? I’m lost.
Thanks!
Expected behavior
Execute the training without errors.
About this issue
- State: closed
- Created a year ago
- Comments: 17 (2 by maintainers)
some more details:
These combinations work:
- `torch==2.0.0+cu117`, `pytorch-lightning==1.9.4`, `accelerate==0.21.0`, `tokenizers==0.13.3`, `transformers==4.26.1`
- `torch==2.0.0+cu117`, `pytorch-lightning==1.9.4`, `accelerate==0.21.0`, `tokenizers==0.13.3`, `transformers==4.27.4`
- `torch==2.0.0+cu117`, `pytorch-lightning==1.9.4`, `accelerate==0.21.0`, `tokenizers==0.13.3`, `transformers==4.28.1`
- `torch==2.0.0+cu117`, `pytorch-lightning==1.9.4`, `accelerate==0.21.0`, `tokenizers==0.13.3`, `transformers==4.29.2`

These combinations don’t:
- `torch==2.0.0+cu117`, `pytorch-lightning==1.9.4`, `accelerate==0.21.0`, `tokenizers==0.13.3`, `transformers==4.30.0`
- `torch==2.0.0+cu117`, `pytorch-lightning==1.9.4`, `accelerate==0.21.0`, `tokenizers==0.13.3`, `transformers==4.30.2`
- `torch==2.0.0+cu117`, `pytorch-lightning==1.9.4`, `accelerate==0.21.0`, `tokenizers==0.13.3`, `transformers==4.31.0`

So the regression must have been introduced in `transformers==4.30.0`? I’ll try to see if I can get a minimal reproducing script together.
Same problem here; as suggested, it was resolved by switching optimizers (see the sketch below).
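A minimal sketch of that switch, as a drop-in replacement for the `configure_optimizers` in the `LightningModule` above (everything else is assumed unchanged; only the optimizer class differs):

```python
import torch
from transformers import get_linear_schedule_with_warmup

def configure_optimizers(self):
    # Swap transformers.AdamW for torch.optim.AdamW; same lr and scheduler as before
    optimizer = torch.optim.AdamW(self.parameters(), lr=2e-5)
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=self.n_warmup_steps,
        num_training_steps=self.n_training_steps,
    )
    return dict(
        optimizer=optimizer,
        lr_scheduler=dict(scheduler=scheduler, interval="step"),
    )
```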
Hi all, the default has been changed on main now and will land in the next release. Install with `pip install git+https://github.com/huggingface/transformers` to use it OOTB!

If our AdamW is not working properly, that’s all the more reason to switch the default to the PyTorch one. Users will still be able to switch back if they do not like the change.
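For `Trainer` users, the optimizer can also be selected explicitly instead of relying on the default; a small sketch using the existing `optim` values (`adamw_torch` / `adamw_hf`):

```python
from transformers import TrainingArguments

# Explicitly request PyTorch's AdamW (the new default) ...
args = TrainingArguments(output_dir="out", optim="adamw_torch")

# ... or opt back into the transformers implementation.
args_legacy = TrainingArguments(output_dir="out", optim="adamw_hf")
```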
Not entirely sure this is worth looking into too much, given @stas00’s point here: https://github.com/huggingface/transformers/pull/23417#issuecomment-1550506298

So yes, AdamW is slated for deprecation and you should use `torch.optim.AdamW`.

@sgugger, do we know when that is going to be? Or should we look into this more?

There wasn’t anything explicit in the change to AdamW since v4.29.0, so it will certainly take some digging to find the exact commit.
some more details after I swapped the optimizer line of code for its PyTorch counterpart (presumably the import swap sketched below):
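A minimal sketch of the presumed change:

```python
# before: the AdamW shipped with transformers (now deprecated)
from transformers import AdamW

# after: PyTorch's built-in AdamW
from torch.optim import AdamW
```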
Now all the versions of transformers I tested earlier work on my existing codebase:
- `torch==2.0.0+cu117`, `pytorch-lightning==1.9.4`, `accelerate==0.21.0`, `tokenizers==0.13.3`, `transformers==4.26.1`
- `torch==2.0.0+cu117`, `pytorch-lightning==1.9.4`, `accelerate==0.21.0`, `tokenizers==0.13.3`, `transformers==4.27.4`
- `torch==2.0.0+cu117`, `pytorch-lightning==1.9.4`, `accelerate==0.21.0`, `tokenizers==0.13.3`, `transformers==4.28.1`
- `torch==2.0.0+cu117`, `pytorch-lightning==1.9.4`, `accelerate==0.21.0`, `tokenizers==0.13.3`, `transformers==4.29.2`
- `torch==2.0.0+cu117`, `pytorch-lightning==1.9.4`, `accelerate==0.21.0`, `tokenizers==0.13.3`, `transformers==4.30.0`
- `torch==2.0.0+cu117`, `pytorch-lightning==1.9.4`, `accelerate==0.21.0`, `tokenizers==0.13.3`, `transformers==4.30.2`
- `torch==2.0.0+cu117`, `pytorch-lightning==1.9.4`, `accelerate==0.21.0`, `tokenizers==0.13.3`, `transformers==4.31.0`

Therefore, there is pretty strong evidence that something in `transformers.AdamW` in `transformers==4.30.0` caused a regression?

Thanks a lot @lcoandrade for that! 🙌 I can now upgrade our transformers dependency to the latest!
I have a similar issue.
With `pytorch-lightning==1.9.4` and `transformers==4.26.1` the code runs fine (and has done with previous versions of both libraries for months/years; yes, there have been code changes in that time, but the core has been rather stable). (Also just tested with `transformers==4.29.2` and it works fine.)

However, when I change nothing in the code and change no other dependencies (so `pytorch-lightning==1.9.4` and all others the same) except to upgrade to `transformers==4.30.2`, the code fails with the same `RuntimeError` as above.

The problem is that my codebase is very large and it will take me a while to put together a minimal reproducing script. I will try, but in the meantime perhaps someone else will have a simpler solution (considering the information I am sharing) and/or a simpler minimal reproducing script.
Perhaps also @lcoandrade you could try your script with `transformers==4.26.1` or `transformers==4.29.2` and see if that works for you?