simpletransformers: Problems when classifying after finetuning BERT (Multi-Label)
I am following this write-up on multi-label classification with Simple Transformers: https://towardsdatascience.com/multi-label-classification-using-bert-roberta-xlnet-xlm-and-distilbert-with-simple-transformers-b3e0cda12ce5
I am having some difficulties. I loaded a Dutch base BERT model (from here: https://github.com/wietsedv/bertje) and then trained a multi-label model with 50 labels:
import pandas as pd
from sklearn.model_selection import train_test_split
from simpletransformers.classification import MultiLabelClassificationModel

df = pd.read_csv("all_data_withid.csv", encoding="utf8", delimiter=";")
df['labels'] = list(zip(df.label1.tolist(), df.label2.tolist(), ...))  # truncated for brevity
train_df, eval_df = train_test_split(df, test_size=0.3, random_state=123456)
model = MultiLabelClassificationModel('bert', 'bert-base-dutch-cased/bertje-base', num_labels=50, args={'train_batch_size': 2, 'gradient_accumulation_steps': 16, 'learning_rate': 3e-5, 'num_train_epochs': 1, 'max_seq_length': 512, 'fp16': False})
model.train_model(train_df)
result, model_outputs, wrong_predictions = model.eval_model(eval_df)
Now the end result is that I get an LRAP score of roughly 0.71. However, now I am a bit puzzled on how to use this model to classify a single new instance. I closed Python, opened it again and loaded my trained model from disk:
model = MultiLabelClassificationModel('bert', 'outputs', num_labels=50, args={'train_batch_size': 2, 'gradient_accumulation_steps': 16, 'learning_rate': 3e-5, 'num_train_epochs': 1, 'max_seq_length': 512, 'fp16': False})
I then tried model.predict(["dit is een test"]) and model.predict(["en nog een compleet andere test"])
As it turns out, the resulting outputs and predictions for these two distinct sentences are exactly the same on every value, and the predictions are all 0s for every class. I also ran the evaluation (result, model_outputs, wrong_predictions = model.eval_model(eval_df)) three times on different splits of my dataset, but in all scenarios the resulting LRAP is the same ~0.71.
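All-zero predictions usually mean every per-label probability fell below the decision threshold, so a head that has collapsed to uniformly negative logits marks every label 0 for every input. A minimal sketch of that per-label decision rule (plain Python; the 0.5 threshold is assumed, matching the common default for multi-label classification):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def to_multilabel(logits, threshold=0.5):
    # one independent sigmoid per label; a label fires only if its
    # probability clears the threshold
    return [1 if sigmoid(z) > threshold else 0 for z in logits]

# a collapsed head that emits uniformly negative logits marks every
# label 0, regardless of the input sentence
print(to_multilabel([-2.3, -1.7, -4.0]))  # [0, 0, 0]
```

This is why inspecting the raw model outputs (not just the thresholded predictions) is the quickest way to tell a broken model from an overly strict threshold.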
What am I doing wrong here?
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 67 (23 by maintainers)
Lowering the learning rate and/or the number of training epochs seems to be the best solution to prevent the model from breaking completely and predicting the same class.
Same problem here, accuracy of 98% but in prediction only getting 0 for all labels. Tried Albert, Roberta, Bert, distilbert
Edit: Problem solved after completely reinstalling and rebooting
That is the general practice. Weight decay is not applied to normalization layers and bias weights. My understanding is that it is unnecessary as those don’t usually overfit.
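That convention is implemented by splitting the parameters into a decay group and a no-decay group before building the optimizer. A sketch of that grouping (the `"bias"` / `"LayerNorm.weight"` name filters follow the usual Hugging Face pattern; the function name is illustrative):

```python
def split_by_weight_decay(named_params, weight_decay=0.01):
    # parameters whose names match these substrings get no weight decay
    no_decay = ("bias", "LayerNorm.weight")
    decay_group = [p for n, p in named_params
                   if not any(nd in n for nd in no_decay)]
    no_decay_group = [p for n, p in named_params
                      if any(nd in n for nd in no_decay)]
    return [
        {"params": decay_group, "weight_decay": weight_decay},
        {"params": no_decay_group, "weight_decay": 0.0},  # norms & biases: no decay
    ]

# toy stand-ins for model.named_parameters()
named = [("encoder.layer.0.weight", "w"),
         ("encoder.layer.0.bias", "b"),
         ("encoder.LayerNorm.weight", "ln")]
groups = split_by_weight_decay(named)
```

The resulting list of group dicts is what a PyTorch-style optimizer accepts as per-parameter options.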
Greetings, I think I solved it - it is the learning rate.
First of all, as @ThilinaRajapakse and @venkatasg pointed out, the concern that the inputs to the classifier are inappropriate is irrelevant. I got different results when I changed this, probably by chance.
The learning rate applied by default is 4e-5, which is fine for fine-tuning the transformer itself but not that good for the freshly initialized classification layer. What I did to get good predictions was to set the learning rate of the classification layer to 1e-3 while keeping 4e-5 for the transformer. So far I have only tried this with BERT models.
So I modified the train function of ClassificationModel in simpletransformers.classification.classification_model from this:
to this (you also need to add the new learning_rate_classifier argument to global_args):
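The actual diff isn't shown above, but the gist of the change can be sketched as building two optimizer parameter groups, one per learning rate. The function name and the `"classifier"` name filter below are illustrative (Hugging Face BERT heads are conventionally named `classifier`), not the exact simpletransformers code:

```python
def build_param_groups(named_params, base_lr=4e-5, classifier_lr=1e-3):
    # split parameters by name: the classification head gets its own,
    # larger lr while the pretrained transformer keeps the usual
    # fine-tuning lr
    encoder = [p for name, p in named_params if "classifier" not in name]
    head = [p for name, p in named_params if "classifier" in name]
    return [
        {"params": encoder, "lr": base_lr},
        {"params": head, "lr": classifier_lr},
    ]

# toy stand-ins for model.named_parameters()
named = [("bert.encoder.layer.0.attention.self.query.weight", "w0"),
         ("classifier.weight", "cw"),
         ("classifier.bias", "cb")]
groups = build_param_groups(named)
# groups would then be passed to the optimizer, e.g. AdamW(groups)
```

Passing a list of group dicts like this is the standard PyTorch mechanism for per-group learning rates.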
You don’t need to handle this manually anymore. Check docs here.
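In current simpletransformers versions this is exposed through model args rather than by patching the train function. Assuming the `custom_parameter_groups` option described in the docs (verify the exact key against your installed version), a per-head learning rate looks roughly like:

```python
# hedged sketch: custom_parameter_groups is the documented simpletransformers
# mechanism for per-group learning rates; check your version's docs for the key
model_args = {
    "learning_rate": 4e-5,  # default lr for everything not in a custom group
    "custom_parameter_groups": [
        {
            "params": ["classifier.weight", "classifier.bias"],
            "lr": 1e-3,  # larger lr for the classification head only
        }
    ],
}
# model = MultiLabelClassificationModel("bert", "outputs",
#                                       num_labels=50, args=model_args)
```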
@Lysimachos @ThilinaRajapakse can you please tell me where to add this to simpletransformers code ? I’m doing multi-label classification and I think I’m facing a similar issue, but I don’t know where to add this code to make it work. Thanks!
I recommend using a recent model from ACL 2020 instead of a plain classifier: https://github.com/dmis-lab/BioSyn
Got it!
Seems to change the behavior indeed. Note for others: I did have to remove the cache dir.