vit-pytorch: Model doesn't converge
We are trying to apply this method to a medical dataset with about 70K images (224×224 resolution) across 5 classes. However, our training does not converge: we tried a range of learning rates (e.g. 3e-3, 3e-4), but none of them worked. Our model currently reaches 45% accuracy after 100 epochs of training, whereas the typical accuracy reported for this dataset is around 85-90%. Is there anything else we should tune?
Also, here is our configuration:
```python
import torch
from linformer import Linformer
from vit_pytorch.efficient import ViT

device = 'cuda' if torch.cuda.is_available() else 'cpu'

batch_size = 64
epochs = 400
lr = 3e-4
gamma = 0.7
seed = 42

efficient_transformer = Linformer(
    dim=128,
    seq_len=49 + 1,   # 7x7 patches + 1 cls-token
    depth=4,
    heads=8,
    k=64
)

# Visual Transformer
model = ViT(
    dim=128,
    image_size=224,
    patch_size=32,
    num_classes=5,
    transformer=efficient_transformer,  # nn.Transformer(d_model=128, nhead=8),
    channels=1,
).to(device)
```
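For completeness, here is a simplified sketch of the training loop (assuming the standard Adam + StepLR setup from the vit-pytorch example notebook, not our exact training code; `train_loader` stands in for the actual data loader):

```python
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=lr)
# step_size=1 -> learning rate is multiplied by gamma (0.7) after every epoch
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=gamma)

for epoch in range(epochs):
    model.train()
    for data, label in train_loader:  # train_loader: DataLoader over the training images (assumed)
        data, label = data.to(device), label.to(device)
        optimizer.zero_grad()
        loss = criterion(model(data), label)
        loss.backward()
        optimizer.step()
    scheduler.step()
```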
Thank you very much!
About this issue
- Original URL
- State: open
- Created 4 years ago
- Reactions: 2
- Comments: 22 (7 by maintainers)
@liberbey Hey Ahmet! One of the pitfalls of transformers is choosing settings that make the dimension per head too small. The dimension per head should be at least 32, and ideally 64. It is calculated as `dim // heads`, so in your case each head has a dimension of only 128 // 8 = 16. Try increasing the dimension to 256 and increasing the sequence length (decrease the patch size to 16); I would be very surprised if it does not work. (A sketch of such a configuration follows below.)

Did you use a special learning rate scheduler? The loss curve on my own dataset also shows an unusual shape, check here. It seems that ViT is hard to train.
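Concretely, the suggested change (dim=256, so each of the 8 heads gets 256 // 8 = 32 dimensions, and patch_size=16, so the sequence grows to 14×14 = 196 patches plus the cls token) would look roughly like this; a sketch of the configuration, not a tested setting:

```python
from linformer import Linformer
from vit_pytorch.efficient import ViT

efficient_transformer = Linformer(
    dim=256,           # 256 // 8 heads = 32 dims per head
    seq_len=196 + 1,   # (224 // 16)**2 = 14x14 patches + 1 cls-token
    depth=4,
    heads=8,
    k=64
)

model = ViT(
    dim=256,
    image_size=224,
    patch_size=16,     # smaller patches -> longer sequence
    num_classes=5,
    transformer=efficient_transformer,
    channels=1,
).to(device)
```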
@lucidrains Thanks again! We will try to find a larger dataset. By the way, these are validation results, not test results, so we wondered whether there could be another problem with our approach: we expected the test results to be bad because we are not using a pretrained model, but not the validation results. Also, do you have any suggestions about the dramatic drop around the 80th epoch?
@lucidrains We have changed the parameters as follows:
But our model still does not converge. Here are the results:
Do you have any other suggestions? Thanks again!