transformers: `contrastive-image-text/run_clip.py` example problems
System Info
- transformers version: 4.37.0.dev0
- Platform: Linux-5.15.0-88-generic-x86_64-with-glibc2.31
- Python version: 3.11.5
- Huggingface_hub version: 0.20.1
- Safetensors version: 0.4.1
- Accelerate version: 0.25.0
- Accelerate config: not found
- PyTorch version (GPU?): 2.1.2+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: No
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
The following example script has some issues: https://github.com/huggingface/transformers/blob/main/examples/pytorch/contrastive-image-text/run_clip.py
Minor issue:
When using --train_file dataset.csv, the tokenizer fails if the caption is “None”, “null” or “NA”.
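A likely explanation is that these captions never reach the tokenizer as strings. `pandas.read_csv` treats "null" and "NA" (and, in recent pandas versions, "None") as missing values by default, so the caption column ends up holding NaN floats. The sketch below shows this with a small in-memory CSV (the file contents are made up for illustration). Passing `keep_default_na=False` keeps the literal text. Assuming the script loads the CSV through `datasets`' CSV builder, which is built on `pandas.read_csv` and accepts a `keep_default_na` option, adding `keep_default_na=False` to that `load_dataset("csv", ...)` call should have the same effect. This has not been tested against run_clip.py itself.

```python
import io

import pandas as pd

# Hypothetical CSV with captions that look like NA markers to pandas.
csv_text = "image_path,caption\na.png,None\nb.png,null\nc.png,NA\nd.png,a hand sign\n"

# Default parsing: NA-like strings are converted to NaN (a float), which a tokenizer rejects.
default_df = pd.read_csv(io.StringIO(csv_text))
print(default_df["caption"].isna().tolist())

# keep_default_na=False disables the built-in NA list, so every caption stays a string.
literal_df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(literal_df["caption"].tolist())
```

Turning off the default NA list only affects the strings pandas would otherwise treat as missing. Genuinely empty cells will then come through as empty strings instead of NaN.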
Curiosity:
- There seems to be no parameter to specify the Hub repository to push to.
- There also seems to be no way to track the experiment (e.g. with wandb).
Actual issue
With the following parameters
--model_name_or_path "openai/clip-vit-base-patch32" \
--freeze_text_model \
--train_file "train.csv" \
--image_column "image_path" \
--caption_column "caption" \
--remove_unused_columns=False \
--do_train \
--per_device_train_batch_size="64" \
--per_device_eval_batch_size="64" \
--learning_rate="5e-5" --warmup_steps="0" --weight_decay 0.1 \
--overwrite_output_dir \
--push_to_hub
I get the following error:
[INFO|trainer.py:1712] 2023-12-30 18:16:36,697 >> ***** Running training *****
[INFO|trainer.py:1713] 2023-12-30 18:16:36,697 >> Num examples = 348,784
[INFO|trainer.py:1714] 2023-12-30 18:16:36,697 >> Num Epochs = 3
[INFO|trainer.py:1715] 2023-12-30 18:16:36,698 >> Instantaneous batch size per device = 64
[INFO|trainer.py:1718] 2023-12-30 18:16:36,698 >> Total train batch size (w. parallel, distributed & accumulation) = 64
[INFO|trainer.py:1719] 2023-12-30 18:16:36,698 >> Gradient Accumulation steps = 1
[INFO|trainer.py:1720] 2023-12-30 18:16:36,698 >> Total optimization steps = 16,350
[INFO|trainer.py:1721] 2023-12-30 18:16:36,698 >> Number of trainable parameters = 88,111,361
0%| | 0/16350 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/amoryo/sign-language/signwriting-clip/signwriting_clip/transformers/examples/pytorch/contrastive-image-text/run_clip.py", line 590, in <module>
main()
File "/home/amoryo/sign-language/signwriting-clip/signwriting_clip/transformers/examples/pytorch/contrastive-image-text/run_clip.py", line 559, in main
train_result = trainer.train(resume_from_checkpoint=checkpoint)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/transformers/trainer.py", line 1534, in train
return inner_training_loop(
^^^^^^^^^^^^^^^^^^^^
File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/transformers/trainer.py", line 1860, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/transformers/trainer.py", line 2737, in training_step
loss = self.compute_loss(model, inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/transformers/trainer.py", line 2760, in compute_loss
outputs = model(**inputs)
^^^^^^^^^^^^^^^
File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/transformers/models/clip/modeling_clip.py", line 1108, in forward
text_outputs = self.text_model(
^^^^^^^^^^^^^^^^
File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/transformers/models/clip/modeling_clip.py", line 691, in forward
hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/transformers/models/clip/modeling_clip.py", line 219, in forward
embeddings = inputs_embeds + position_embeddings
~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (128) must match the size of tensor b (77) at non-singleton dimension 1
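The shapes in the error point at a likely cause. The captions appear to be padded to 128 tokens, but the `openai/clip-vit-base-patch32` text encoder has only 77 position embeddings, so the two tensors cannot be added. The 128 is assumed here to come from the script's `--max_seq_length` default. If so, passing `--max_seq_length 77` should avoid the crash. The NumPy sketch below imitates the `inputs_embeds + position_embeddings` addition from the traceback. It is not the real PyTorch code. The batch size of 64 is taken from the log, and the hidden size of 512 is an assumption for illustration.

```python
import numpy as np

BATCH = 64          # per_device_train_batch_size from the log
HIDDEN = 512        # assumed text hidden size, for illustration only
MAX_POSITIONS = 77  # number of position embeddings, from the error message


def add_position_embeddings(seq_len: int) -> np.ndarray:
    """Mimic adding token embeddings and position embeddings for a padded batch."""
    # Token embeddings: [batch, seq_len, hidden]
    inputs_embeds = np.zeros((BATCH, seq_len, HIDDEN), dtype=np.float32)
    # Position ids are sliced from a fixed-size table, so at most MAX_POSITIONS remain.
    position_ids = np.arange(MAX_POSITIONS)[None, :seq_len]
    position_embeddings = np.zeros((1, position_ids.shape[1], HIDDEN), dtype=np.float32)
    # Broadcasting fails when seq_len > MAX_POSITIONS (128 vs 77 on dimension 1).
    return inputs_embeds + position_embeddings


try:
    add_position_embeddings(128)
except ValueError as err:
    print("seq_len=128 fails:", err)

print("seq_len=77 works:", add_position_embeddings(77).shape)
```

NumPy raises a `ValueError` where PyTorch raises a `RuntimeError`, but the mismatch is the same: dimension 1 is 128 on one side and 77 on the other.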
Expected behavior
The example script should train and push to the Hub correctly.
About this issue
- State: closed
- Created 6 months ago
- Comments: 18
I guess that’s everything. Thanks so much! Feel free to close once #28482 is done.
I still find the periodicity in the training loss puzzling, but I have no idea what causes it. It also happens with a different base model.
Yes, I am specifying the absolute path.