transformers: `contrastive-image-text/run_clip.py` example problems

System Info

  • transformers version: 4.37.0.dev0
  • Platform: Linux-5.15.0-88-generic-x86_64-with-glibc2.31
  • Python version: 3.11.5
  • Huggingface_hub version: 0.20.1
  • Safetensors version: 0.4.1
  • Accelerate version: 0.25.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.2+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

@amyeroberts

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

The following example script has some issues: https://github.com/huggingface/transformers/blob/main/examples/pytorch/contrastive-image-text/run_clip.py

Minor issue:

When using --train_file dataset.csv, the tokenizer fails if a caption is the literal string “None”, “null”, or “NA”, because the CSV loader parses those values as missing (null) rather than as text.
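A possible workaround (a sketch, not verified against every datasets version): the datasets CSV builder forwards keyword arguments to pandas.read_csv, and pandas by default treats strings such as “None”, “null”, and “NA” as missing values. Passing keep_default_na=False should keep them as literal text:

    from datasets import load_dataset

    # keep_default_na=False is forwarded to pandas.read_csv, so captions like
    # "None", "null", or "NA" stay literal strings instead of becoming null.
    dataset = load_dataset(
        "csv",
        data_files={"train": "dataset.csv"},  # placeholder file name
        keep_default_na=False,
    )

    # Sanity check: no caption should be missing after loading.
    assert all(c is not None for c in dataset["train"]["caption"])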

Curiosity:

  • There seems to be no script-specific parameter to specify the Hub repository to push to.
  • There also seems to be no way to track the experiment (e.g. with wandb) — see the note below.
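That said, both appear to be handled by the generic TrainingArguments rather than by the script itself; assuming transformers 4.37, adding these flags to the command should cover both points (the repository id here is just a placeholder):

    --hub_model_id "your-username/your-clip-model" \
    --report_to "wandb"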

Actual issue

With the following parameters:

    --model_name_or_path "openai/clip-vit-base-patch32" \
    --freeze_text_model \
    --train_file "train.csv" \
    --image_column "image_path" \
    --caption_column "caption" \
    --remove_unused_columns=False \
    --do_train \
    --per_device_train_batch_size="64" \
    --per_device_eval_batch_size="64" \
    --learning_rate="5e-5" --warmup_steps="0" --weight_decay 0.1 \
    --overwrite_output_dir \
    --push_to_hub

I get the following error:

[INFO|trainer.py:1712] 2023-12-30 18:16:36,697 >> ***** Running training *****
[INFO|trainer.py:1713] 2023-12-30 18:16:36,697 >>   Num examples = 348,784
[INFO|trainer.py:1714] 2023-12-30 18:16:36,697 >>   Num Epochs = 3
[INFO|trainer.py:1715] 2023-12-30 18:16:36,698 >>   Instantaneous batch size per device = 64
[INFO|trainer.py:1718] 2023-12-30 18:16:36,698 >>   Total train batch size (w. parallel, distributed & accumulation) = 64
[INFO|trainer.py:1719] 2023-12-30 18:16:36,698 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:1720] 2023-12-30 18:16:36,698 >>   Total optimization steps = 16,350
[INFO|trainer.py:1721] 2023-12-30 18:16:36,698 >>   Number of trainable parameters = 88,111,361
  0%|                                                                                                                                                                                                    | 0/16350 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/home/amoryo/sign-language/signwriting-clip/signwriting_clip/transformers/examples/pytorch/contrastive-image-text/run_clip.py", line 590, in <module>
    main()
  File "/home/amoryo/sign-language/signwriting-clip/signwriting_clip/transformers/examples/pytorch/contrastive-image-text/run_clip.py", line 559, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/transformers/trainer.py", line 1534, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/transformers/trainer.py", line 1860, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/transformers/trainer.py", line 2737, in training_step
    loss = self.compute_loss(model, inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/transformers/trainer.py", line 2760, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/transformers/models/clip/modeling_clip.py", line 1108, in forward
    text_outputs = self.text_model(
                   ^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/transformers/models/clip/modeling_clip.py", line 691, in forward
    hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/transformers/models/clip/modeling_clip.py", line 219, in forward
    embeddings = inputs_embeds + position_embeddings
                 ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (128) must match the size of tensor b (77) at non-singleton dimension 1
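The tensor sizes in the error point at the likely cause: the script tokenizes captions with its max_seq_length default of 128, while the text encoder of openai/clip-vit-base-patch32 only has 77 position embeddings, so the padded inputs_embeds cannot be added to position_embeddings. A quick check of the model-side limit (a minimal sketch using the standard CLIP config layout):

    from transformers import CLIPModel

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    # The text tower was trained with 77 positions; the example script
    # pads/truncates captions to max_seq_length=128 by default.
    print(model.config.text_config.max_position_embeddings)  # -> 77

Passing --max_seq_length 77 on the command line should avoid the crash, assuming it is acceptable to truncate captions to 77 tokens.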

Expected behavior

The example script should train and push to the Hub correctly.

About this issue

  • State: closed
  • Created 6 months ago
  • Comments: 18

Most upvoted comments

I guess that’s everything. Thanks so much! Feel free to close once #28482 is in.

I still find the training loss periodicity puzzling, but I have no idea why. It also happens with a different base model. [attached image]

Yes, I am specifying the absolute path.