transformers: `contrastive-image-text/run_clip.py` example problems

System Info

  • transformers version: 4.37.0.dev0
  • Platform: Linux-5.15.0-88-generic-x86_64-with-glibc2.31
  • Python version: 3.11.5
  • Huggingface_hub version: 0.20.1
  • Safetensors version: 0.4.1
  • Accelerate version: 0.25.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.2+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

@amyeroberts

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

The following example script has some issues: https://github.com/huggingface/transformers/blob/main/examples/pytorch/contrastive-image-text/run_clip.py

Minor issue:

When using --train_file dataset.csv, the tokenizer fails if a caption is the literal string “None”, “null”, or “NA”, because the CSV loader parses those values as missing (null) rather than as text.
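A possible workaround (a sketch, not verified against every datasets version): the datasets CSV builder forwards keyword arguments to pandas.read_csv, and pandas by default treats strings such as “None”, “null”, and “NA” as missing values. Passing keep_default_na=False should keep them as literal text:

    from datasets import load_dataset

    # keep_default_na=False is forwarded to pandas.read_csv, so captions like
    # "None", "null", or "NA" stay literal strings instead of becoming null.
    dataset = load_dataset(
        "csv",
        data_files={"train": "dataset.csv"},  # placeholder file name
        keep_default_na=False,
    )

    # Sanity check: no caption should be missing after loading.
    assert all(c is not None for c in dataset["train"]["caption"])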

Curiosity:

  • There seems to be no script-specific parameter to specify the Hub repository to push to.
  • There also seems to be no way to track the experiment (e.g. with wandb) — see the note below.
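That said, both appear to be handled by the generic TrainingArguments rather than by the script itself; assuming transformers 4.37, adding these flags to the command should cover both points (the repository id here is just a placeholder):

    --hub_model_id "your-username/your-clip-model" \
    --report_to "wandb"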

Actual issue

With the following parameters:

    --model_name_or_path "openai/clip-vit-base-patch32" \
    --freeze_text_model \
    --train_file "train.csv" \
    --image_column "image_path" \
    --caption_column "caption" \
    --remove_unused_columns=False \
    --do_train \
    --per_device_train_batch_size="64" \
    --per_device_eval_batch_size="64" \
    --learning_rate="5e-5" --warmup_steps="0" --weight_decay 0.1 \
    --overwrite_output_dir \
    --push_to_hub

I get the following error:

[INFO|trainer.py:1712] 2023-12-30 18:16:36,697 >> ***** Running training *****
[INFO|trainer.py:1713] 2023-12-30 18:16:36,697 >>   Num examples = 348,784
[INFO|trainer.py:1714] 2023-12-30 18:16:36,697 >>   Num Epochs = 3
[INFO|trainer.py:1715] 2023-12-30 18:16:36,698 >>   Instantaneous batch size per device = 64
[INFO|trainer.py:1718] 2023-12-30 18:16:36,698 >>   Total train batch size (w. parallel, distributed & accumulation) = 64
[INFO|trainer.py:1719] 2023-12-30 18:16:36,698 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:1720] 2023-12-30 18:16:36,698 >>   Total optimization steps = 16,350
[INFO|trainer.py:1721] 2023-12-30 18:16:36,698 >>   Number of trainable parameters = 88,111,361
  0%|                                                                                                                                                                                                    | 0/16350 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/home/amoryo/sign-language/signwriting-clip/signwriting_clip/transformers/examples/pytorch/contrastive-image-text/run_clip.py", line 590, in <module>
    main()
  File "/home/amoryo/sign-language/signwriting-clip/signwriting_clip/transformers/examples/pytorch/contrastive-image-text/run_clip.py", line 559, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/transformers/trainer.py", line 1534, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/transformers/trainer.py", line 1860, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/transformers/trainer.py", line 2737, in training_step
    loss = self.compute_loss(model, inputs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/transformers/trainer.py", line 2760, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/transformers/models/clip/modeling_clip.py", line 1108, in forward
    text_outputs = self.text_model(
                   ^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/transformers/models/clip/modeling_clip.py", line 691, in forward
    hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/amoryo/conda/envs/clip/lib/python3.11/site-packages/transformers/models/clip/modeling_clip.py", line 219, in forward
    embeddings = inputs_embeds + position_embeddings
                 ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
RuntimeError: The size of tensor a (128) must match the size of tensor b (77) at non-singleton dimension 1
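The tensor sizes in the error point at the likely cause: the script tokenizes captions with its max_seq_length default of 128, while the text encoder of openai/clip-vit-base-patch32 only has 77 position embeddings, so the padded inputs_embeds cannot be added to position_embeddings. A quick check of the model-side limit (a minimal sketch using the standard CLIP config layout):

    from transformers import CLIPModel

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    # The text tower was trained with 77 positions; the example script
    # pads/truncates captions to max_seq_length=128 by default.
    print(model.config.text_config.max_position_embeddings)  # -> 77

Passing --max_seq_length 77 on the command line should avoid the crash, assuming it is acceptable to truncate captions to 77 tokens.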

Expected behavior

The example script should train and push to the Hub correctly.

About this issue

  • State: closed
  • Created 6 months ago
  • Comments: 18

Most upvoted comments

I guess that’s everything. Thanks so much! Feel free to close once #28482 is in.

I still find the training loss periodicity puzzling, but I have no idea why. It also happens with a different base model. [attached image]

Yes, I am specifying the absolute path.