transformers: `return_loss=True` in call for `TFCLIPModel` bugs out.
System Info
- `transformers` version: 4.23.1
- Platform: Linux-5.10.133+-x86_64-with-Ubuntu-18.04-bionic
- Python version: 3.7.15
- Huggingface_hub version: 0.10.1
- PyTorch version (GPU?): 1.12.1+cu113 (False)
- Tensorflow version (GPU?): 2.9.2 (False)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: No
- Using distributed or parallel set-up in script?: No
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
To reproduce the bug, I used the following code snippet 👇
```python
import tensorflow as tf
from PIL import Image
import requests

from transformers import CLIPProcessor, TFCLIPModel

model = TFCLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(
    text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="tf", padding=True
)

# This call errors out when return_loss=True is passed.
outputs = model(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    attention_mask=inputs["attention_mask"],
    return_loss=True,
    return_dict=True,
)
```
Expected behavior
The call should execute successfully and return the model outputs, including the contrastive loss.
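For reference, on a successful call the returned object should expose the loss and the image-text similarity logits. A minimal sketch of inspecting those outputs (field names taken from the documented `TFCLIPOutput`; not executed here, since the call above fails):

```python
import tensorflow as tf

# Sketch only: what the expected outputs would expose on a successful call.
print(outputs.loss)              # scalar contrastive loss (only set with return_loss=True)
print(outputs.logits_per_image)  # image-text similarity scores, shape (num_images, num_texts)
probs = tf.nn.softmax(outputs.logits_per_image, axis=1)  # per-image probabilities over the texts
```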
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 15 (9 by maintainers)
It looks like the problem in this issue is that you are not passing along as many images as texts. Passing `images=[image, image]` makes your reproducer pass.
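For clarity, here is a minimal adjustment of the reproducer along those lines, assuming the same `model`, `processor`, and `image` defined above (a sketch of the suggested workaround, not an official example):

```python
# Provide one image per text: the contrastive loss pairs each text with an
# image, so the image and text batch sizes need to match.
inputs = processor(
    text=["a photo of a cat", "a photo of a dog"],
    images=[image, image],
    return_tensors="tf",
    padding=True,
)
outputs = model(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    attention_mask=inputs["attention_mask"],
    return_loss=True,
    return_dict=True,
)
print(outputs.loss)  # the contrastive loss is now computed without error
```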