transformers: `return_loss=True` in call for `TFCLIPModel` bugs out.

System Info

transformers version: 4.23.1
Platform: Linux-5.10.133+-x86_64-with-Ubuntu-18.04-bionic
Python version: 3.7.15
Huggingface_hub version: 0.10.1
PyTorch version (GPU?): 1.12.1+cu113 (False)
Tensorflow version (GPU?): 2.9.2 (False)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed
Using GPU in script?: No
Using distributed or parallel set-up in script?: No

Who can help?

@patil-suraj

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, …)
My own task or dataset (give details below)

Reproduction

To reproduce the bug I have used the following code snippet 👇

import tensorflow as tf
from PIL import Image
import requests
from transformers import CLIPProcessor, TFCLIPModel

model = TFCLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(
    text=["a photo of a cat", "a photo of a dog"], images=image, return_tensors="tf", padding=True
)

outputs = model(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    attention_mask=inputs["attention_mask"],
    return_loss=True,
    return_dict=True,
)

Expected behavior

The call should execute and we should obtain the outputs.

About this issue

State: closed
Created 2 years ago
Comments: 15 (9 by maintainers)

Most upvoted comments

It looks like the problem in this issue is that you are not passing along as many images as texts. Passing `images=[image, image]` makes your reproducer pass.

sgugger on Nov 21, 2022
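
To illustrate why the number of images has to match the number of texts once `return_loss=True` is set, here is a minimal pure-Python sketch. It assumes the loss is the usual symmetric contrastive cross-entropy over the text-image similarity matrix, where text i is paired with image i. The function names and the similarity values are hypothetical, chosen only for illustration; this is not the library's code.

```python
import math


def cross_entropy_with_diagonal_labels(logits):
    """Mean cross-entropy where row i is expected to pick column i."""
    n_rows = len(logits)
    n_cols = len(logits[0])
    if n_rows > n_cols:
        # Row i needs a column i to act as its "correct" class.
        raise ValueError(
            f"label {n_cols} is out of range for {n_cols} classes "
            f"(logits shape: {n_rows}x{n_cols})"
        )
    total = 0.0
    for i, row in enumerate(logits):
        # Numerically stable log-sum-exp, then subtract the target logit.
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / n_rows


def symmetric_contrastive_loss(logits_per_text):
    """Average of the text-to-image and image-to-text cross-entropies."""
    logits_per_image = [list(col) for col in zip(*logits_per_text)]
    caption_loss = cross_entropy_with_diagonal_labels(logits_per_text)
    image_loss = cross_entropy_with_diagonal_labels(logits_per_image)
    return (caption_loss + image_loss) / 2.0


# Two texts, one image: a 2x1 similarity matrix, as in the reproducer.
# The values are made up for illustration.
mismatched = [[24.5], [19.3]]
try:
    symmetric_contrastive_loss(mismatched)
    mismatched_failed = False
except ValueError as err:
    mismatched_failed = True
    print("mismatched batch failed:", err)

# Two texts, the same image twice: a square 2x2 matrix, so the loss is defined.
matched = [[24.5, 24.5], [19.3, 19.3]]
matched_loss = symmetric_contrastive_loss(matched)
print("matched batch loss:", matched_loss)
```

In the reproducer, the square case corresponds to calling the processor with `images=[image, image]`, so the two texts are matched by two images and the similarity matrix is 2x2.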