open_flamingo: ImageNet evaluation error

When running

python open_flamingo/eval/evaluate.py \
--lm_path llama/7B-hf \
--lm_tokenizer_path llama/7B-hf \
--checkpoint_path OpenFlamingo-9B/checkpoint.pt \
--imagenet_root datasets/imagenet \
--eval_imagenet \
--cross_attn_every_n_layers 2 \
--device 1

I get the following error:

Using pad_token, but it is not set yet.
Loading checkpoint shards: 100%|
Flamingo model initialized with 2425735200 trainable parameters
Evaluating on ImageNet...
processing batch 0 of 625
  0%|          | 0/1000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "openflamingo/open_flamingo/eval/evaluate.py", line 961, in <module>
    main()
  File "openflamingo/open_flamingo/eval/evaluate.py", line 295, in main
    imagenet_score = evaluate_imagenet(
  File "openflamingo/open_flamingo/eval/evaluate.py", line 917, in evaluate_imagenet
    per_sample_probs = compute_per_sample_probs(
  File "openflamingo/open_flamingo/eval/classification.py", line 76, in compute_per_sample_probs
    shift_logits, shift_labels = compute_shifted_logits_and_labels(
  File "openflamingo/open_flamingo/eval/classification.py", line 57, in compute_shifted_logits_and_labels
    end_of_prefix = -labels[idx].tolist()[::-1].index(tokenizer.eos_token_id) - 1
ValueError: 2 is not in list
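
The failing line searches the reversed label list for tokenizer.eos_token_id (2 for LLaMA), so the ValueError means the EOS id never appears in the labels at all. A minimal sketch of the failure mode, with made-up label ids:

labels = [1, 319, 15373, 310, 263]  # hypothetical ids; no EOS (id 2) present
labels[::-1].index(2)               # raises: ValueError: 2 is not in list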

Any ideas how to fix it?


Most upvoted comments

So far I still haven't found what is causing the crash in the ImageNet evaluation.

Instead, I can offer a simple ImageNet evaluation script that I wrote myself; it is more intuitive than the implementation in evaluate.py:

from open_flamingo import create_model_and_transforms
from open_flamingo.eval.imagenet_utils import (
    openai_imagenet_classnames,
    IMAGENET_1K_CLASS_ID_TO_LABEL,
)

import numpy as np
import torch
from torchvision.datasets import ImageFolder  # ImageNetDataset below subclasses this

class ImageNetDataset(ImageFolder):
    """Class to represent the ImageNet1k dataset."""

    def __init__(self, root, **kwargs):
        super().__init__(root=root, **kwargs)

    def __getitem__(self, idx):
        sample, target = super().__getitem__(idx)
        target_label = IMAGENET_1K_CLASS_ID_TO_LABEL[target]
        return {
            "image": sample,
            "class_id": target,  # numeric ID of the ImageNet class
            "class_name": target_label,  # human-readable name of ImageNet class
        }

def find_sub_list(sl, l):
    """Return the index of the last element of each occurrence of sub-list sl in l."""
    results = []
    sll = len(sl)
    for ind in (i for i, e in enumerate(l) if e == sl[0]):
        if l[ind:ind + sll] == sl:
            results.append(ind + sll - 1)
    return results
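
For example (illustrative ids), it returns the index of the last element of every match of sl inside l:

find_sub_list([319, 15373], [1, 319, 15373, 310, 319, 15373])  # -> [2, 5]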

model, image_processor, tokenizer = create_model_and_transforms(
    clip_vision_encoder_path="ViT-L-14",
    clip_vision_encoder_pretrained="openai",
    lang_encoder_path="/path/to/hf_llama_7b",
    tokenizer_path="/path/to/hf_llama_7b",
    cross_attn_every_n_layers=4
)

checkpoint_path = '/path/to/checkpoint.pt'
model.load_state_dict(torch.load(checkpoint_path), strict=False)
model.cuda()
model.eval()  # inference only

train_dataset = ImageNetDataset('/path/to/imagenet/train')
val_dataset = ImageNetDataset('/path/to/imagenet/val')

context_num = 4

# sample the in-context examples at random from the training set
random_indices = np.random.choice(len(train_dataset), context_num, replace=False)
in_context_samples = [train_dataset[i] for i in random_indices]

acc1 = 0
acc5 = 0
count = 0
for i, batch in enumerate(val_dataset):

    # stack the context images plus the query image: shape (1, 5, 1, C, H, W),
    # i.e. (batch, num_images, num_frames, channels, height, width)
    vision_x = [image_processor(data['image']).unsqueeze(0) for data in in_context_samples] + [image_processor(batch['image']).unsqueeze(0)]
    vision_x = torch.cat(vision_x, dim=0)
    vision_x = vision_x.unsqueeze(1).unsqueeze(0)

    overall_probs = []
    for imagenet_class_name in openai_imagenet_classnames:

        tokenizer.padding_side = "left" # For generation padding tokens should be on the left
        lang_x = tokenizer(
            ["<image>A photo of a {}<|endofchunk|><image>A photo of a {}<|endofchunk|><image>A photo of a {}<|endofchunk|><image>A photo of a {}<|endofchunk|><image>A photo of a {}".format(in_context_samples[0]['class_name'], in_context_samples[1]['class_name'], in_context_samples[2]['class_name'], in_context_samples[3]['class_name'], imagenet_class_name)],
            return_tensors="pt",
        )

        with torch.no_grad():  # no gradients needed for evaluation
            outputs = model(
                vision_x=vision_x.cuda(),
                lang_x=lang_x["input_ids"].cuda(),
                attention_mask=lang_x["attention_mask"].cuda(),
                clear_conditioned_layers=False,
            )
        probs = torch.softmax(outputs.logits, dim=-1).detach()
        # collect the probability of the generated token -- probability at index 0 corresponds to the token at index 1
        probs = probs[:, :-1, :]
        input_ids = lang_x["input_ids"][:, 1:].cuda()
        gen_probs = torch.gather(probs, 2, input_ids[:, :, None]).squeeze(-1)

        probs = []
        for input_sentence, input_probs in zip(input_ids, gen_probs):
            # [32001, 319, 15373, 310, 263] are the LLaMA token ids of the
            # "<image>A photo of a" prefix; keep only the class-name tokens
            # that follow its last occurrence
            idxes = find_sub_list([32001, 319, 15373, 310, 263], input_sentence.detach().cpu().numpy().tolist())
            input_sentence = input_sentence[idxes[-1] + 1:]
            input_probs = input_probs[idxes[-1] + 1:]
            probs.append(torch.prod(input_probs).item())
        overall_probs.append(probs)

    count += 1
    # rank all 1000 classes by the probability assigned to their name tokens
    top5 = [
        IMAGENET_1K_CLASS_ID_TO_LABEL[pred]
        for pred in np.argsort(np.array(overall_probs)[:, 0])[::-1][:5]
    ]
    if batch['class_name'] == top5[0]:
        acc1 += 1
    if batch['class_name'] in top5:
        acc5 += 1
    print('eval {}/{}: acc@1 ({}), acc@5 ({})'.format(i, len(val_dataset), acc1 / count, acc5 / count))

I hope this code helps anyone who wants to evaluate Flamingo on ImageNet! I'm not sure the script is completely correct; if there are any problems, feel free to comment in this issue. Note that the context text input is hard-coded, so remember to modify it when you change context_num (see the sketch below).
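
As a small sketch, a hypothetical build_prompt helper (not part of the script above) that assembles the same prompt for any context_num:

def build_prompt(context_samples, query_class_name):
    # one "<image>A photo of a {class}<|endofchunk|>" chunk per context sample,
    # followed by the unfinished chunk for the query image
    prompt = "".join(
        "<image>A photo of a {}<|endofchunk|>".format(s["class_name"])
        for s in context_samples
    )
    return prompt + "<image>A photo of a {}".format(query_class_name)

lang_x = tokenizer([build_prompt(in_context_samples, imagenet_class_name)], return_tensors="pt")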

After setting the tokens in tokenizer_config.json manually, it runs 😃

{"bos_token": "<s>", "eos_token": "</s>", "model_max_length": 1000000000000000019884624838656, "tokenizer_class": "LlamaTokenizer", "unk_token": "<unk>"}