transformers: DETR model crashes when changing the num_queries parameter in the config
System Info
- transformers version: 4.36.2
- Platform: Linux-5.15.133+-x86_64-with-glibc2.35
- Python version: 3.10.10
- Huggingface_hub version: 0.20.2
- Safetensors version: 0.4.1
- Accelerate version: 0.26.1
- Accelerate config: not found
- PyTorch version (GPU?): 2.1.2+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes, Tesla T4
- Using distributed or parallel set-up in script?: No
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
- Load the model with a custom `num_queries` hyperparameter:

```python
from transformers import AutoImageProcessor, DetrForObjectDetection

id2label = {0: 'Test'}
label2id = {'Test': 0}
model_name = "facebook/detr-resnet-50"
image_processor = AutoImageProcessor.from_pretrained(model_name)
detr = DetrForObjectDetection.from_pretrained(
    model_name,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,
    num_queries=5,
)
```
- Train (or just run the forward pass with an input containing `labels`); a minimal sketch of such a pass is shown below.
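For completeness, here is a minimal sketch of a forward pass with labels. The dummy image and target below are hypothetical placeholders; in the real setup the `pixel_values` come from the image processor and the `labels` from the training dataset.

```python
import torch

# Hypothetical dummy batch: one 3-channel image and a single box target.
pixel_values = torch.randn(1, 3, 800, 800)
labels = [
    {
        "class_labels": torch.tensor([0]),              # one "Test" object
        "boxes": torch.tensor([[0.5, 0.5, 0.2, 0.2]]),  # normalized (cx, cy, w, h)
    }
]

outputs = detr(pixel_values=pixel_values, labels=labels)
print(outputs.loss)
```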
I got the following error:

```
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>:1 │
│ │
│ ❱ 1 trainer.train() │
│ 2 │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/trainer.py:1537 in │
│ train │
│ │
│ 1534 │ │ │ finally: │
│ 1535 │ │ │ │ hf_hub_utils.enable_progress_bars() │
│ 1536 │ │ else: │
│ ❱ 1537 │ │ │ return inner_training_loop( │
│ 1538 │ │ │ │ args=args, │
│ 1539 │ │ │ │ resume_from_checkpoint=resume_from_checkpoint, │
│ 1540 │ │ │ │ trial=trial, │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/trainer.py:1854 in │
│ _inner_training_loop │
│ │
│ 1851 │ │ │ │ │ self.control = self.callback_handler.on_step_begin(args, self.state, │
│ 1852 │ │ │ │ │
│ 1853 │ │ │ │ with self.accelerator.accumulate(model): │
│ ❱ 1854 │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1855 │ │ │ │ │
│ 1856 │ │ │ │ if ( │
│ 1857 │ │ │ │ │ args.logging_nan_inf_filter │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/trainer.py:2735 in │
│ training_step │
│ │
│ 2732 │ │ │ return loss_mb.reduce_mean().detach().to(self.args.device) │
│ 2733 │ │ │
│ 2734 │ │ with self.compute_loss_context_manager(): │
│ ❱ 2735 │ │ │ loss = self.compute_loss(model, inputs) │
│ 2736 │ │ │
│ 2737 │ │ if self.args.n_gpu > 1: │
│ 2738 │ │ │ loss = loss.mean() # mean() to average on multi-gpu parallel training │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/trainer.py:2758 in │
│ compute_loss │
│ │
│ 2755 │ │ │ labels = inputs.pop("labels") │
│ 2756 │ │ else: │
│ 2757 │ │ │ labels = None │
│ ❱ 2758 │ │ outputs = model(**inputs) │
│ 2759 │ │ # Save past state if it exists │
│ 2760 │ │ # TODO: this needs to be fixed and made cleaner later. │
│ 2761 │ │ if self.args.past_index >= 0: │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1518 │
│ in _wrapped_call_impl │
│ │
│ 1515 │ │ if self._compiled_call_impl is not None: │
│ 1516 │ │ │ return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc] │
│ 1517 │ │ else: │
│ ❱ 1518 │ │ │ return self._call_impl(*args, **kwargs) │
│ 1519 │ │
│ 1520 │ def _call_impl(self, *args, **kwargs): │
│ 1521 │ │ forward_call = (self._slow_forward if torch._C._get_tracing_state() else self.fo │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1527 │
│ in _call_impl │
│ │
│ 1524 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1525 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1526 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1527 │ │ │ return forward_call(*args, **kwargs) │
│ 1528 │ │ │
│ 1529 │ │ try: │
│ 1530 │ │ │ result = None │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/models/detr/modeling │
│ _detr.py:1603 in forward │
│ │
│ 1600 │ │ │ │ auxiliary_outputs = self._set_aux_loss(outputs_class, outputs_coord) │
│ 1601 │ │ │ │ outputs_loss["auxiliary_outputs"] = auxiliary_outputs │
│ 1602 │ │ │ │
│ ❱ 1603 │ │ │ loss_dict = criterion(outputs_loss, labels) │
│ 1604 │ │ │ # Fourth: compute total loss, as a weighted sum of the various losses │
│ 1605 │ │ │ weight_dict = {"loss_ce": 1, "loss_bbox": self.config.bbox_loss_coefficient} │
│ 1606 │ │ │ weight_dict["loss_giou"] = self.config.giou_loss_coefficient │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1518 │
│ in _wrapped_call_impl │
│ │
│ 1515 │ │ if self._compiled_call_impl is not None: │
│ 1516 │ │ │ return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc] │
│ 1517 │ │ else: │
│ ❱ 1518 │ │ │ return self._call_impl(*args, **kwargs) │
│ 1519 │ │
│ 1520 │ def _call_impl(self, *args, **kwargs): │
│ 1521 │ │ forward_call = (self._slow_forward if torch._C._get_tracing_state() else self.fo │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1527 │
│ in _call_impl │
│ │
│ 1524 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1525 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1526 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1527 │ │ │ return forward_call(*args, **kwargs) │
│ 1528 │ │ │
│ 1529 │ │ try: │
│ 1530 │ │ │ result = None │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/models/detr/modeling │
│ _detr.py:2202 in forward │
│ │
│ 2199 │ │ outputs_without_aux = {k: v for k, v in outputs.items() if k != "auxiliary_outpu │
│ 2200 │ │ │
│ 2201 │ │ # Retrieve the matching between the outputs of the last layer and the targets │
│ ❱ 2202 │ │ indices = self.matcher(outputs_without_aux, targets) │
│ 2203 │ │ │
│ 2204 │ │ # Compute the average number of target boxes across all nodes, for normalization │
│ 2205 │ │ num_boxes = sum(len(t["class_labels"]) for t in targets) │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1518 │
│ in _wrapped_call_impl │
│ │
│ 1515 │ │ if self._compiled_call_impl is not None: │
│ 1516 │ │ │ return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc] │
│ 1517 │ │ else: │
│ ❱ 1518 │ │ │ return self._call_impl(*args, **kwargs) │
│ 1519 │ │
│ 1520 │ def _call_impl(self, *args, **kwargs): │
│ 1521 │ │ forward_call = (self._slow_forward if torch._C._get_tracing_state() else self.fo │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1527 │
│ in _call_impl │
│ │
│ 1524 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1525 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1526 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1527 │ │ │ return forward_call(*args, **kwargs) │
│ 1528 │ │ │
│ 1529 │ │ try: │
│ 1530 │ │ │ result = None │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py:115 in │
│ decorate_context │
│ │
│ 112 │ @functools.wraps(func) │
│ 113 │ def decorate_context(*args, **kwargs): │
│ 114 │ │ with ctx_factory(): │
│ ❱ 115 │ │ │ return func(*args, **kwargs) │
│ 116 │ │
│ 117 │ return decorate_context │
│ 118 │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/models/detr/modeling │
│ _detr.py:2323 in forward │
│ │
│ 2320 │ │ bbox_cost = torch.cdist(out_bbox, target_bbox, p=1) │
│ 2321 │ │ │
│ 2322 │ │ # Compute the giou cost between boxes │
│ ❱ 2323 │ │ giou_cost = -generalized_box_iou(center_to_corners_format(out_bbox), center_to_c │
│ 2324 │ │ │
│ 2325 │ │ # Final cost matrix │
│ 2326 │ │ cost_matrix = self.bbox_cost * bbox_cost + self.class_cost * class_cost + self.g │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/models/detr/modeling │
│ _detr.py:2388 in generalized_box_iou │
│ │
│ 2385 │ # degenerate boxes gives inf / nan results │
│ 2386 │ # so do an early check │
│ 2387 │ if not (boxes1[:, 2:] >= boxes1[:, :2]).all(): │
│ ❱ 2388 │ │ raise ValueError(f"boxes1 must be in [x0, y0, x1, y1] (corner) format, but got { │
│ 2389 │ if not (boxes2[:, 2:] >= boxes2[:, :2]).all(): │
│ 2390 │ │ raise ValueError(f"boxes2 must be in [x0, y0, x1, y1] (corner) format, but got { │
│ 2391 │ iou, union = box_iou(boxes1, boxes2) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: boxes1 must be in [x0, y0, x1, y1] (corner) format, but got tensor([[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan]], device='cuda:0')
```
The same code works fine without changing the default `num_queries`.
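The NaNs suggest the weights diverge during training rather than the matcher itself being broken. A quick way to check whether the predicted boxes are already NaN (a sketch, reusing `detr` and `pixel_values` from above):

```python
import torch

with torch.no_grad():
    outputs = detr(pixel_values=pixel_values)  # no labels, so the loss/matcher are skipped
print(torch.isnan(outputs.pred_boxes).any())   # tensor(True) once training has diverged
```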
Expected behavior
I would expect the model to run as normal.
I am fine-tuning the model on a custom dataset that should not have more than a couple of objects per image, and I expected the number of queries to have no impact other than limiting the maximum number of objects found.
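For context: with `ignore_mismatched_sizes=True`, a smaller `num_queries` means the pretrained 100-query position-embedding table no longer matches and is randomly re-initialized, which may be what destabilizes training. A sketch of how to verify this (the `model.query_position_embeddings` attribute path follows the current `modeling_detr.py` and may differ across versions):

```python
# detr-resnet-50 ships a [100, 256] query embedding table; with num_queries=5
# it cannot be loaded, so from_pretrained re-initializes it (this is also what
# the ignore_mismatched_sizes warning reports at load time).
print(detr.config.num_queries)                            # 5
print(detr.model.query_position_embeddings.weight.shape)  # torch.Size([5, 256])
```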
About this issue
- State: open
- Created 5 months ago
- Comments: 15 (2 by maintainers)
I’ve released the model here: https://huggingface.co/isalia99/detr-resnet-50-sku110k, and the code is here: https://github.com/Isalia20/DETR-finetune