transformers: DETR model crashes when changing the num_queries parameter in the config
System Info
- transformers version: 4.36.2
- Platform: Linux-5.15.133+-x86_64-with-glibc2.35
- Python version: 3.10.10
- Huggingface_hub version: 0.20.2
- Safetensors version: 0.4.1
- Accelerate version: 0.26.1
- Accelerate config: not found
- PyTorch version (GPU?): 2.1.2+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes, Tesla T4
- Using distributed or parallel set-up in script?: No
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
- Load the model with a custom `num_queries` hyperparameter:

```python
from transformers import AutoImageProcessor, DetrForObjectDetection

id2label = {0: 'Test'}
label2id = {'Test': 0}
model_name = "facebook/detr-resnet-50"
image_processor = AutoImageProcessor.from_pretrained(model_name)
detr = DetrForObjectDetection.from_pretrained(
    model_name,
    id2label=id2label,
    label2id=label2id,
    ignore_mismatched_sizes=True,
    num_queries=5,
)
```
- Train (or just run the forward pass with an input containing `labels`); a minimal sketch of such a pass is shown below.
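For completeness, here is a minimal sketch of a forward pass with labels. The dummy image and target below are hypothetical placeholders; in the real setup the `pixel_values` come from the image processor and the `labels` from the training dataset.

```python
import torch

# Hypothetical dummy batch: one 3-channel image and a single box target.
pixel_values = torch.randn(1, 3, 800, 800)
labels = [
    {
        "class_labels": torch.tensor([0]),              # one "Test" object
        "boxes": torch.tensor([[0.5, 0.5, 0.2, 0.2]]),  # normalized (cx, cy, w, h)
    }
]

outputs = detr(pixel_values=pixel_values, labels=labels)
print(outputs.loss)
```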
I got the following error:

```
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ in <module>:1 │
│ │
│ ❱ 1 trainer.train() │
│ 2 │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/trainer.py:1537 in │
│ train │
│ │
│ 1534 │ │ │ finally: │
│ 1535 │ │ │ │ hf_hub_utils.enable_progress_bars() │
│ 1536 │ │ else: │
│ ❱ 1537 │ │ │ return inner_training_loop( │
│ 1538 │ │ │ │ args=args, │
│ 1539 │ │ │ │ resume_from_checkpoint=resume_from_checkpoint, │
│ 1540 │ │ │ │ trial=trial, │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/trainer.py:1854 in │
│ _inner_training_loop │
│ │
│ 1851 │ │ │ │ │ self.control = self.callback_handler.on_step_begin(args, self.state, │
│ 1852 │ │ │ │ │
│ 1853 │ │ │ │ with self.accelerator.accumulate(model): │
│ ❱ 1854 │ │ │ │ │ tr_loss_step = self.training_step(model, inputs) │
│ 1855 │ │ │ │ │
│ 1856 │ │ │ │ if ( │
│ 1857 │ │ │ │ │ args.logging_nan_inf_filter │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/trainer.py:2735 in │
│ training_step │
│ │
│ 2732 │ │ │ return loss_mb.reduce_mean().detach().to(self.args.device) │
│ 2733 │ │ │
│ 2734 │ │ with self.compute_loss_context_manager(): │
│ ❱ 2735 │ │ │ loss = self.compute_loss(model, inputs) │
│ 2736 │ │ │
│ 2737 │ │ if self.args.n_gpu > 1: │
│ 2738 │ │ │ loss = loss.mean() # mean() to average on multi-gpu parallel training │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/trainer.py:2758 in │
│ compute_loss │
│ │
│ 2755 │ │ │ labels = inputs.pop("labels") │
│ 2756 │ │ else: │
│ 2757 │ │ │ labels = None │
│ ❱ 2758 │ │ outputs = model(**inputs) │
│ 2759 │ │ # Save past state if it exists │
│ 2760 │ │ # TODO: this needs to be fixed and made cleaner later. │
│ 2761 │ │ if self.args.past_index >= 0: │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1518 │
│ in _wrapped_call_impl │
│ │
│ 1515 │ │ if self._compiled_call_impl is not None: │
│ 1516 │ │ │ return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc] │
│ 1517 │ │ else: │
│ ❱ 1518 │ │ │ return self._call_impl(*args, **kwargs) │
│ 1519 │ │
│ 1520 │ def _call_impl(self, *args, **kwargs): │
│ 1521 │ │ forward_call = (self._slow_forward if torch._C._get_tracing_state() else self.fo │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1527 │
│ in _call_impl │
│ │
│ 1524 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1525 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1526 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1527 │ │ │ return forward_call(*args, **kwargs) │
│ 1528 │ │ │
│ 1529 │ │ try: │
│ 1530 │ │ │ result = None │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/models/detr/modeling │
│ _detr.py:1603 in forward │
│ │
│ 1600 │ │ │ │ auxiliary_outputs = self._set_aux_loss(outputs_class, outputs_coord) │
│ 1601 │ │ │ │ outputs_loss["auxiliary_outputs"] = auxiliary_outputs │
│ 1602 │ │ │ │
│ ❱ 1603 │ │ │ loss_dict = criterion(outputs_loss, labels) │
│ 1604 │ │ │ # Fourth: compute total loss, as a weighted sum of the various losses │
│ 1605 │ │ │ weight_dict = {"loss_ce": 1, "loss_bbox": self.config.bbox_loss_coefficient} │
│ 1606 │ │ │ weight_dict["loss_giou"] = self.config.giou_loss_coefficient │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1518 │
│ in _wrapped_call_impl │
│ │
│ 1515 │ │ if self._compiled_call_impl is not None: │
│ 1516 │ │ │ return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc] │
│ 1517 │ │ else: │
│ ❱ 1518 │ │ │ return self._call_impl(*args, **kwargs) │
│ 1519 │ │
│ 1520 │ def _call_impl(self, *args, **kwargs): │
│ 1521 │ │ forward_call = (self._slow_forward if torch._C._get_tracing_state() else self.fo │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1527 │
│ in _call_impl │
│ │
│ 1524 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1525 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1526 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1527 │ │ │ return forward_call(*args, **kwargs) │
│ 1528 │ │ │
│ 1529 │ │ try: │
│ 1530 │ │ │ result = None │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/models/detr/modeling │
│ _detr.py:2202 in forward │
│ │
│ 2199 │ │ outputs_without_aux = {k: v for k, v in outputs.items() if k != "auxiliary_outpu │
│ 2200 │ │ │
│ 2201 │ │ # Retrieve the matching between the outputs of the last layer and the targets │
│ ❱ 2202 │ │ indices = self.matcher(outputs_without_aux, targets) │
│ 2203 │ │ │
│ 2204 │ │ # Compute the average number of target boxes across all nodes, for normalization │
│ 2205 │ │ num_boxes = sum(len(t["class_labels"]) for t in targets) │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1518 │
│ in _wrapped_call_impl │
│ │
│ 1515 │ │ if self._compiled_call_impl is not None: │
│ 1516 │ │ │ return self._compiled_call_impl(*args, **kwargs) # type: ignore[misc] │
│ 1517 │ │ else: │
│ ❱ 1518 │ │ │ return self._call_impl(*args, **kwargs) │
│ 1519 │ │
│ 1520 │ def _call_impl(self, *args, **kwargs): │
│ 1521 │ │ forward_call = (self._slow_forward if torch._C._get_tracing_state() else self.fo │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py:1527 │
│ in _call_impl │
│ │
│ 1524 │ │ if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks │
│ 1525 │ │ │ │ or _global_backward_pre_hooks or _global_backward_hooks │
│ 1526 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1527 │ │ │ return forward_call(*args, **kwargs) │
│ 1528 │ │ │
│ 1529 │ │ try: │
│ 1530 │ │ │ result = None │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py:115 in │
│ decorate_context │
│ │
│ 112 │ @functools.wraps(func) │
│ 113 │ def decorate_context(*args, **kwargs): │
│ 114 │ │ with ctx_factory(): │
│ ❱ 115 │ │ │ return func(*args, **kwargs) │
│ 116 │ │
│ 117 │ return decorate_context │
│ 118 │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/models/detr/modeling │
│ _detr.py:2323 in forward │
│ │
│ 2320 │ │ bbox_cost = torch.cdist(out_bbox, target_bbox, p=1) │
│ 2321 │ │ │
│ 2322 │ │ # Compute the giou cost between boxes │
│ ❱ 2323 │ │ giou_cost = -generalized_box_iou(center_to_corners_format(out_bbox), center_to_c │
│ 2324 │ │ │
│ 2325 │ │ # Final cost matrix │
│ 2326 │ │ cost_matrix = self.bbox_cost * bbox_cost + self.class_cost * class_cost + self.g │
│ │
│ /home/jovyan/obj-detection/.venv/lib/python3.10/site-packages/transformers/models/detr/modeling │
│ _detr.py:2388 in generalized_box_iou │
│ │
│ 2385 │ # degenerate boxes gives inf / nan results │
│ 2386 │ # so do an early check │
│ 2387 │ if not (boxes1[:, 2:] >= boxes1[:, :2]).all(): │
│ ❱ 2388 │ │ raise ValueError(f"boxes1 must be in [x0, y0, x1, y1] (corner) format, but got { │
│ 2389 │ if not (boxes2[:, 2:] >= boxes2[:, :2]).all(): │
│ 2390 │ │ raise ValueError(f"boxes2 must be in [x0, y0, x1, y1] (corner) format, but got { │
│ 2391 │ iou, union = box_iou(boxes1, boxes2) │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: boxes1 must be in [x0, y0, x1, y1] (corner) format, but got tensor([[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan],
[nan, nan, nan, nan]], device='cuda:0')
```
The same code works fine without changing the default `num_queries`.
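The NaNs suggest the weights diverge during training rather than the matcher itself being broken. A quick way to check whether the predicted boxes are already NaN (a sketch, reusing `detr` and `pixel_values` from above):

```python
import torch

with torch.no_grad():
    outputs = detr(pixel_values=pixel_values)  # no labels, so the loss/matcher are skipped
print(torch.isnan(outputs.pred_boxes).any())   # tensor(True) once training has diverged
```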
Expected behavior
I would expect the model to run as normal.
I am fine-tuning the model on a custom dataset that should not have more than a couple of objects per image, and I expected the number of queries to have no impact other than limiting the maximum number of objects found.
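For context: with `ignore_mismatched_sizes=True`, a smaller `num_queries` means the pretrained 100-query position-embedding table no longer matches and is randomly re-initialized, which may be what destabilizes training. A sketch of how to verify this (the `model.query_position_embeddings` attribute path follows the current `modeling_detr.py` and may differ across versions):

```python
# detr-resnet-50 ships a [100, 256] query embedding table; with num_queries=5
# it cannot be loaded, so from_pretrained re-initializes it (this is also what
# the ignore_mismatched_sizes warning reports at load time).
print(detr.config.num_queries)                            # 5
print(detr.model.query_position_embeddings.weight.shape)  # torch.Size([5, 256])
```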
About this issue
- State: open
- Created 5 months ago
- Comments: 15 (2 by maintainers)
I’ve released the model here: https://huggingface.co/isalia99/detr-resnet-50-sku110k, and the code is here: https://github.com/Isalia20/DETR-finetune