cvat: [GSoC2024] Filenames with same name but different extensions cause error
Actions before raising this issue
- I searched the existing issues and did not find anything similar.
- I read/searched the docs
Steps to Reproduce
- Create a new task on CVAT.
- Upload images named `image1.png` and `image1.jpg`.
- Upload COCO annotations.
- CVAT throws an error if there is an image named `image1.jpg` and another image named `image1.png`, even if the two images are completely different:
Could not upload annotation for the [task 4](http://localhost:8080/tasks/4)
Item ('image1', 'default') is repeated in the source sequence..
COCO annotation JSON:
{
"info": {
"description": "my-project-name"
},
"images": [
{
"id": 1,
"width": 1200,
"height": 1600,
"file_name": "image1.jpg"
},
{
"id": 2,
"width": 2592,
"height": 1944,
"file_name": "image1.png"
}
],
"annotations": [
{
"id": 0,
"iscrowd": 0,
"image_id": 1,
"category_id": 1,
"segmentation": [
[
787.1904355251921,
419.47053800170795,
850.0426985482494,
710.5038428693424,
500.25619128949614,
639.4534585824082,
778.9923142613151,
420.8368915456874
]
],
"bbox": [
500.25619128949614,
419.47053800170795,
349.78650725875326,
291.03330486763446
],
"area": 49372.61940096601
},
{
"id": 1,
"iscrowd": 0,
"image_id": 2,
"category_id": 1,
"segmentation": [
[
1086.8249359521776,
424.99060631938517,
1204.6934244235697,
848.3210930828352,
885.9504696840307,
800.1776259607174
]
],
"bbox": [
885.9504696840307,
424.99060631938517,
318.74295473953896,
423.33048676345004
],
"area": 64629.50624142664
}
],
"categories": [
{
"id": 1,
"name": "object"
}
]
}
Removing an entry or making other changes throws other errors. This really should not happen, because COCO file names include the extension, so `image1.jpg` and `image1.png` refer to different files.
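The error key `('image1', 'default')` suggests that each item is identified by its file name with the extension stripped, plus a subset name. Under that assumption, the minimal Python sketch below shows why the two images collide. The helpers `item_key` and `find_colliding_items` are hypothetical illustrations, not Datumaro code.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def item_key(file_name: str, subset: str = "default") -> tuple[str, str]:
    # Assumption: the item id is the file path without its extension,
    # matching the ('image1', 'default') key in the error message.
    return (str(PurePosixPath(file_name).with_suffix("")), subset)


def find_colliding_items(coco: dict) -> dict[tuple[str, str], list[str]]:
    # Group COCO file names by their derived key and keep only the keys
    # that more than one image maps to.
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for image in coco["images"]:
        groups[item_key(image["file_name"])].append(image["file_name"])
    return {key: names for key, names in groups.items() if len(names) > 1}


coco = {"images": [
    {"id": 1, "file_name": "image1.jpg"},
    {"id": 2, "file_name": "image1.png"},
]}
print(find_colliding_items(coco))
# {('image1', 'default'): ['image1.jpg', 'image1.png']}
```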
Expected Behavior
CVAT should allow uploading images and COCO annotations even when two files have the same name, as long as their extensions differ.
Context
I want to add that, in general, CVAT tends to abort an entire upload operation as soon as it finds a single error. This results in a huge loss of time spent just debugging CVAT errors (which technically shouldn't even be errors, since this is a perfectly valid use of the COCO file format). If the COCO file annotates a few extra images that are not in the job, CVAT throws an error and aborts the whole operation instead of just annotating the images that are present. If a single annotation has an error, it aborts the operation again. Instead, it should warn you and try to import as many annotations as possible, like Roboflow does.
Environment
Both the CVAT website and a locally installed CVAT instance have this issue.
About this issue
- State: open
- Created 4 months ago
- Comments: 26 (23 by maintainers)
Commits related to this issue
- Resolved issue #7523: Fixed error caused by filenames with the same name but different extensions. — committed to adkbbx/cvat by deleted user 4 months ago
@adkbbx, answered in the PR.
@adkbbx, well, there are 2 more steps to implement, as I wrote in https://github.com/opencv/cvat/issues/7523#issuecomment-1988060365.
@adkbbx,
Basically, in the comment above https://github.com/opencv/cvat/issues/7523#issuecomment-1988634376 you already did this (the `update_annotation_file` function). I think it's enough for updating the input file.
@adkbbx,
Yes, this is how it should be.
I’m not sure I understand what you meant here. Speaking about exporting, I think it should work similarly: a mapping is created in what this function returns, then the dataset is exported as usual, then the file names are mapped back in the output JSON files.
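The last step of that export flow, mapping the names back in the output JSON, could look roughly like this. It is a sketch that assumes a `{unique_name: original_name}` mapping was kept from the renaming step and that the exported file is a COCO JSON with an `images` list. `restore_names` and `restore_names_in_file` are hypothetical helpers, not existing CVAT functions.

```python
import json
from pathlib import Path


def restore_names(coco: dict, mapping: dict[str, str]) -> dict:
    # mapping: temporary unique file name -> original file name.
    # Names that were never remapped are left untouched.
    for image in coco.get("images", []):
        image["file_name"] = mapping.get(image["file_name"], image["file_name"])
    return coco


def restore_names_in_file(json_path: Path, mapping: dict[str, str]) -> None:
    # Rewrite an exported COCO JSON file in place.
    data = json.loads(json_path.read_text(encoding="utf-8"))
    restore_names(data, mapping)
    json_path.write_text(json.dumps(data, indent=2), encoding="utf-8")


exported = {"images": [{"id": 1, "file_name": "9f1c.jpg"},
                       {"id": 2, "file_name": "b07e.png"}]}
mapping = {"9f1c.jpg": "image1.jpg", "b07e.png": "image1.png"}
print([i["file_name"] for i in restore_names(exported, mapping)["images"]])
# ['image1.jpg', 'image1.png']
```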
@adkbbx, probably, it will be more comfortable to review code in a PR, please create one.
I expect it to be some code in this and this file.
Probably, you can just replace all the file names; you don't need to resolve only the repeated ones. Simply iterating over the JSON's `images` list and updating the names in place, while remembering the new names, should be enough. The code you attached iterates over images in a directory, but when annotations are imported, there are no images in the input files.
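A minimal sketch of that suggestion follows. It assumes the annotation file has already been loaded into a dict. Every entry in `images` gets a fresh `uuid.uuid4()` stem, and the original name is remembered in a mapping. `make_names_unique` is a hypothetical name, not existing CVAT code. COCO annotations refer to images by `image_id`, so the `annotations` list does not need to change.

```python
import uuid
from pathlib import PurePosixPath


def make_names_unique(coco: dict) -> dict[str, str]:
    """Rename every image in a COCO dict in place.

    Returns {new_name: original_name} so the original names can be
    restored later (when matching to task frames or after export).
    """
    mapping: dict[str, str] = {}
    for image in coco["images"]:
        original = image["file_name"]
        # Keep the extension but replace the stem (and any directory part)
        # with a random UUID, so no two images share an extension-less id.
        # The full original path is preserved in the mapping.
        new_name = uuid.uuid4().hex + PurePosixPath(original).suffix
        image["file_name"] = new_name
        mapping[new_name] = original
    return mapping


coco = {"images": [{"id": 1, "file_name": "image1.jpg"},
                   {"id": 2, "file_name": "image1.png"}]}
mapping = make_names_unique(coco)
print(sorted(mapping.values()))  # ['image1.jpg', 'image1.png']
```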
Using such a pattern to resolve conflicts:
Can lead to new conflicts. That’s why replacing all the names with something new and unique is better.
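The pattern referred to above was not preserved. To show how such conflicts can arise, this sketch uses a hypothetical pattern instead: append `_<n>` to the stem of each repeated name. After renaming, a file that already existed can end up with the same extension-less stem as a renamed one.

```python
from pathlib import PurePosixPath


def suffix_repeated(names: list[str]) -> list[str]:
    # Hypothetical conflict-resolution pattern: append "_<n>" to the stem of
    # every later file whose extension-less stem has already been seen.
    seen: dict[str, int] = {}
    result = []
    for name in names:
        path = PurePosixPath(name)
        stem = str(path.with_suffix(""))
        count = seen.get(stem, 0)
        seen[stem] = count + 1
        result.append(name if count == 0 else f"{stem}_{count}{path.suffix}")
    return result


def stems(names: list[str]) -> list[str]:
    # Extension-less ids, as the importer would see them.
    return [str(PurePosixPath(n).with_suffix("")) for n in names]


renamed = suffix_repeated(["image1.jpg", "image1.png", "image1_1.jpg"])
print(renamed)         # ['image1.jpg', 'image1_1.png', 'image1_1.jpg']
print(stems(renamed))  # ['image1', 'image1_1', 'image1_1'] -> a new collision
```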
Also, consider using `Path` and `uuid.uuid4`, plain numbers, or hashes in the implementation.
In the annotation files you attached, all the images have id 1; this is not correct. Consider creating one by exporting from CVAT, or make sure the file is correct.
Hi @adkbbx,
It was an architectural decision in Datumaro, so changing it there looks like a hard way to fix the problem. Since the problem is part of Datumaro's design, it affects several formats in CVAT. Actually, I think we can implement the first variant you suggested. The implementation should probably look like this:
- `import_dm_annotations`/`match_dm_item`: supply the mapping
- `match_dm_item`: maybe add a new matching case for this
This solution can be reused for different formats, if needed.
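The matching step described above could be sketched as follows. This does not reproduce the real signatures of CVAT's `import_dm_annotations` or `match_dm_item`. `build_matcher` is a hypothetical stand-in. It assumes a `{unique_name: original_name}` mapping from the renaming step and a `{original_file_name: frame_number}` lookup for the task. The mapping turns the extension-less item id back into the full original name, extension included, so `image1.jpg` and `image1.png` resolve to different frames.

```python
from pathlib import PurePosixPath
from typing import Callable, Optional


def build_matcher(
    mapping: dict[str, str],      # unique file name -> original file name
    frame_names: dict[str, int],  # original task file name -> frame number
) -> Callable[[str], Optional[int]]:
    # Precompute: extension-less unique id -> original file name.
    id_to_original = {
        str(PurePosixPath(unique).with_suffix("")): original
        for unique, original in mapping.items()
    }

    def match(item_id: str) -> Optional[int]:
        # Returns the frame number, or None if the item is unknown.
        original = id_to_original.get(item_id)
        return None if original is None else frame_names.get(original)

    return match


mapping = {"a1b2.jpg": "image1.jpg", "c3d4.png": "image1.png"}
frames = {"image1.jpg": 0, "image1.png": 1}
match = build_matcher(mapping, frames)
print(match("a1b2"), match("c3d4"), match("unknown"))  # 0 1 None
```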
@adkbbx, I have assigned you. Please try to reproduce the issue first. After you reproduce it, please propose a solution here. My team will help you polish the proposal.