mmocr: AssertionError: UniformConcatDataset: OCRDataset: AnnFileLoader: when building dataset
When building the dataset, Python raises an AssertionError.
The complete traceback is below.
AssertionError:
During handling of the above exception, another exception occurred:
AssertionError Traceback (most recent call last)
AssertionError: AnnFileLoader:
During handling of the above exception, another exception occurred:
AssertionError Traceback (most recent call last)
AssertionError: OCRDataset: AnnFileLoader:
During handling of the above exception, another exception occurred:
AssertionError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/mmcv/utils/registry.py in build_from_cfg(cfg, registry, default_args)
67 except Exception as e:
68 # Normal TypeError does not print class name.
---> 69 raise type(e)(f'{obj_cls.__name__}: {e}')
70
71
AssertionError: UniformConcatDataset: OCRDataset: AnnFileLoader:
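The message ends with a bare colon because each nested build re-raises the inner exception with its own class name prepended, as line 69 of registry.py in the traceback shows, and the innermost assertion apparently carries no message of its own. The following is a minimal sketch of that chaining in plain Python. The classes and the `build` helper are hypothetical stand-ins, not the real mmocr or mmcv code, and the bare `assert` on `file_format` is an assumption about what the loader checks.

```python
# Hypothetical stand-ins that only mimic the nesting; not the real mmocr classes.
class AnnFileLoader:
    def __init__(self, file_format):
        # Assumption: a bare assert with no message, so str(e) is empty.
        assert file_format in ['txt', 'lmdb']


def build(obj_cls, **kwargs):
    # Mirrors the re-raise shown at registry.py line 69 in the traceback.
    try:
        return obj_cls(**kwargs)
    except Exception as e:
        raise type(e)(f'{obj_cls.__name__}: {e}')


class OCRDataset:
    def __init__(self, loader):
        self.loader = build(AnnFileLoader, **loader)


class UniformConcatDataset:
    def __init__(self, datasets):
        self.datasets = [build(OCRDataset, **d) for d in datasets]


def chained_message(file_format):
    """Return the chained error text, or None if building succeeds."""
    cfg = [dict(loader=dict(file_format=file_format))]
    try:
        build(UniformConcatDataset, datasets=cfg)
    except AssertionError as e:
        return str(e)
    return None


print(repr(chained_message('jsonl')))
```

Each wrapper adds one `ClassName: ` prefix, so an empty innermost message leaves only the chain of class names, which matches the final line of the traceback.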
In this case, my annotations are in jsonl format. Hence, the parser under the loader for both training and testing was set to LineJsonParser.
loader_dt_train = dict(
    type='AnnFileLoader',
    repeat=1,
    file_format='jsonl',
    file_storage_backend='disk',
    parser=dict(type='LineJsonParser', keys=['filename', 'text']))
loader_dt_test = dict(
    type='AnnFileLoader',
    repeat=1,
    file_format='jsonl',
    file_storage_backend='disk',
    parser=dict(type='LineJsonParser', keys=['filename', 'text']))
train_datasets1 = dict(
    type='OCRDataset',
    img_prefix=img_prefix,
    ann_file=train_anno_file1,
    loader=loader_dt_train,
    pipeline=None,
    test_mode=False)
val_dataset = dict(
    type='OCRDataset',
    img_prefix=img_prefix,
    ann_file=train_anno_file1,
    loader=loader_dt_test,
    pipeline=None,
    test_mode=True)
I think the types (e.g., AnnFileLoader and OCRDataset) have been assigned properly.
May I know what the issue is?
The full code and issue can be reproduced via this Notebook.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 21 (10 by maintainers)
Hi @balandongiv, I found that your loader config was wrong. Changing file_format from jsonl to txt can solve this issue: Loader only accepts txt or lmdb as file_format. Essentially, jsonl files are stored as raw text but parsed differently. Sorry for the confusion. I do think this design is inconsistent with the data converters and should be fixed soon.
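Applied to the configs from the question, the change might look like the sketch below. Only file_format differs from the original; the parser stays LineJsonParser so each line is still parsed as JSON. The `make_loader` helper is a hypothetical convenience to avoid repeating the dict, not part of mmocr.

```python
def make_loader(file_format='txt'):
    """Build a loader config dict (hypothetical helper, not an mmocr API)."""
    return dict(
        type='AnnFileLoader',
        repeat=1,
        # 'txt' instead of 'jsonl': per the reply, only 'txt' or 'lmdb' is accepted.
        file_format=file_format,
        file_storage_backend='disk',
        # The jsonl content is still handled by the line-wise JSON parser.
        parser=dict(type='LineJsonParser', keys=['filename', 'text']))


loader_dt_train = make_loader()
loader_dt_test = make_loader()
```

The dataset configs (train_datasets1 and val_dataset) can keep referring to loader_dt_train and loader_dt_test unchanged.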