Otter: dataset size mismatch

As mentioned in the paper, the MIMIC-IT dataset has 2.2M instruction QA pairs. But I have downloaded all the x_instruction.json files from Hugging Face, and the total number of instruction QA pairs is only 1171k. Am I missing anything?

VST: 32k image-qa
LA: 256k image-qa
SN: 6k image-qa
SD: 16k image-qa
CGD: 141k image-qa
E4D: 527k video-qa
DC: 56k video-qa
TVC: 137k video-qa

In short, that is 451k image-qa and 720k video-qa, for a total of 1171k QA pairs.
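For anyone who wants to reproduce the tally, here is a minimal sketch. The per-subset numbers are the ones listed above; the file layout is an assumption (each `*_instruction.json` is taken to hold its entries under a top-level `"data"` key), and `count_instructions` and `count_all` are hypothetical helper names, not part of the Otter repo.

```python
import json
from pathlib import Path

# Per-subset counts as reported in this post, in thousands of QA pairs.
REPORTED_K = {
    "VST": 32, "LA": 256, "SN": 6, "SD": 16, "CGD": 141,  # image-qa
    "E4D": 527, "DC": 56, "TVC": 137,                      # video-qa
}
IMAGE_SUBSETS = ("VST", "LA", "SN", "SD", "CGD")
VIDEO_SUBSETS = ("E4D", "DC", "TVC")


def subtotal_k(names):
    """Sum the reported counts (in thousands) for the given subsets."""
    return sum(REPORTED_K[name] for name in names)


def count_instructions(path, data_key="data"):
    """Count entries in one instruction file.

    Assumption: entries live under `data_key`. If that key is absent,
    the top-level container itself is counted instead.
    """
    with open(path, encoding="utf-8") as f:
        payload = json.load(f)
    if isinstance(payload, dict) and data_key in payload:
        payload = payload[data_key]
    return len(payload)


def count_all(directory, pattern="*_instruction.json"):
    """Map each matching file name in `directory` to its entry count."""
    return {
        p.name: count_instructions(p)
        for p in sorted(Path(directory).glob(pattern))
    }


if __name__ == "__main__":
    image_k = subtotal_k(IMAGE_SUBSETS)
    video_k = subtotal_k(VIDEO_SUBSETS)
    print(f"image-qa: {image_k}k, video-qa: {video_k}k, "
          f"total: {image_k + video_k}k")
```

Running `count_all("path/to/downloads")` on the download directory gives per-file counts to compare against the list above.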

About this issue

State: closed
Created 7 months ago
Comments: 24 (7 by maintainers)

Most upvoted comments

The size mismatch may be because we iteratively cleaned the dataset after submission. We will update the paper once the numbers are fully confirmed.

Let me check the VST’s missing image_ids then.

Luodian on Dec 19, 2023
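A check for missing image_ids like the one mentioned above could be sketched roughly as follows. This is not the maintainers' script. It assumes each instruction entry carries an `image_ids` list and that the set of available image IDs has already been loaded separately; `find_missing_image_ids` and the sample IDs are hypothetical.

```python
def find_missing_image_ids(instructions, available_image_ids):
    """Return {instruction_id: [image_ids not found in the image store]}.

    Assumption: `instructions` maps an instruction ID to a dict that has
    an "image_ids" list; entries without that field are treated as
    referencing no images.
    """
    available = set(available_image_ids)
    missing = {}
    for ins_id, entry in instructions.items():
        absent = [i for i in entry.get("image_ids", []) if i not in available]
        if absent:
            missing[ins_id] = absent
    return missing


if __name__ == "__main__":
    # Toy data with made-up IDs, only to show the shape of the check.
    instructions = {
        "VST_INS_0": {"image_ids": ["VST_IMG_0", "VST_IMG_1"]},
        "VST_INS_1": {"image_ids": ["VST_IMG_2"]},
    }
    available = ["VST_IMG_0", "VST_IMG_2"]
    print(find_missing_image_ids(instructions, available))
```

An empty result means every referenced image was found. Otherwise the keys show which instruction samples cannot be used yet.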

@Luodian Thanks very very very very much, LA is okay. Now only VST lacks some samples.

Again, thanks for your brilliant work and repo and your help, which teaches me a lot!

Li-Qingyun on Dec 19, 2023

This: https://github.com/Luodian/Otter/issues/320

Luodian on Dec 19, 2023

Thanks for your great contribution.

LinB203 on Dec 12, 2023

It would be much appreciated if it could be uploaded to Hugging Face. The OneDrive link is too unstable.

The E4D size is incorrect. I think it's because we have four parts and may have uploaded only the first one. Let me prepare to upload the remaining parts. But it may take a few days, since they are pretty large.

LinB203 on Dec 6, 2023