Otter: dataset size mismatch

As mentioned in the paper, the MIMIC-IT dataset has 2.2M instruction qa pairs. But I have downloaded all the x_instruction.json files from Hugging Face, and the total number of instruction qa pairs is only 1171k. Am I missing anything?

  • VST: 32k image-qa
  • LA: 256k image-qa
  • SN: 6k image-qa
  • SD: 16k image-qa
  • CGD: 141k image-qa
  • E4D: 527k video-qa
  • DC: 56k video-qa
  • TVC: 137k video-qa

In short, that is 451k image-qa and 720k video-qa, 1171k qa in total.
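
For anyone who wants to reproduce these counts, here is a minimal sketch of how I would tally them. It assumes each x_instruction.json is a JSON object whose entries sit under a top-level "data" key (falling back to the whole object if that key is absent); that layout is my assumption and may not match every file. The function name `count_instructions` and the folder name in the usage note are made up for illustration.

```python
import json
from pathlib import Path

SUFFIX = "_instruction.json"


def count_instructions(root):
    """Count instruction entries in every *_instruction.json found under root.

    Returns a dict mapping the subset name (file name without the suffix)
    to the number of entries in that file.
    """
    counts = {}
    for path in sorted(Path(root).rglob("*" + SUFFIX)):
        with path.open(encoding="utf-8") as f:
            payload = json.load(f)
        # Assumption: entries live under a top-level "data" key.
        # If the key is missing, count the top-level object itself.
        if isinstance(payload, dict):
            entries = payload.get("data", payload)
        else:
            entries = payload
        subset = path.name[: -len(SUFFIX)]
        counts[subset] = len(entries)
    return counts


# Counts reported in this issue, in thousands of qa pairs.
REPORTED_K = {
    "VST": 32, "LA": 256, "SN": 6, "SD": 16, "CGD": 141,  # image-qa
    "E4D": 527, "DC": 56, "TVC": 137,                      # video-qa
}
VIDEO_SUBSETS = {"E4D", "DC", "TVC"}

video_k = sum(v for k, v in REPORTED_K.items() if k in VIDEO_SUBSETS)
image_k = sum(v for k, v in REPORTED_K.items() if k not in VIDEO_SUBSETS)
total_k = image_k + video_k
```

Calling `count_instructions("MIMIC-IT")` on the download folder (folder name hypothetical) gives the per-subset counts to compare against the list above. The hard-coded figures sum to 451k image-qa, 720k video-qa, and 1171k in total.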

About this issue

  • State: closed
  • Created 7 months ago
  • Comments: 24 (7 by maintainers)

Most upvoted comments

The size mismatch may come from the fact that we iteratively cleaned the dataset after submission. We will update the paper later, once the numbers are fully confirmed.

Let me check the VST’s missing image_ids then.
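
A rough sketch of how such a check could be done locally, in case it helps others. It assumes the instruction file keeps its entries under "data", that each entry lists its images in an "image_ids" field, and that the image file is a JSON object keyed by image id. All of that is an assumption on my side, and the function name `missing_image_ids` is made up.

```python
import json


def missing_image_ids(instruction_path, image_path):
    """Return the sorted image ids that instructions reference but the image file lacks."""
    with open(instruction_path, encoding="utf-8") as f:
        payload = json.load(f)
    # Assumption: instruction entries live under a top-level "data" key.
    instructions = payload.get("data", payload)

    with open(image_path, encoding="utf-8") as f:
        # Assumption: the image file maps image id -> encoded image.
        # Only the keys are needed here; note the file may be very large.
        available = set(json.load(f))

    referenced = set()
    for entry in instructions.values():
        # Assumption: each entry lists its images under "image_ids".
        referenced.update(entry.get("image_ids", []))

    return sorted(referenced - available)
```

An empty list would mean every referenced image is present. Anything returned is an id that some instruction points to but that has no image in the file.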

@Luodian Thanks very very very very much, LA is okay. Now only VST lacks some samples.

Again, thanks for your brilliant work and repo and your help, which teaches me a lot!

Thanks for your great contribution.

It would be much appreciated if it could be uploaded to Hugging Face. The OneDrive link is too unstable.

The E4D size is incorrect. I think it’s because we have four parts and may have uploaded only the first one. Let me prepare to upload the remaining parts. But it may take a few days since they are pretty large.