litdata: The tested speed is not as fast as expected.
🐛 Bug
The measured loading speed when streaming the dataset is not as fast as expected.
Code sample
```python
import numpy as np
import torch
from tqdm import tqdm
from torchvision.transforms import Compose, Lambda
from torchvision.transforms._transforms_video import CenterCropVideo, NormalizeVideo
from litdata import StreamingDataset, StreamingDataLoader

input_dir = 's3://extract_frames/'

OPENAI_DATASET_MEAN = (0.48145466, 0.4578275, 0.40821073)
OPENAI_DATASET_STD = (0.26862954, 0.26130258, 0.27577711)


class ImagenetStreamingDataset(StreamingDataset):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.transform = Compose(
            [
                Lambda(lambda x: x / 255.0),
                NormalizeVideo(mean=OPENAI_DATASET_MEAN, std=OPENAI_DATASET_STD),
                # ShortSideScale(size=224),
                CenterCropVideo(224),
            ]
        )

    def __getitem__(self, index):
        data = super().__getitem__(index)
        # Stack the 8 frames of one sample into a (C, T, H, W) video tensor,
        # the layout expected by the _transforms_video transforms.
        video_data = []
        for i in range(8):
            frame = np.array(data["image"][i])
            video_data.append(torch.from_numpy(frame).permute(2, 0, 1))
        video_data = torch.stack(video_data, dim=1)
        video_data = self.transform(video_data)
        return video_data


dataset = ImagenetStreamingDataset(input_dir, shuffle=True)
dataloader = StreamingDataLoader(dataset, batch_size=64, num_workers=8)

for batch in tqdm(dataloader, total=len(dataloader)):
    pass
```
Expected behavior
The dataset contains approximately 200,000 samples, each consisting of 8 extracted frames. Based on the benchmarked speeds, loading should be very fast, but in practice it is not.
The tested speed is approximately as follows:
Environment
- PyTorch Version (e.g., 1.0): 2.2.1
- OS (e.g., Linux): Linux
- How you installed PyTorch (conda, pip, source): pip
- Python version: 3.9
- CUDA/cuDNN version: 11.6
About this issue
- Original URL
- State: open
- Created 4 months ago
- Comments: 19
@tikboaHIT I also fixed the `chunk_bytes` not being correct with the `optimize` operator. A more efficient approach would be to encode the images as JPEG, as follows:
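A minimal sketch of that approach, rewriting the chunks with `litdata.optimize`; the `load_frames` helper and the `sample_paths` list are hypothetical stand-ins for however the raw frames are produced:

```python
import io

from PIL import Image
from litdata import optimize


def encode_sample(sample_path):
    # Hypothetical helper: returns the 8 raw frames of one sample
    # as uint8 HxWxC numpy arrays.
    frames = load_frames(sample_path)
    jpegs = []
    for frame in frames:
        buffer = io.BytesIO()
        # Re-encode each frame as JPEG: far smaller than raw arrays,
        # so chunks download and deserialize faster.
        Image.fromarray(frame).save(buffer, format="JPEG", quality=90)
        jpegs.append(buffer.getvalue())
    return {"image": jpegs}


if __name__ == "__main__":
    optimize(
        fn=encode_sample,
        inputs=sample_paths,  # hypothetical list of per-sample inputs
        output_dir="s3://extract_frames/",
        chunk_bytes="64MB",  # keep each chunk's size bounded
    )
```

On the read side, each frame then comes back as JPEG bytes, so `__getitem__` would decode it with `Image.open(io.BytesIO(...))` before stacking.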
When streaming it from the cloud, it takes 1 second now.
Additionally, I recommend using `torchvision.transforms.v2`, which is roughly 40% faster at resizing images and similar operations.
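A minimal sketch of the equivalent pipeline with v2 transforms, assuming the frames are stacked as a `(T, C, H, W)` tensor so that each transform is applied per frame over the trailing `(C, H, W)` dimensions:

```python
import torch
from torchvision.transforms import v2

OPENAI_DATASET_MEAN = (0.48145466, 0.4578275, 0.40821073)
OPENAI_DATASET_STD = (0.26862954, 0.26130258, 0.27577711)

# v2 equivalents of the Lambda / NormalizeVideo / CenterCropVideo pipeline above.
transform = v2.Compose(
    [
        v2.ToDtype(torch.float32, scale=True),  # uint8 [0, 255] -> float32 [0, 1]
        v2.Normalize(mean=OPENAI_DATASET_MEAN, std=OPENAI_DATASET_STD),
        v2.CenterCrop(224),
    ]
)

frames = torch.randint(0, 256, (8, 3, 256, 256), dtype=torch.uint8)  # dummy clip
out = transform(frames)  # (8, 3, 224, 224) float32
```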
But alternatively, we support videos through torchvision's video support: https://pytorch.org/audio/stable/build.ffmpeg.html. If you convert your clips to the AV1 format, they should get very small; you should be able to stream them easily and deserialize them faster. Worth exploring.
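As a rough sketch, reading such a clip back with torchvision's video API, assuming torchvision was built against an FFmpeg that includes an AV1 decoder (per the link above); `clip_av1.mp4` is a hypothetical file name:

```python
import torchvision

# Decode a whole clip into a (T, H, W, C) uint8 tensor,
# plus an empty audio tensor and metadata such as fps.
video, audio, info = torchvision.io.read_video("clip_av1.mp4", pts_unit="sec")
print(video.shape, info)
```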
Sure:
The speed is as follows when I load from local disk.
The speed is as follows when I load from S3.