ffcv: [Bug] Synchronization issue on GPU

I’m using v0.0.4 from this branch: https://github.com/libffcv/ffcv/tree/v0.0.4

There’s a (possibly major) bug where two models will not receive the same inputs from the FFCV dataloader unless torch.cuda.synchronize() is called explicitly. Below is a simple code snippet to reproduce the issue:

import torch
from torchvision.models import resnet18
from tqdm import tqdm
from copy import deepcopy

dataloader = create_ffcv_dataloader()  # Your own custom dataloader factory
model1 = resnet18(pretrained=False).cuda()
model2 = deepcopy(model1)
with torch.no_grad():
    for it, (imgs, *_) in enumerate(tqdm(dataloader)):
        model1(imgs)
        model2(imgs)
        # Uncommenting the next line makes the assertion at the bottom pass;
        # leaving it commented triggers the assertion error.
        # torch.cuda.synchronize()
        if it == 20:
            break

    assert model1.bn1.running_mean.allclose(model2.bn1.running_mean)

BatchNorm tracks running stats, which can be used to check whether two identical models received the same inputs on the forward pass. Without torch.cuda.synchronize(), the code above triggers the assertion error, meaning the two models received different inputs at some point; with torch.cuda.synchronize(), it passes. I have also noticed that the mismatch does not necessarily occur with larger models, whose forward pass takes longer.
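
For reference, here is the same loop with the two candidate workarounds written out: an explicit torch.cuda.synchronize(), which makes the assertion pass for me, and cloning the batch right after receiving it, on the assumption that the loader hands back tensors that alias a reused device buffer (see also the commented-out clone in the minimal repro further down). This sketch continues from the snippet above and reuses dataloader, model1 and model2:

with torch.no_grad():
    for it, (imgs, *_) in enumerate(tqdm(dataloader)):
        # Alternative workaround (assumption): copy the batch out of any buffer
        # the loader might reuse for later iterations.
        # imgs = imgs.clone()
        model1(imgs)
        model2(imgs)
        # Confirmed workaround: block the host until all queued GPU work is
        # done, so nothing can touch the current batch while the two forward
        # passes are still pending.
        torch.cuda.synchronize()
        if it == 20:
            break

    assert model1.bn1.running_mean.allclose(model2.bn1.running_mean)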

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

Here is a strictly minimal version of the test that makes it fail 100% of the time on my 3090:

import torch
from ffcv import Loader
from ffcv.fields.rgb_image import SimpleRGBImageDecoder
from ffcv.transforms import ToTensor, ToDevice, ToTorchImage, Convert
from torchvision.models import resnet18
from tqdm import tqdm

def main():
    beton_path = 'cifar_train.beton'    # Your FFCV .beton file here
    image_pipeline = [
        SimpleRGBImageDecoder(),
        ToTensor(),
        ToDevice(torch.device(0), non_blocking=False),
        ToTorchImage(),
        Convert(torch.float32),
    ]
    loader = Loader(beton_path, batch_size=512, num_workers=1,
                    pipelines={'image': image_pipeline, 'label': None})
    model1 = resnet18(pretrained=False).cuda()
    model2 = resnet18(pretrained=False).cuda()
    model2.load_state_dict(model1.state_dict())

    while True:
        with torch.no_grad():
            for it, (imgs,) in enumerate(tqdm(loader)):
                # imgs = imgs.clone()
                model1(imgs)
                model2(imgs)
                # torch.cuda.synchronize()
                # breakpoint()
                if it == 2: # 1 works sometimes but not 100% of the time on my GPU
                    break

            assert model1.bn1.running_mean.allclose(model2.bn1.running_mean)


if __name__ == "__main__":
    main()
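
My best guess at the mechanism (an assumption based on the symptoms, not on reading the FFCV internals): the loader hands back tensors that alias a reused device buffer and refills that buffer for upcoming batches without waiting for the compute stream, so work that is still queued, such as the second model's forward pass, can read data that has already been partially overwritten. Both workarounds are consistent with this: torch.cuda.synchronize() makes the host wait until the forward passes have finished before the next batch can be prepared, and imgs.clone() copies the batch out of the shared buffer. Here is a standalone sketch of that kind of hazard, with nothing FFCV-specific in it:

import torch

# A reused device buffer standing in for the image batch, and a side stream
# standing in for whatever refills it for the next batch.
buf = torch.zeros(4096, 4096, device='cuda')
side = torch.cuda.Stream()

# "Forward pass": enqueue a reduction over buf on the default stream.
first = buf.sum()

# "Loader refill": overwrite buf on the side stream without any dependency on
# the default stream. The read above and this write are now unordered.
with torch.cuda.stream(side):
    buf.fill_(1.0)

torch.cuda.synchronize()
# If the fill ran before the sum kernel actually read buf, `first` is nonzero
# even though it was enqueued while buf was still all zeros. Whether this
# happens depends on timing, which matches the model-size observation above.
print(first.item())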