albumentations: [TensorFlow] Failed to get reproducible training runs with albumentations included in the data pipeline

🐛 Bug

I could not make my training reproducible when albumentations is added to the data pipeline. I followed this thread https://github.com/albumentations-team/albumentations/issues/93 and fixed all possible seeds. Overall, the snippet that should have enabled reproducible experiments looks like this:

import os
import random

import numpy as np
import tensorflow as tf

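def set_random_seed(seed: int = 42):
    """
    Globally fix all possible sources of randomness to keep experiments reproducible
    """
    random.seed(seed)  # Python's built-in RNG
    np.random.seed(seed)  # NumPy's global RNG
    tf.random.set_seed(seed)  # TensorFlow's global seed
    # Read at interpreter startup, so setting it here only affects child processes
    os.environ['PYTHONHASHSEED'] = str(seed)
    # Request deterministic TensorFlow/cuDNN kernels
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
    os.environ['TF_CUDNN_DETERMINISTIC'] = '1'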
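
A side note on the PYTHONHASHSEED line: Python reads that variable when the interpreter starts, so assigning it inside a running process is only picked up by child processes. The sketch below illustrates this with the standard library only; `child_hash` is a hypothetical helper name, and this is not claimed to be the cause of the irreproducibility reported here.

```python
import os
import subprocess
import sys


def child_hash(seed: str) -> str:
    """Return hash('albumentations') as computed by a freshly started interpreter."""
    # The variable must be in the environment before the interpreter starts.
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('albumentations'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Two separate interpreters started with the same PYTHONHASHSEED agree on string hashes.
first = child_hash("42")
second = child_hash("42")
print(first == second)
```

In practice this means exporting PYTHONHASHSEED in the shell (or the launcher script) before running `python train.py`, rather than setting it from inside the training code.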

Unfortunately, fixing the seeds this way does not give me reproducible results. I ran the training process 6 times and got a different result each time. You can see the whole picture in W&B:

https://wandb.ai/roma-glushko/rock-paper-scissors/runs/2bdgnbwx (best_val_acc: 0.7104, best_epoch: 3)
https://wandb.ai/roma-glushko/rock-paper-scissors/runs/2qo9pbls (best_val_acc: 0.7875, best_epoch: 8)
https://wandb.ai/roma-glushko/rock-paper-scissors/runs/uf6cknge (best_val_acc: 0.6771, best_epoch: 8)
https://wandb.ai/roma-glushko/rock-paper-scissors/runs/tem3umbx (best_val_acc: 0.7729, best_epoch: 6)
https://wandb.ai/roma-glushko/rock-paper-scissors/runs/czsjm7px (best_val_acc: 0.7208, best_epochs: 0 and 8)
https://wandb.ai/roma-glushko/rock-paper-scissors/runs/29dif98z (best_val_acc: 0.8, best_epoch: 9)

Mean: 0.74478
Std: 0.044726
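
For anyone who wants to check these figures, here is a minimal sketch that recomputes them from the six best_val_acc values listed above. It assumes the reported Std is the population standard deviation (the NumPy default), which is what `statistics.pstdev` computes.

```python
import statistics

# best_val_acc of the six runs listed above
best_val_acc = [0.7104, 0.7875, 0.6771, 0.7729, 0.7208, 0.8]

mean = statistics.mean(best_val_acc)
# Population standard deviation (divides by N, not N - 1)
std = statistics.pstdev(best_val_acc)

print(f"Mean: {mean:.5f}")
print(f"Std: {std:.6f}")
```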

Also, I tried calling random.seed() right before passing my batch into the a.Compose() pipeline. That did not really help.

However, when I comment out albumentations from my data pipeline or replace it with some pure TF augmentations, I can get my training reproducible.

Any clues what’s wrong here?

To Reproduce

Steps to reproduce the behavior:

Clone the project state at 0.1.0-bugrep tag:

git clone --depth 1 --branch 0.1.0-bugrep https://github.com/roma-glushko/rock-paper-scissor

Pull dataset:

cd data
kaggle datasets download --unzip frtgnn/rock-paper-scissor

Install project deps:

poetry install

Uncomment any of the reported augmentations in the config file (they are all commented out in the repository): https://github.com/roma-glushko/rock-paper-scissor/blob/master/configs/basic_config.py
Run the training a couple of times; the results differ by a lot:

python train.py

Expected behavior

To run experiments that analyze the impact of different ideas and changes, I would like my training process to be reproducible.

Environment

Albumentations version (e.g., 0.1.8): 0.5.2
Python version (e.g., 3.7): 3.8.6
OS (e.g., Linux): Ubuntu 20.10
How you installed albumentations (conda, pip, source): poetry (pip-like)
tensorflow-gpu: 2.5.0 (for compatibility with the RTX 3070, Ampere architecture)

Additional context

This report is reproduced in a project that is also mentioned in https://github.com/albumentations-team/albumentations/issues/905

The data pipeline is the same for both issues:

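# Imports assumed for this excerpt (they are not shown in the original report)
from functools import partial
from typing import Tuple

import albumentations as a
import tensorflow as tf
from tensorflow.keras.preprocessing import image_dataset_from_directory

AUTOTUNE = tf.data.AUTOTUNE

# class_names is defined elsewhere in the project


def augment_image(inputs, labels, augmentation_pipeline: a.Compose):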
    def apply_augmentation(images):
        aug_data = augmentation_pipeline(image=images.astype('uint8'))
        return aug_data['image']

    inputs = tf.numpy_function(func=apply_augmentation, inp=[inputs], Tout=tf.uint8)

    return inputs, labels


def get_dataset(
        dataset_path: str,
        subset_type: str,
        augmentation_pipeline: a.Compose,
        validation_fraction: float = 0.2,
        batch_size: int = 32,
        image_size: Tuple[int, int] = (300, 300),
        seed: int = 42
) -> tf.data.Dataset:
    augmentation_func = partial(
        augment_image,
        augmentation_pipeline=augmentation_pipeline,
    )

    dataset = image_dataset_from_directory(
        dataset_path,
        subset=subset_type,
        class_names=class_names,
        validation_split=validation_fraction,
        image_size=image_size,
        batch_size=batch_size,
        seed=seed,
    )

    return dataset \
        .map(augmentation_func, num_parallel_calls=AUTOTUNE) \
        .prefetch(AUTOTUNE)
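
One possible contributor, offered as an assumption rather than a confirmed diagnosis: with num_parallel_calls=AUTOTUNE the NumPy callback may be invoked from several threads in an order that varies between runs, and those calls would then consume a shared global random state in a different order each time. The standard-library sketch below is a toy model of that situation, not albumentations code. `augment`, `run_epoch`, and the seed-mixing constant are hypothetical names chosen for illustration. It shows that giving every sample its own generator, keyed by a base seed and the sample index, makes the drawn parameters independent of thread scheduling.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def augment(index: int, base_seed: int = 42) -> float:
    """Stand-in for one augmentation call: draws a single random parameter.

    A private generator keyed by (base_seed, index) replaces the shared
    global `random` state, so the draw does not depend on which thread
    happens to run first.
    """
    rng = random.Random(base_seed * 1_000_003 + index)
    return rng.uniform(-0.2, 0.2)


def run_epoch(num_samples: int, workers: int) -> list:
    """Apply `augment` to every sample index using a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, whatever the execution order was
        return list(pool.map(augment, range(num_samples)))


serial = run_epoch(16, workers=1)
parallel = run_epoch(16, workers=8)
print(serial == parallel)
```

A quicker experiment on the pipeline above would be to set num_parallel_calls=1 in the map call and check whether the runs become repeatable, at the cost of slower input processing.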

About this issue

State: closed
Created 3 years ago
Comments: 19 (3 by maintainers)

Commits related to this issue

Added debug information in order to provide more data for albumentations-team/albumentations#906 issue — committed to roma-glushko/rock-paper-scissors by roma-glushko 3 years ago

Most upvoted comments

Looks good. I think the remaining differences are associated with the instability of algorithms and hardware.

Dipet on May 31, 2021

Hmm. All of a sudden, this issue starts looking more interesting than at the beginning.

Thu, May 27, 2021 at 11:57, Roman Glushko @.***>:

@Dipet https://github.com/Dipet sure, all tests were performed with the following augmentation pipeline configuration:

args['train_augmentation'] = a.Compose([
    a.VerticalFlip(),
    a.HorizontalFlip(),
    a.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.1, brightness_by_max=False),
    a.CoarseDropout(max_holes=20, max_height=8, max_width=8, min_holes=10, min_height=8, min_width=8),
    a.GaussNoise(p=1.0, var_limit=(10.0, 50.0)),
])
args['validation_augmentation'] = a.Compose([])

I kept the validation step augmentation-free, as @BloodAxe https://github.com/BloodAxe suggested above.


BloodAxe on May 27, 2021