albumentations: [TensorFlow] Failed to get reproducible trainings with albumentations included to the data pipeline
🐛 Bug
I could not get my training to run reproducibly once albumentations was added to the data pipeline. I followed this thread https://github.com/albumentations-team/albumentations/issues/93 and fixed all possible seeds, so overall my snippet that should have enabled reproducible experiments looks like this:
import os
import random

import numpy as np
import tensorflow as tf


def set_random_seed(seed: int = 42):
    """
    Globally fix all possible sources of randomness to keep experiment reproducible
    """
    random.seed(seed)
    np.random.seed(seed)
    tf.random.set_seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    os.environ['TF_DETERMINISTIC_OPS'] = '1'
    os.environ['TF_CUDNN_DETERMINISTIC'] = '1'
Unfortunately, this does not give me reproducible results. I ran the training process 6 times and got a different result each time. You can also see the whole picture in W&B:
- https://wandb.ai/roma-glushko/rock-paper-scissors/runs/2bdgnbwx (best_val_acc: 0.7104, best_epoch: 3)
- https://wandb.ai/roma-glushko/rock-paper-scissors/runs/2qo9pbls (best_val_acc: 0.7875, best_epoch: 8)
- https://wandb.ai/roma-glushko/rock-paper-scissors/runs/uf6cknge (best_val_acc: 0.6771, best_epoch: 8)
- https://wandb.ai/roma-glushko/rock-paper-scissors/runs/tem3umbx (best_val_acc: 0.7729, best_epoch: 6)
- https://wandb.ai/roma-glushko/rock-paper-scissors/runs/czsjm7px (best_val_acc: 0.7208, best_epochs: 0 and 8)
- https://wandb.ai/roma-glushko/rock-paper-scissors/runs/29dif98z (best_val_acc: 0.8, best_epoch: 9)
- Mean: 0.74478
- Std: 0.044726
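The mean and standard deviation above can be recomputed from the six best_val_acc values listed in the report. A minimal sketch using only the Python standard library; the reported Std matches the population standard deviation (dividing by n), while the sample standard deviation (dividing by n - 1) comes out slightly larger:

```python
import statistics

# best_val_acc of the six runs, copied from the list above
best_val_acc = [0.7104, 0.7875, 0.6771, 0.7729, 0.7208, 0.8]

mean = statistics.mean(best_val_acc)
population_std = statistics.pstdev(best_val_acc)  # divides by n
sample_std = statistics.stdev(best_val_acc)       # divides by n - 1

print(f"mean={mean:.5f}")
print(f"population_std={population_std:.6f}")
print(f"sample_std={sample_std:.6f}")
```

With only six runs, either figure is a rough estimate of the run-to-run spread.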
Also, I tried to set random.seed() right before passing my batch into a.Compose() pipeline. That did not really help.
However, when I comment out albumentations from my data pipeline or replace it with some pure TF augmentations, I can get my training reproducible.
Any clues what’s wrong here?
To Reproduce
Steps to reproduce the behavior:
- Clone the project state at the 0.1.0-bugrep tag:
git clone --depth 1 --branch 0.1.0-bugrep https://github.com/roma-glushko/rock-paper-scissor
- Pull dataset:
cd data
kaggle datasets download --unzip frtgnn/rock-paper-scissor
- Install project deps:
poetry install
- Uncomment any of the reported augmentations in the config file (they are all commented out in the repository): https://github.com/roma-glushko/rock-paper-scissor/blob/master/configs/basic_config.py
- Run training a couple of times and you get results that differ by a lot:
python train.py
Expected behavior
To run experiments that analyze the impact of different ideas and changes, I need my training process to be reproducible.
Environment
- Albumentations version (e.g., 0.1.8): 0.5.2
- Python version (e.g., 3.7): 3.8.6
- OS (e.g., Linux): Ubuntu 20.10
- How you installed albumentations (conda, pip, source): poetry (pip-like)
- tensorflow-gpu: 2.5.0 (for the sake of compatibility with RTX3070 (ampere arch.))
Additional context
This report is reproduced in a project that is also mentioned in https://github.com/albumentations-team/albumentations/issues/905
The data pipeline is the same for both issues:
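# Imports below are assumed; the original excerpt does not show them.
from functools import partial
from typing import Tuple

import albumentations as a
import tensorflow as tf
from tensorflow.keras.preprocessing import image_dataset_from_directory

AUTOTUNE = tf.data.AUTOTUNE
# class_names is used below but is not defined in this excerpt.


def augment_image(inputs, labels, augmentation_pipeline: a.Compose):
    def apply_augmentation(images):
        # Runs as plain Python/NumPy code outside the TF graph
        aug_data = augmentation_pipeline(image=images.astype('uint8'))
        return aug_data['image']

    inputs = tf.numpy_function(func=apply_augmentation, inp=[inputs], Tout=tf.uint8)

    return inputs, labels


def get_dataset(
    dataset_path: str,
    subset_type: str,
    augmentation_pipeline: a.Compose,
    validation_fraction: float = 0.2,
    batch_size: int = 32,
    image_size: Tuple[int, int] = (300, 300),
    seed: int = 42
) -> tf.data.Dataset:
    augmentation_func = partial(
        augment_image,
        augmentation_pipeline=augmentation_pipeline,
    )

    dataset = image_dataset_from_directory(
        dataset_path,
        subset=subset_type,
        class_names=class_names,
        validation_split=validation_fraction,
        image_size=image_size,
        batch_size=batch_size,
        seed=seed,
    )

    # Augmentation is mapped over already-batched data, in parallel
    return dataset \
        .map(augmentation_func, num_parallel_calls=AUTOTUNE) \
        .prefetch(AUTOTUNE)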
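One candidate cause, not confirmed anywhere in this report: as far as I know, albumentations 0.5.x draws its augmentation parameters from shared global random state, while map(..., num_parallel_calls=AUTOTUNE) allows batches to reach apply_augmentation in a scheduling-dependent order. If so, the same batch could consume different random numbers from run to run even with a fixed seed. The sketch below illustrates the general idea with the standard library only. It does not call albumentations or TensorFlow, and the function names and the seed-mixing formula are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def augment_with_global_rng(item):
    # Hypothetical stand-in for an augmentation that reads the shared global RNG:
    # the number an item receives depends on the order in which workers run.
    index, value = item
    return value + random.random()


def augment_with_item_rng(item, base_seed=42):
    # Hypothetical alternative: derive a private RNG from (base_seed, index),
    # so the number an item receives no longer depends on scheduling.
    index, value = item
    rng = random.Random(base_seed * 1_000_003 + index)
    return value + rng.random()


def run_parallel(func, values, workers=4):
    items = list(enumerate(values))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, whatever the execution order was
        return list(pool.map(func, items))


first = run_parallel(augment_with_item_rng, [0.0] * 8)
second = run_parallel(augment_with_item_rng, [0.0] * 8)
print(first == second)  # prints True
```

A cheaper experiment in the same spirit is to call .map(augmentation_func) without num_parallel_calls, so that batches are augmented sequentially, and check whether repeated runs then agree.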
About this issue
- State: closed
- Created 3 years ago
- Comments: 19 (3 by maintainers)
Commits related to this issue
- Added debug information in order to provide more data for albumentations-team/albumentations#906 issue — committed to roma-glushko/rock-paper-scissors by roma-glushko 3 years ago
Looks good. I think the current differences are associated with the instability of algorithms and hardware.
Hmm. All of a sudden, this issue starts looking more interesting than at the beginning.
On Thu, May 27, 2021 at 11:57, Roman Glushko @.***> wrote: