gluonts: UnboundLocalError: local variable 'lv' referenced before assignment
Description
I cannot include a validation dataset when training a multivariate model. Training without a validation dataset works fine.
To Reproduce
from gluonts.dataset.common import TrainDatasets
from gluonts.dataset.multivariate_grouper import MultivariateGrouper
from gluonts.dataset.repository.datasets import get_dataset
from gluonts.model.gpvar import GPVAREstimator
from gluonts.mx.trainer import Trainer

NUM_OF_SERIES = 8

def load_multivariate_dataset(dataset_name: str):
    ds = get_dataset(dataset_name)
    grouper_train = MultivariateGrouper(max_target_dim=NUM_OF_SERIES)
    grouper_test = MultivariateGrouper(max_target_dim=NUM_OF_SERIES)
    return TrainDatasets(
        metadata=ds.metadata,
        train=grouper_train(ds.train),
        test=grouper_test(ds.test),
    )

dataset = load_multivariate_dataset(dataset_name="exchange_rate")
metadata = dataset.metadata

estimator = GPVAREstimator(
    prediction_length=metadata.prediction_length,
    target_dim=NUM_OF_SERIES,
    freq=metadata.freq,
    trainer=Trainer(
        epochs=50,
        batch_size=4,
        num_batches_per_epoch=10,
        patience=5,
    ),
)

predictor = estimator.train(
    training_data=dataset.train,
    validation_data=dataset.test,
)
Error message or code output
UnboundLocalError: local variable 'lv' referenced before assignment
Environment
- Operating system: Mac OSX 10.15.5
- Python version: 3.7.6
- GluonTS version: 0.5.0
- MXNet version: 1.6.0
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 6
- Comments: 16 (10 by maintainers)
@DayanSiddiquiNXD, this would be a way to cut a validation dataset that does not overlap with the training dataset:
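A minimal sketch along these lines, assuming the m4_hourly dataset: the training series are truncated by prediction_length points, so that the reserved tail of each series is only ever seen as validation data.

from gluonts.dataset.common import ListDataset
from gluonts.dataset.repository.datasets import get_dataset

dataset = get_dataset("m4_hourly")
prediction_length = dataset.metadata.prediction_length

# Cut the last prediction_length points off each training series...
train_ds = ListDataset(
    [
        {"target": entry["target"][:-prediction_length], "start": entry["start"]}
        for entry in dataset.train
    ],
    freq=dataset.metadata.freq,
)

# ...and validate on the full series, whose final prediction_length
# points were never seen during training.
validation_ds = dataset.train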
Note that m4_hourly is a "square" dataset (all time series have the same length). If you do not have a square dataset, you can use the DateSplitter instead.

In case you have multiple time series, another option is to split the dataset horizontally, meaning that you reserve some time series for validation. You could do it like this, assuming your time series are well shuffled:
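A sketch of such a horizontal split, reusing dataset and ListDataset from the sketch above; the 90/10 ratio is illustrative:

# Reserve the last 10% of the (shuffled) series for validation.
entries = list(dataset.train)
split = int(0.9 * len(entries))

train_ds = ListDataset(entries[:split], freq=dataset.metadata.freq)
validation_ds = ListDataset(entries[split:], freq=dataset.metadata.freq)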
@lostella For now, why can’t we just do:
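One reading of this suggestion, sketched here under the assumption that the idea was to cut validation instances deterministically (the values for context_length and prediction_length are placeholders): use TestSplitSampler, which always picks the last admissible time point, so every validation series yields exactly one instance rather than a random, possibly zero, number of them.

from gluonts.dataset.field_names import FieldName
from gluonts.transform import InstanceSplitter, TestSplitSampler

context_length, prediction_length = 24, 12  # placeholder values

# Deterministic splitter for validation: one window per series,
# anchored at the end of the series.
validation_splitter = InstanceSplitter(
    target_field=FieldName.TARGET,
    is_pad_field=FieldName.IS_PAD,
    start_field=FieldName.START,
    forecast_start_field=FieldName.FORECAST_START,
    train_sampler=TestSplitSampler(),
    past_length=context_length,
    future_length=prediction_length,
)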
This would also make @kaijennissen's code run, and I can't think of any disadvantage of this approach compared to how the validation loader is currently defined.
This does not take away from the need to rethink validation, since the validation mechanism this applies is still not in line with users' expectations.
The problem also occurs on master. The root cause appears to be the way ValidationDataLoader extracts batches of data: the data transformation pipeline gets applied with is_train=True (see here), so that the "future" target is included in the data and the loss associated with it (say, negative log-likelihood) can be computed; however, this also has the consequence that the instance splitter selects a random number of time windows from the validation dataset.

Now, because of the usage of MultivariateGrouper, the validation dataset consists of a single 8-dimensional series, out of which zero or more "validation" windows get sampled by the instance sampler. When zero are sampled, no validation loss is computed and lv never gets assigned. In fact, the following minimal, univariate example shows that the number of elements produced by the ValidationDataLoader is sometimes 0, sometimes 1.
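A sketch of such an experiment, assuming the 0.5-era transform API; it applies the instance splitter directly, which is where the randomness enters, rather than going through the full loader.

import numpy as np
from gluonts.dataset.common import ListDataset
from gluonts.dataset.field_names import FieldName
from gluonts.transform import ExpectedNumInstanceSampler, InstanceSplitter

# A "singleton" dataset: a single univariate series of length 100.
dataset = ListDataset(
    [{"target": np.random.randn(100), "start": "2020-01-01"}],
    freq="1H",
)

splitter = InstanceSplitter(
    target_field=FieldName.TARGET,
    is_pad_field=FieldName.IS_PAD,
    start_field=FieldName.START,
    forecast_start_field=FieldName.FORECAST_START,
    train_sampler=ExpectedNumInstanceSampler(num_instances=1),
    past_length=24,
    future_length=12,
)

# With is_train=True the sampler draws a random number of windows
# (one in expectation), so repeated passes over the same series yield
# sometimes 0 instances, sometimes 1 or more.
for _ in range(5):
    print(len(list(splitter(iter(dataset), is_train=True))))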
This problem will occur any time the ValidationDataLoader is constructed out of a "singleton" dataset, but it may also happen with any other small number of time series in the dataset.
The solution is to rethink how the ValidationDataLoader should behave and how it is defined: not an easy one, but thank you kindly for submitting this @kaijennissen, this motivates us to improve this part.
@DayanSiddiquiNXD #555
Thanks. Replacing the ValidationDataLoader inside the GluonEstimator with the one you proposed seems to fix the issue.