tensorflow: Tensorboard cannot load more than two event files in logdir
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes. Custom network structure and data pre-processing for my own task and dataset, modified from the current single-GPU CIFAR-10 tutorial (which uses the monitored session).
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 Pro 1703
- TensorFlow installed from (source or binary): Binary, installed locally with `pip install .\xxx.whl` in my miniconda environment. For this environment, `pip freeze` gives the following:
appdirs==1.4.3
bleach==1.5.0
cycler==0.10.0
html5lib==0.9999999
Markdown==2.2.0
matplotlib==2.0.0
numpy==1.12.1
olefile==0.44
packaging==16.8
Pillow==4.1.0
protobuf==3.2.0
pyparsing==2.2.0
python-dateutil==2.6.0
pytz==2017.2
six==1.10.0
tensorflow-gpu==1.1.0rc2
Werkzeug==0.12.1
- TensorFlow version (use command below): Nightly build #149 (GPU Version), 1.1.0-rc2
- CUDA/cuDNN version: CUDA 8.0, cuDNN 5.1
- GPU model and memory: Quadro M1200, 4GB, WDDM mode
Describe the problem
When I restart training for the third time (after adjusting some hyperparameters), TensorBoard cannot load the new event file. It loads only the first two event files, and after that the scalars stop refreshing.
The PowerShell console gave the following output:
[tensor] PS D:\Workspace\ConsorFlow> tensorboard.exe --logdir '../input_data/lpr_train_exp_01'
WARNING:tensorflow:Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events. Overwriting the graph with the newest event.
Starting TensorBoard b'52' at http://DESKTOP-P7T44AT:6006
(Press CTRL+C to quit)
WARNING:tensorflow:Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
ERROR:tensorflow:Unable to get size of D:\Workspace\input_data\lpr_train_exp_01\events.out.tfevents.1493274079.DESKTOP-P7T44AT: D:\Workspace\input_data\lpr_train_exp_01\events.out.tfevents.1493274079.DESKTOP-P7T44AT
WARNING:tensorflow:Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events. Overwriting the graph with the newest event.
WARNING:tensorflow:Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
WARNING:tensorflow:Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events. Overwriting the graph with the newest event.
WARNING:tensorflow:Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
WARNING:tensorflow:Detected out of order event.step likely caused by a TensorFlow restart. Purging expired events from Tensorboard display between the previous step: -1 (timestamp: -1) and current step: 17454 (timestamp: 1493310366.6493406). Removing 174 scalars, 76 histograms, 76 compressed histograms, 451 images, and 0 audio.
The ‘current step’ 17454 in the output is the first step in my second restart.
Information about event files:
- 1st: events.out.tfevents.1493274079
- 2nd: events.out.tfevents.1493310339
- 3rd: events.out.tfevents.1493352650
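The number in each event file name looks like a Unix timestamp, which makes it easy to check the order in which the runs started. The sketch below assumes the naming scheme `events.out.tfevents.<unix_seconds>[.<hostname>]`; the helper `event_file_start_time` is a made-up name for illustration, not part of TensorFlow or TensorBoard.

```python
import re
from datetime import datetime, timezone

# Assumed naming scheme: events.out.tfevents.<unix_seconds>[.<hostname>]
EVENT_FILE_RE = re.compile(r"^events\.out\.tfevents\.(\d+)(?:\.(.+))?$")


def event_file_start_time(filename):
    """Return the UTC datetime embedded in an event file name, or None."""
    match = EVENT_FILE_RE.match(filename)
    if match is None:
        return None
    return datetime.fromtimestamp(int(match.group(1)), tz=timezone.utc)


# The three file names reported above, deliberately shuffled.
names = [
    "events.out.tfevents.1493352650",
    "events.out.tfevents.1493274079",
    "events.out.tfevents.1493310339",
]

# Print the files in the order the runs were started.
for name in sorted(names, key=event_file_start_time):
    print(name, event_file_start_time(name).isoformat())
```

Sorting by the embedded timestamp shows which file belongs to which restart, which helps when matching files to the steps mentioned in the warnings.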
About this problem in Ubuntu: I switched to Windows just a few days ago; this problem did not exist on Ubuntu (at least 14.04). I was using the exact same script, but with TensorFlow version 1.01 (GPU, not a nightly build), installed following the official instructions.
On Windows, #7500 left me no choice but to install a nightly build.
About this issue
- State: closed
- Created 7 years ago
- Comments: 27 (4 by maintainers)
Commits related to this issue
- Added train_metrics_dir FLAG. Used for TraceHook. This prevents a TensorBoard bug, see: https://github.com/tensorflow/tensorflow/issues/9512 — committed to mdangschat/ctc-asr by mdangschat 6 years ago
- Save tensorboard logs in separate folders Issue: https://github.com/tensorflow/tensorflow/issues/9512 — committed to nokutu/m3-project by oscmansan 5 years ago
This is a known issue: TensorBoard doesn’t like it when you write multiple event files from separate runs in the same directory. The problem goes away if you use a new subdirectory for every run (new hyperparameters = new subdirectory).
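A minimal sketch of the "new subdirectory per run" advice, using only the Python standard library. The helper name `new_run_dir`, the `run-<timestamp>-<tag>` naming, and the example tags are assumptions made for illustration; they are not TensorFlow conventions.

```python
import os
import tempfile
import time


def new_run_dir(base_logdir, tag=""):
    """Create and return a fresh subdirectory of base_logdir for one run.

    Hypothetical helper: the directory name is a timestamp plus an
    optional tag describing the hyperparameters of the run.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    name = f"run-{stamp}" + (f"-{tag}" if tag else "")
    path = os.path.join(base_logdir, name)
    suffix = 1
    # Avoid a clash when two runs with the same tag start in the same second.
    while os.path.exists(path):
        suffix += 1
        path = os.path.join(base_logdir, f"{name}-{suffix}")
    os.makedirs(path)
    return path


base = tempfile.mkdtemp()  # stand-in for the real log directory
first = new_run_dir(base, tag="lr0.01")
second = new_run_dir(base, tag="lr0.001")
print(first)
print(second)
```

The returned path would then be handed to whatever writes the summaries (in TensorFlow 1.x, for example, the `checkpoint_dir` of `tf.train.MonitoredTrainingSession` or the `logdir` of `tf.summary.FileWriter`), while `tensorboard --logdir` points at the parent directory so that each restart appears as its own run.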
I see the same problem. Why was this closed?
This issue comes up with estimators as well, which write to an `eval` subdirectory of `model_dir`.
I think this should be reopened 😃
@dandelionmane any plans to fix this?
With this issue, one must maintain a single writer per run, which isn’t possible when, for example, using Keras’s `TensorBoard` callback along with other custom image/audio TensorBoard callbacks. The other option is to have each writer in a separate directory, but then TensorBoard will show them as separate runs.
If possible, it would also be good to know where this happened (run directory etc.) and exactly what data triggered this warning.