tensorflow: Tensorboard cannot load more than two event file in logdir

System information

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes, Custom network structure and data pre-processing for my own task and dataset, modified based on current single GPU CIFAR-10 tutorial (which use the monitored session).
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 Pro 1703
TensorFlow installed from (source or binary): Binary, install locally by using pip install .\xxx.whl in my miniconda environment, for the environment, pip freeze give the following information

appdirs==1.4.3
bleach==1.5.0
cycler==0.10.0
html5lib==0.9999999
Markdown==2.2.0
matplotlib==2.0.0
numpy==1.12.1
olefile==0.44
packaging==16.8
Pillow==4.1.0
protobuf==3.2.0
pyparsing==2.2.0
python-dateutil==2.6.0
pytz==2017.2
six==1.10.0
tensorflow-gpu==1.1.0rc2
Werkzeug==0.12.1

TensorFlow version (use command below): Nightly build #149 (GPU Version), 1.1.0-rc2
CUDA/cuDNN version: CUDA 8.0, cuDNN 5.1
GPU model and memory: Quadro M1200, 4GB, WDDM mode

Describe the problem

When restart the training (due to some hyper parameter adjustment) the third time, Tensorboard cannot load the new event file. It can only load the first two event file and after that scalar will stop refreshing.

Powershell console gave the following output:

[tensor] PS D:\Workspace\ConsorFlow> tensorboard.exe --logdir '../input_data/lpr_train_exp_01'
WARNING:tensorflow:Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events.  Overwriting the graph with the newest event.
Starting TensorBoard b'52' at http://DESKTOP-P7T44AT:6006
(Press CTRL+C to quit)
WARNING:tensorflow:Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
ERROR:tensorflow:Unable to get size of D:\Workspace\input_data\lpr_train_exp_01\events.out.tfevents.1493274079.DESKTOP-P7T44AT: D:\Workspace\input_data\lpr_train_exp_01\events.out.tfevents.1493274079.DESKTOP-P7T44AT
WARNING:tensorflow:Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events.  Overwriting the graph with the newest event.
WARNING:tensorflow:Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
WARNING:tensorflow:Found more than one graph event per run, or there was a metagraph containing a graph_def, as well as one or more graph events.  Overwriting the graph with the newest event.
WARNING:tensorflow:Found more than one metagraph event per run. Overwriting the metagraph with the newest event.
WARNING:tensorflow:Detected out of order event.step likely caused by a TensorFlow restart. Purging expired events from Tensorboard display between the previous step: -1 (timestamp: -1) and current step: 17454 (timestamp: 1493310366.6493406). Removing 174 scalars, 76 histograms, 76 compressed histograms, 451 images, and 0 audio.

The ‘current step’ 17454 in the output is the first step in my second restart.

Information about event files: 1st: events.out.tfevents.1493274079 2nd: events.out.tfevents.1493310339 3rd: events.out.tfevents.1493352650

About this problem in Ubuntu: I just switch to windows several days ago, such problem did not exist in Ubuntu (at least 14.04). I was using the exact same script, but with tensorflow version 1.01 (GPU, not nightly version), install following the offical instruction.

Under windows, it was because of #7500, which leave me no choice but to install a nightly build.

About this issue

Original URL
State: closed
Created 7 years ago
Comments: 27 (4 by maintainers)

Commits related to this issue

Added train_metrics_dir FLAG. Used for TraceHook. This prevents a TensorBoard bug, see: https://github.com/tensorflow/tensorflow/issues/9512 — committed to mdangschat/ctc-asr by mdangschat 6 years ago
Save tensorboard logs in separate folders Issue: https://github.com/tensorflow/tensorflow/issues/9512 — committed to nokutu/m3-project by oscmansan 5 years ago

Most upvoted comments

This is a known issue, TensorBoard doesn’t like it when you write multiple event files from separate runs in the same directory. It will be fixed if you use a new subdirectory for every run (new hyperparameters = new subdirectory).

+25

teamdandelion on Jun 16, 2017

I see same problem, why was this closed?

+17

boulderZ on Jun 24, 2018

This issues comes up with estimators as well, which write to an eval subdirectory of model_dir.

I think this should be reopened 😃

+17

gar1t on Mar 10, 2018

@dandelionmane any plans to fix this ?

With this issue, one must maintain a single writer per run, which isn’t possible when, for example, using Keras’s TensorBoard callback along with other custom image/audio tensorboard callbacks. The other option is to have each writer in a separate directory but then Tensorboard will show them as separate runs.

mohamed-ezz on Dec 28, 2017

If possible, it would also be good to know where this happened (run directory etc.), exactly what data triggered this warning.

tpet on Mar 11, 2018