sagemaker-python-sdk: Tensorboard not displaying scalars

When the flag run_tensorboard_locally is set to True, for example estimator.fit(inputs, run_tensorboard_locally=True) where estimator = TensorFlow(...), Tensorboard displays only the graph and projector, not any scalars or images.

If one run is terminated and a new one is started by running estimator.fit(inputs, run_tensorboard_locally=True) again, the scalars and images of the previous run are displayed on Tensorboard, but they are not updated as training continues. It seems that, when training is restarted, Tensorboard loads the previously saved logs from /tmp/<temp_folder>/, which was created by tempfile.mkdtemp(), while the new logs are saved to a newly created folder.
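The restart behaviour hypothesised above can be modelled in a few lines of plain Python. This is a simplified sketch, not the SDK's actual code: start_run and write_event are hypothetical stand-ins for one estimator.fit(...) call and for TensorFlow writing an event file. The only real API involved is tempfile.mkdtemp(), which creates a new, uniquely named directory on every call.

```python
import os
import shutil
import tempfile


def start_run() -> str:
    """Hypothetical stand-in for one estimator.fit(...) call:
    each call gets a fresh log directory from tempfile.mkdtemp()."""
    return tempfile.mkdtemp()


def write_event(logdir: str, step: int) -> None:
    """Hypothetical placeholder for TensorFlow writing an event file."""
    with open(os.path.join(logdir, f"events.step{step}"), "w") as f:
        f.write(str(step))


first_run = start_run()
write_event(first_run, 0)
watched_dir = first_run  # Tensorboard is reading the first run's directory

second_run = start_run()  # training restarted: a new directory is created
write_event(second_run, 1)

visible = sorted(os.listdir(watched_dir))
print(first_run != second_run)  # True: each run gets its own directory
print(visible)                  # ['events.step0']: step 1 was written elsewhere

# Remove the temporary directories created by this sketch.
for d in (first_run, second_run):
    shutil.rmtree(d)
```

Under this model, the stale scalars are expected: the directory being watched never receives the restarted run's event files.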

Is there any way to get Tensorboard working properly? Would it make sense to add the ability to define the logdir for Tensorboard when calling TensorFlow?
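One possible shape for such a logdir option is sketched below. resolve_logdir is a hypothetical helper, not part of the SageMaker SDK. With no logdir it keeps the current mkdtemp() behaviour; with a user-supplied logdir it reuses the same directory across restarts and can optionally nest each run in its own subdirectory.

```python
import os
import shutil
import tempfile
from typing import Optional


def resolve_logdir(logdir: Optional[str] = None, run_name: Optional[str] = None) -> str:
    """Hypothetical helper (not an SDK API).

    No logdir: fall back to a fresh tempfile.mkdtemp() directory.
    With a logdir: reuse it, optionally nesting each run under run_name.
    """
    if logdir is None:
        return tempfile.mkdtemp()
    path = os.path.join(logdir, run_name) if run_name else logdir
    os.makedirs(path, exist_ok=True)  # safe to call again when a run restarts
    return path


parent = tempfile.mkdtemp()  # stands in for a user-chosen log directory

# Restarting with the same logdir and run name resolves to the same path.
first = resolve_logdir(parent, "run-1")
restarted = resolve_logdir(parent, "run-1")
second = resolve_logdir(parent, "run-2")

runs = sorted(os.listdir(parent))
print(first == restarted)  # True
print(runs)                # ['run-1', 'run-2']

shutil.rmtree(parent)
```

The per-run subdirectories follow the usual TensorBoard convention: pointing tensorboard --logdir at the parent directory lets it pick up each subdirectory as a separate run, so a restarted job writes to a location Tensorboard is already watching.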

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 18 (10 by maintainers)

Most upvoted comments

@jbencook Thank you so much for your contribution!

Until the next SDK release, the Tensorboard fix can be tried by building and installing the SDK from master.

It is also possible to try the fix within a SageMaker notebook instance by building and installing from source:

  1. Start a new conda_tensorflow_p27 notebook.
  2. Clone from master and pip install within a cell:
     ! git clone https://github.com/aws/sagemaker-python-sdk.git python-sdk-tensorboard-fix && cd python-sdk-tensorboard-fix && pip install . --upgrade
  3. Run the cell.

All TensorFlow jobs that run Tensorboard should now display scalars correctly!

Feel free to run the sample Tensorboard notebook, tensorflow_resnet_cifar10_with_tensorboard, which is in /sample-notebooks/sagemaker-python-sdk.

Thanks again!

Just checking in to see if there are any updates or any indication of when this will be fixed.

Thanks @vassiasim. I will reproduce the issue and come back with further information.