sagemaker-python-sdk: Tensorboard not displaying scalars

When the flag run_tensorboard_locally is set to True, for example estimator.fit(inputs, run_tensorboard_locally=True) where estimator = TensorFlow(...), Tensorboard displays only the graph and projector, not any scalars or images.

If one run is terminated and a new one is started by running estimator.fit(inputs, run_tensorboard_locally=True) again, Tensorboard displays the scalars and images of the previous run, but they are not updated as training continues. It seems that when training is restarted, Tensorboard loads the previously saved logs from /tmp/<temp_folder>/, which was created by tempfile.mkdtemp(), while the new logs are saved to a newly created folder.
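One way to test this hypothesis is to look for Tensorboard event files (named events.out.tfevents.*) under the temp directory and see which folder was written to most recently. The sketch below is a generic standard-library helper, not part of the SDK; the name find_event_dirs is hypothetical, and it assumes the local logs end up somewhere under the system temp directory.

```python
import os


def find_event_dirs(root):
    """Hypothetical diagnostic helper (not part of sagemaker-python-sdk).

    Walks `root` and returns every directory that contains Tensorboard
    event files (events.out.tfevents.*), ordered newest first by the most
    recent modification time of any event file in that directory.
    """
    latest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith("events.out.tfevents"):
                mtime = os.path.getmtime(os.path.join(dirpath, name))
                # Keep the newest event-file timestamp seen for this directory.
                latest[dirpath] = max(mtime, latest.get(dirpath, 0.0))
    return sorted(latest, key=latest.get, reverse=True)
```

For example, find_event_dirs(tempfile.gettempdir())[0] should be the folder the current run is writing to; if Tensorboard was launched against a different folder, that would match the behavior described above.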

Is there any way to get Tensorboard working properly? Would it make sense to add the ability to define the logdir for Tensorboard when calling TensorFlow?
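To illustrate what a caller-defined logdir could look like, here is a minimal sketch under stated assumptions: make_logdir and tensorboard_command are hypothetical helpers, not SDK functions. Reusing one fixed directory across runs means a Tensorboard process started once keeps seeing new event files, whereas tempfile.mkdtemp() yields a fresh folder on every call. Only the standard tensorboard --logdir and --port flags are used.

```python
import os
import tempfile


def make_logdir(logdir=None):
    """Hypothetical helper: return a stable logdir if the caller supplies one.

    Falls back to tempfile.mkdtemp(), which creates a new, uniquely named
    directory on every call (the per-run behavior described in the issue).
    """
    if logdir is None:
        return tempfile.mkdtemp()
    os.makedirs(logdir, exist_ok=True)  # reuse the directory if it already exists
    return os.path.abspath(logdir)


def tensorboard_command(logdir, port=6006):
    """Build (but do not run) a tensorboard command line for `logdir`."""
    return ["tensorboard", "--logdir", logdir, "--port", str(port)]
```

The returned list could be passed to subprocess.Popen to start Tensorboard once, against a directory that every subsequent run writes into.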

About this issue

State: closed
Created 7 years ago
Comments: 18 (10 by maintainers)

Commits related to this issue

Pass estimator class name as hyperparameter for tuning jobs (#26) — committed to laurenyu/sagemaker-python-sdk by ChoiByungWook 6 years ago
Merge pull request #26 from awslabs/mvs-tensorboard tensorboard example notebook — committed to apacker/sagemaker-python-sdk by mvsusp 7 years ago
feat: Master feature store ada r6 rebase r4 rebase master ada (#940) Co-authored-by: Alex Tang <tangalex@amazon.com> Co-authored-by: Can Sun <sucan@amazon.com> Co-authored-by: cansun <80425164+can-... — committed to jiapinw/sagemaker-python-sdk by suryans-commit a year ago

Most upvoted comments

@jbencook Thank you so much for your contribution!

Until the next SDK release, the Tensorboard fix can be viewed by building and installing from master.

It is also possible to view the fix within a SageMaker notebook instance by building and installing from source.

1. Start a new conda_tensorflow_p27 notebook.
2. Clone from master and pip install within a cell:

   ! git clone https://github.com/aws/sagemaker-python-sdk.git python-sdk-tensorboard-fix && cd python-sdk-tensorboard-fix && pip install . --upgrade

3. Run the cell.

All tensorflow jobs that run tensorboard should now correctly display scalars!

Feel free to run the sample Tensorboard notebook, tensorflow_resnet_cifar10_with_tensorboard, which is in /sample-notebooks/sagemaker-python-sdk.

Thanks again!

ChoiByungWook on Mar 23, 2018

Just checking in to see if there are any updates or any indication of when this will be fixed.

hsakkout on Mar 19, 2018

Thanks @vassiasim. I will reproduce the issue and come back with further information.

mvsusp on Dec 21, 2017