tensorflow: Loading TensorFlow models from disk is instant while loading from GCS is very slow
System information
- OS Platform and Distribution: Both MacOS and Ubuntu 18 (Tensorflow/serving)
- TensorFlow version (use command below): 2.0.0
- Python version: 3.6
Describe the current behavior
I want to use TensorFlow Serving to load multiple models. If I mount a directory containing the models, everything loads almost instantly, while loading from a gs:// path takes around 10 seconds per model.
While researching the issue I found that this is probably a TensorFlow issue rather than a TensorFlow Serving issue, because loading the model directly in TensorFlow shows a large difference as well:
[ins] In [22]: %timeit tf.saved_model.load('test/1')
3.88 s ± 719 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
[ins] In [23]: %timeit tf.saved_model.load('gs://path/to/test/1')
30.6 s ± 2.66 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
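Outside IPython, the comparison above can be roughly reproduced with a small standard-library helper. This is only a sketch: `time_call` is a hypothetical name, and it is simpler than `%timeit` (it reports plain wall-clock time and does not touch garbage collection), so absolute numbers may differ somewhat.

```python
import statistics
import time


def time_call(fn, repeats=7):
    """Call fn() `repeats` times; return (mean, stdev) of wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    # stdev needs at least two samples; report 0.0 for a single run.
    std = statistics.stdev(durations) if len(durations) > 1 else 0.0
    return statistics.mean(durations), std


# Usage (assumes TensorFlow is installed and GCS credentials are configured):
#   import tensorflow as tf
#   print(time_call(lambda: tf.saved_model.load("test/1")))
#   print(time_call(lambda: tf.saved_model.load("gs://path/to/test/1")))
```

Running both calls in the same process and on the same machine keeps the comparison fair, since the first `gs://` access may also include authentication.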
One explanation could be that downloading the model (which is very small) is slow, so I timed that as well:
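from google.cloud import storage

def test_load():
    bucket_name = 'path'
    folder = 'test'
    delimiter = '/'
    file = 'to/test/1'
    bucket = storage.Client().get_bucket(bucket_name)
    # List the objects under the model prefix and download each one.
    blobs = bucket.list_blobs(prefix=file, delimiter=delimiter)
    for blob in blobs:
        print(blob.name)
        # Local target is '<folder>/<blob name>'; the directory must already exist.
        destination_uri = '{}/{}'.format(folder, blob.name)
        blob.download_to_filename(destination_uri)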
[ins] In [31]: %timeit test_load()
541 ms ± 54.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Any idea what is happening here?
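Given these numbers, one possible workaround (not confirmed anywhere in this thread) is to copy the SavedModel to local disk first and load it from there. The sketch below assumes the `google-cloud-storage` package; `local_path_for` and `download_prefix` are hypothetical helper names, and the bucket and prefix in the usage comment simply mirror the `gs://path/to/test/1` example.

```python
import os


def local_path_for(blob_name, prefix, dest_dir):
    """Map an object name under `prefix` to a file path under `dest_dir`."""
    relative = blob_name[len(prefix):].lstrip("/")
    return os.path.join(dest_dir, *relative.split("/"))


def download_prefix(bucket_name, prefix, dest_dir):
    """Download every object under `prefix` into `dest_dir`, keeping the layout."""
    from google.cloud import storage  # imported lazily; needs google-cloud-storage

    # Normalise to a trailing slash so 'to/test/1' does not also match 'to/test/10'.
    prefix = prefix.rstrip("/") + "/"
    bucket = storage.Client().get_bucket(bucket_name)
    # No delimiter here, so objects in nested "directories" such as
    # variables/ are listed as well.
    for blob in bucket.list_blobs(prefix=prefix):
        if blob.name.endswith("/"):
            continue  # skip zero-byte "directory" placeholder objects
        target = local_path_for(blob.name, prefix, dest_dir)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        blob.download_to_filename(target)
    return dest_dir


# Usage (hypothetical bucket and prefix; TensorFlow assumed installed):
#   import tempfile
#   import tensorflow as tf
#   model_dir = download_prefix("path", "to/test/1", tempfile.mkdtemp())
#   model = tf.saved_model.load(model_dir)
```

The temporary directory is left in place on purpose; the caller decides when it is safe to delete it.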
Describe the expected behavior
Around the same load time for external and local models. The first load from gs:// can be slow if authentication needs to happen, but even authenticating is still much faster than the model loads.
About this issue
- State: open
- Created 4 years ago
- Comments: 25 (3 by maintainers)
@mihaimaruseac Thanks for investigating. I created a quick profile with TF 2.4, and most of the time is actually spent in the following function call, mainly due to the execution of the restore_v2 operation, so I am not sure whether the root cause is actually related to the TF file system implementation: https://github.com/tensorflow/tensorflow/blob/7a1b65b61397dbad1d400e1d3c7079246dfe309a/tensorflow/python/saved_model/load.py#L161
Having the same issue here, except that I’m loading a big SavedModel and it’s extremely slow. Downloading the same model with gsutil is much faster, which rules out the network as the cause.
The following code is used for the plot
Mind the log scales in the plot
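Separately from the plot, the kind of quick profile mentioned in the TF 2.4 comment can be approximated with the standard library. This is a sketch, not the commenter's actual setup: `profile_call` is a hypothetical helper. Note that cProfile only sees Python-level calls, so time spent inside native ops such as restore_v2 is attributed to the Python function that invokes them.

```python
import cProfile
import io
import pstats


def profile_call(fn, top=15):
    """Run fn() under cProfile; return a report of the `top` entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        fn()
    finally:
        profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


# Usage (assumes TensorFlow is installed and GCS credentials are configured):
#   import tensorflow as tf
#   print(profile_call(lambda: tf.saved_model.load("gs://path/to/test/1")))
```

Comparing the report for the local path with the one for the gs:// path should show whether the extra time sits in the restore step or elsewhere.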