jax: Viewing program traces with Perfetto: `ValueError: Invalid trace folder`
Description
Following https://jax.readthedocs.io/en/latest/profiling.html:
import jax

with jax.profiler.trace("/tmp/jax-trace", create_perfetto_link=True):
    # Run the operations to be profiled
    key = jax.random.PRNGKey(0)
    x = jax.random.normal(key, (5000, 5000))
    y = x @ x
    y.block_until_ready()
Output:
2022-10-27 23:56:51.130358: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-10-27 23:56:51.192326: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2022-10-27 23:56:51.762358: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2022-10-27 23:56:51.762440: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2022-10-27 23:56:51.762446: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
2022-10-27 23:56:51.839141: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer.so.7'; dlerror: libnvinfer.so.7: cannot open shared object file: No such file or directory
2022-10-27 23:56:51.839247: W tensorflow/compiler/xla/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libnvinfer_plugin.so.7'; dlerror: libnvinfer_plugin.so.7: cannot open shared object file: No such file or directory
2022-10-27 23:56:51.839264: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Traceback (most recent call last):
  File "/nfs_share/bart-base-jax/2.py", line 3, in <module>
    with jax.profiler.trace("/tmp/jax-trace", create_perfetto_link=True):
  File "/usr/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/home/ayaka/.venv310/lib/python3.10/site-packages/jax/_src/profiler.py", line 236, in trace
    stop_trace()
  File "/home/ayaka/.venv310/lib/python3.10/site-packages/jax/_src/profiler.py", line 197, in stop_trace
    abs_filename = _write_perfetto_trace_file(_profile_state.log_dir)
  File "/home/ayaka/.venv310/lib/python3.10/site-packages/jax/_src/profiler.py", line 134, in _write_perfetto_trace_file
    raise ValueError(f"Invalid trace folder: {latest_folder}")
ValueError: Invalid trace folder: /tmp/jax-trace/plugins/profile/2022_10_27_23_56_58
Traceback (most recent call last):
  File "/nfs_share/bart-base-jax/2.py", line 3, in <module>
    with jax.profiler.trace("/tmp/jax-trace", create_perfetto_link=True):
  File "/usr/lib/python3.10/contextlib.py", line 142, in __exit__
    next(self.gen)
  File "/home/ayaka/.venv310/lib/python3.10/site-packages/jax/_src/profiler.py", line 236, in trace
    stop_trace()
  File "/home/ayaka/.venv310/lib/python3.10/site-packages/jax/_src/profiler.py", line 197, in stop_trace
    abs_filename = _write_perfetto_trace_file(_profile_state.log_dir)
  File "/home/ayaka/.venv310/lib/python3.10/site-packages/jax/_src/profiler.py", line 134, in _write_perfetto_trace_file
    raise ValueError(f"Invalid trace folder: {latest_folder}")
ValueError: Invalid trace folder: /tmp/jax-trace/plugins/profile/2022_10_27_23_57_00
What jax/jaxlib version are you using?
jax v0.3.23, jaxlib v0.3.22, tensorflow v2.11.0rc1 (compatible with jaxlib)
Which accelerator(s) are you using?
TPU v4-16
Additional system info
Python 3.10.8, Linux 5.8.0-1035-gcp
NVIDIA GPU info
No response
About this issue
- State: closed
- Created 2 years ago
- Reactions: 15
- Comments: 15 (8 by maintainers)
Apologies for the lack of updates! Both Parker and I have been on largely nonoverlapping vacation for the last month or so.
I’m back next week so will hopefully have something for you then.
I’m hitting this problem as well. I’m not sure if this is helpful, but when I look in the “invalid trace folder” I see only a single `.pb` file (and no `trace.json`).
We are currently working on automatically parsing the xplane.pb file and uploading it to Perfetto. Will update this thread when it’s done! We have a Thanksgiving holiday next week so hopefully we’ll have something to show the week after (cc: @pschuh)
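The folder check described in the comment above (only a `.pb` file, no `trace.json`) can be reproduced with a small helper. This is a minimal sketch, not JAX's own code: the function names `summarize_latest_trace` and `has_perfetto_trace` are hypothetical, and the `.trace.json.gz` suffix is an assumption about what the Perfetto upload step looks for, based on the file names mentioned in this thread.

```python
from pathlib import Path


def summarize_latest_trace(log_dir):
    """Return (newest run folder, {file name: size in bytes}) under log_dir.

    Assumes the layout seen in the error message:
    <log_dir>/plugins/profile/<timestamp>/...
    """
    profile_root = Path(log_dir) / "plugins" / "profile"
    runs = [p for p in profile_root.iterdir() if p.is_dir()]
    if not runs:
        raise FileNotFoundError(f"no profile runs under {profile_root}")
    # Pick the most recently modified run folder.
    latest = max(runs, key=lambda p: p.stat().st_mtime)
    files = {f.name: f.stat().st_size for f in latest.iterdir() if f.is_file()}
    return latest, files


def has_perfetto_trace(files):
    """True if any file name ends in '.trace.json.gz'.

    Assumption: this suffix is what the Perfetto path needs; it is not
    confirmed by the thread beyond the 'trace.json.gz' mention.
    """
    return any(name.endswith(".trace.json.gz") for name in files)
```

Calling `summarize_latest_trace("/tmp/jax-trace")` after a failing run should list whatever the profiler wrote, with sizes, so an empty or missing trace file is easy to spot before involving TensorBoard or Perfetto.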
+1
It’s back! https://github.com/tensorflow/tensorflow/commit/b1dfc9285409bd9cb07f4598737450773daec573
We should be cutting a release soon so I will update the thread when that’s out
Ah sorry forgot to update the thread. I think it should work with the latest Jax.
A quick update: @pschuh has made progress on reviving the old code that generated the `trace.json.gz` that was uploaded to Perfetto. Once that lands and we cut a jaxlib release, Perfetto should work again!
Running into the same error. With the Perfetto flags set to False, the program executes and a single file, `...xplane.pb`, is created. However, TensorBoard does not recognize the created file. Running
tensorboard --inspect --event_file=plugins/profile/2023_01_22_17_22_49/taurusi8017.xplane.pb
yields the following output, which suggests the files are empty:
Using
+1