tensorflow: TensorBoard callback without profile_batch setting causes errors CUPTI_ERROR_INSUFFICIENT_PRIVILEGES and CUPTI_ERROR_INVALID_PARAMETER
Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub.
System information
- Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Stateless LSTM from Keras tutorial using tf backend
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10
- TensorFlow installed from (source or binary): binary
- TensorFlow version (use command below): 2.1.0
- Python version: 3.7.4
- CUDA/cuDNN version: 10.1
- GPU model and memory: MX150 10GB
Describe the current behavior
When using tf.keras.callbacks.TensorBoard() without setting profile_batch, it emits CUPTI_ERROR_INSUFFICIENT_PRIVILEGES and CUPTI_ERROR_INVALID_PARAMETER errors from tensorflow/core/profiler/internal/gpu/cupti_tracer.cc.
Describe the expected behavior
With profile_batch = 0, these two errors are gone, but they come back when profile_batch = 1 or any other non-zero value.
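A workaround grounded in the behavior above: setting profile_batch=0 disables the profiler entirely, so the CUPTI privilege checks never run. A minimal sketch (the log directory name is an arbitrary choice):

import tensorflow as tf

# In TF 2.1 the TensorBoard callback profiles the second batch by default;
# profile_batch=0 turns profiling off, avoiding the CUPTI errors at the
# cost of losing profiler traces.
tensorboard_callback = tf.keras.callbacks.TensorBoard(
    log_dir='callback_tests',
    profile_batch=0)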
Code to reproduce the issue
from __future__ import print_function
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM
input_len = 1000
tsteps = 2
lahead = 1
batch_size = 1
epochs = 5
print("*" * 33)
if lahead >= tsteps:
    print("STATELESS LSTM WILL ALSO CONVERGE")
else:
    print("STATELESS LSTM WILL NOT CONVERGE")
print("*" * 33)
np.random.seed(1986)
print('Generating Data...')
def gen_uniform_amp(amp=1, xn=10000):
    data_input = np.random.uniform(-1 * amp, +1 * amp, xn)
    data_input = pd.DataFrame(data_input)
    return data_input
to_drop = max(tsteps - 1, lahead - 1)
data_input = gen_uniform_amp(amp=0.1, xn=input_len + to_drop)
expected_output = data_input.rolling(window=tsteps, center=False).mean()
if lahead > 1:
    data_input = np.repeat(data_input.values, repeats=lahead, axis=1)
    data_input = pd.DataFrame(data_input)
    for i, c in enumerate(data_input.columns):
        data_input[c] = data_input[c].shift(i)
expected_output = expected_output[to_drop:]
data_input = data_input[to_drop:]
def create_model(stateful):
    model = Sequential()
    model.add(LSTM(20,
                   input_shape=(lahead, 1),
                   batch_size=batch_size,
                   stateful=stateful))
    model.add(Dense(1))
    model.compile(loss='mse', optimizer='adam')
    return model
print('Creating Stateful Model...')
model_stateful = create_model(stateful=True)
def split_data(x, y, ratio=0.8):
    to_train = int(input_len * ratio)
    to_train -= to_train % batch_size
    x_train = x[:to_train]
    y_train = y[:to_train]
    x_test = x[to_train:]
    y_test = y[to_train:]
    # tweak to match with batch_size
    to_drop = x.shape[0] % batch_size
    if to_drop > 0:
        x_test = x_test[:-1 * to_drop]
        y_test = y_test[:-1 * to_drop]
    # some reshaping
    reshape_3 = lambda x: x.values.reshape((x.shape[0], x.shape[1], 1))
    x_train = reshape_3(x_train)
    x_test = reshape_3(x_test)
    reshape_2 = lambda x: x.values.reshape((x.shape[0], 1))
    y_train = reshape_2(y_train)
    y_test = reshape_2(y_test)
    return (x_train, y_train), (x_test, y_test)
(x_train, y_train), (x_test, y_test) = split_data(data_input, expected_output)
print('x_train.shape: ', x_train.shape)
print('y_train.shape: ', y_train.shape)
print('x_test.shape: ', x_test.shape)
print('y_test.shape: ', y_test.shape)
print('Creating Stateless Model...')
model_stateless = create_model(stateful=False)
import os
import datetime
ROOT_DIR = os.getcwd()
log_dir = os.path.join('callback_tests')
if not os.path.exists(log_dir):
    os.makedirs(log_dir)
print(log_dir)
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir)
print('Training')
history = model_stateless.fit(x_train,
                              y_train,
                              batch_size=batch_size,
                              epochs=epochs,
                              verbose=1,
                              validation_data=(x_test, y_test),
                              shuffle=False,
                              callbacks=[tensorboard_callback])
Other info / logs
Train on 800 samples, validate on 200 samples
2020-01-14 21:30:27.591905: I tensorflow/core/profiler/lib/profiler_session.cc:225] Profiler session started.
2020-01-14 21:30:27.594743: I tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1259] Profiler found 1 GPUs
2020-01-14 21:30:27.599172: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cupti64_101.dll
2020-01-14 21:30:27.704083: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1307] function cupti_interface_->Subscribe( &subscriber_, (CUpti_CallbackFunc)ApiCallback, this) failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
2020-01-14 21:30:27.716790: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1346] function cupti_interface_->ActivityRegisterCallbacks( AllocCuptiActivityBuffer, FreeCuptiActivityBuffer) failed with error CUPTI_ERROR_INSUFFICIENT_PRIVILEGES
Epoch 1/5
2020-01-14 21:30:28.370429: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cublas64_10.dll
2020-01-14 21:30:28.651767: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library cudnn64_7.dll
2020-01-14 21:30:29.662864: E tensorflow/core/profiler/internal/gpu/cupti_tracer.cc:1329] function cupti_interface_->EnableCallback( 0 , subscriber_, CUPTI_CB_DOMAIN_DRIVER_API, cbid) failed with error CUPTI_ERROR_INVALID_PARAMETER
2020-01-14 21:30:29.670282: I tensorflow/core/profiler/internal/gpu/device_tracer.cc:88] GpuTracer has collected 0 callback api events and 0 activity events.
800/800 [==============================] - 5s 6ms/sample - loss: 0.0011 - val_loss: 0.0011
Epoch 2/5
800/800 [==============================] - 3s 4ms/sample - loss: 8.5921e-04 - val_loss: 0.0010
Epoch 3/5
800/800 [==============================] - 3s 3ms/sample - loss: 8.5613e-04 - val_loss: 0.0010
Epoch 4/5
800/800 [==============================] - 3s 4ms/sample - loss: 8.5458e-04 - val_loss: 9.9713e-04
Epoch 5/5
800/800 [==============================] - 3s 4ms/sample - loss: 8.5345e-04 - val_loss: 9.8825e-04
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 23
- Comments: 56 (6 by maintainers)
Adding
options nvidia "NVreg_RestrictProfilingToAdminUsers=0"
to /etc/modprobe.d/nvidia-kernel-common.conf and rebooting should resolve the permission issue.

This solved the issue for me (right-click on your desktop for quick access to the NVIDIA Control Panel):
Windows Step 1: Open the NVIDIA Control Panel, select 'Desktop', and ensure 'Enable Developer Settings' is checked.
Windows Step 2: Under 'Developer' > 'Manage GPU Performance Counters', select 'Allow access to the GPU performance counter to all users' to enable unrestricted profiling. [1]
@tamaramiteva I resolved the problem with the docker run option --privileged=true. There are no more errors such as CUPTI_ERROR_INSUFFICIENT_PRIVILEGES.
For people using Docker on Linux: instead of running the container with --privileged=true, just pass --cap-add=CAP_SYS_ADMIN.

Anyone with the same issue on Windows 10? The two offered solutions only work for Linux.
Hey @trisolaran, thanks for the brief intro. The thing is, I do not have such a file as /etc/modprobe.d/nvidia-kernel-common.conf. I am using a conda environment.
Can confirm that the following fixed the problem for me. It’s possible that only a subset is strictly necessary.
(note: Docker config, using official 2.2.0 image)
- Adding options nvidia "NVreg_RestrictProfilingToAdminUsers=0" to /etc/modprobe.d/nvidia-kernel-common.conf and running update-initramfs -u
- Adding export CUDA_VERSION="10.1", export LD_LIBRARY_PATH="/usr/local/cuda-${CUDA_VERSION}/lib64:/usr/local/cuda-${CUDA_VERSION}/extras/CUPTI/lib64" and export LD_INCLUDE_PATH="/usr/local/cuda-${CUDA_VERSION}/include:/usr/local/cuda-${CUDA_VERSION}/extras/CUPTI/include" to the host machine's .zshrc
- Adding ENV LD_INCLUDE_PATH="/usr/local/cuda/include:/usr/local/cuda/extras/CUPTI/include:$LD_INCLUDE_PATH" to the Dockerfile
- Running the container with --privileged

@gawain-git-code, I tried running the code in colab and I was able to run it successfully. Please find the gist for reference. Thanks!
In order to run docker:
nvidia-docker run '--privileged=true' -d -it --name retina_net -v /home/readib/Experiments/:/home -p 8000:8888 -v /tmp/.X11-unix/:/tmp/.X11-unix -e DISPLAY=$DISPLAY retina_net:latest /bin/bash

Yes, I get that segfault too – I think it's because the overhead of profiling, on top of regular GPU computations, causes GPU memory overflow.
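If profiling overhead is indeed tipping the GPU into an out-of-memory state, enabling memory growth might leave headroom for the profiler. This is a speculative mitigation, not something confirmed in this thread; a minimal sketch:

import tensorflow as tf

# Allocate GPU memory on demand rather than reserving nearly all of it
# up front, so the profiler's extra allocations have room to succeed.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)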
@SarfarazHabib Hi, I am using a conda environment too, and I solved this problem by adding
options nvidia "NVreg_RestrictProfilingToAdminUsers=0"
to /etc/modprobe.d/nvidia-kernel-common.conf. I did not have the file either, so you should create it.

I am having the same error in an anaconda environment. None of the solutions posted above work for me. Does anyone have any ideas what can be done? Also, what does this error actually mean, if someone is kind enough to explain it to a noob?
This seems to be a global setting, so it works with monkey-patched Anaconda environments.
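After applying one of the driver-level fixes above, a quick sanity check is to run a tiny profiled training step and watch the log for CUPTI errors. The model, data, and log directory below are arbitrary placeholders, not taken from this thread:

import numpy as np
import tensorflow as tf

# Throwaway model; the only question is whether the profiler's CUPTI
# calls succeed when profile_batch is non-zero.
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(loss='mse', optimizer='adam')

cb = tf.keras.callbacks.TensorBoard(log_dir='cupti_check', profile_batch=1)
model.fit(np.random.rand(32, 4).astype('float32'),
          np.random.rand(32, 1).astype('float32'),
          epochs=1,
          callbacks=[cb])
# If the fix took effect, CUPTI_ERROR_INSUFFICIENT_PRIVILEGES should no
# longer appear in the startup log.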