tensorflow: Failing on a LOG(FATAL) in tensorflow_cc.so on Windows 7 on a Quadro R5000 16 GB with v1.12, CUDA 10.0.130 and cuDNN 7.4.2.24; OK under Windows 10 on a Quadro P5000 and GTX 1060 6 GB

Please make sure that this is a bug. As per our GitHub Policy, we only address code/doc bugs, performance issues, feature requests and build/installation issues on GitHub.

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): Yes. I have linked against the //tensorflow:libtensorflow_cc.so and //tensorflow:libtensorflow_framework.so targets, together with other libraries (abseil-cpp, libprotobuf, etc.)
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Windows 10 (build) and Windows 7 (deployment)
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): v1.12.0 (b'v1.12.0-0-ga6d8ffae09' 1.12.0)
  • Python version: 3.6 (N/A)
  • Bazel version (if compiling from source): 0.19.2
  • GCC/Compiler version (if compiling from source): MSVC 14.0
  • CUDA/cuDNN version: 10.0.130, 7.4.2.24
  • GPU model and memory: GTX 1060 6 GB and Quadro R5000 16 GB

You can collect some of this information using our environment capture script. You can also obtain the TensorFlow version with python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)":

(tensorflow-cuda10) C:\Users\user\dev\tensorflow-cuda10\tensorflow\tensorflow\core\common_runtime\gpu>python -c "import tensorflow as tf; print(tf.GIT_VERSION, tf.VERSION)"
b'v1.12.0-0-ga6d8ffae09' 1.12.0

Describe the current behavior The application currently crashes when initialising the session on the Quadro card on the client's computer running Windows 7, with the error message:

2019-04-02 11:30:18.871580: F tensorflow/core/common_runtime/gpu/gpu_event_mgr.cc:274] Unexpected Event status: 1

Here is the relevant code from that file, where the LOG(FATAL) call is line 274:

// This function must be called periodically to check whether pending
// events have recorded, and then retire them.  Initial observations
// suggest that typical behavior in a TensorFlow program is to have
// 0-3 events pending most of the time, but there are occasionally
// spikes of up to several hundred outstanding.
//
// NOTE: If all events are on the same stream, no later event will
// complete before an earlier event, except possibly if the earlier
// event transitions to an error state, so there's no advantage in
// looking past the first kPending event.  However, if we're using
// multiple streams there may be some gain in looking deeper.
// As a compromise, PollEvent() calls that are triggered by the queueing
// of a single event never look past the first kPending event.  Calls
// coming from the dedicated polling thread always sweep the full queue.
//
// Note that allowing the queue to grow very long could cause overall
// GPU memory use to spike needlessly.  An alternative strategy would
// be to throttle new Op execution until the pending event queue
// clears.
void EventMgr::PollEvents(bool is_dedicated_poller,
                          gtl::InlinedVector<InUse, 4>* to_free) {
  VLOG(2) << "PollEvents  free_events_ " << free_events_.size()
          << " used_events_ " << used_events_.size();
  // Sweep the remaining events in order.  If this is the dedicated
  // polling thread, check the entire set.  Otherwise, just sweep up to
  // the first non-complete record that is still pending.
  for (auto& iu : used_events_) {
    if (iu.event == nullptr) continue;
    se::Event::Status s = iu.event->PollForStatus();
    switch (s) {
      case se::Event::Status::kUnknown:
      case se::Event::Status::kError:
        // We don't expect to see these.  Someday maybe propagate
        // a Status error, but for now fail hard.
        LOG(FATAL) << "Unexpected Event status: " << static_cast<int>(s);
        break;
      case se::Event::Status::kPending:
        if (!is_dedicated_poller) return;  // quit processing queue
        break;
      case se::Event::Status::kComplete:
        // Make a copy of the InUse record so we can free it after releasing
        // the lock
        to_free->push_back(iu);
        free_events_.push_back(iu.event);
        // Mark this InUse record as completed.
        iu.event = nullptr;
    }
  }
  // Then clear any completed InUse records from the front of the queue.
  while (!used_events_.empty()) {
    InUse& iu = used_events_.front();
    if (iu.event == nullptr) {
      used_events_.pop_front();
    } else {
      break;
    }
  }
}

}  // namespace tensorflow
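
For context on the numeric value in the log: in the stream_executor headers from the same source tree, the event status enum is declared roughly as below (paraphrased from tensorflow/stream_executor/event.h in the v1.12 era; worth verifying against the exact sources). With that ordering, the value 1 in "Unexpected Event status: 1" would correspond to kError, i.e. the underlying CUDA event itself reported an error rather than merely being pending.

// Paraphrased from tensorflow/stream_executor/event.h (v1.12 era; verify
// against the exact checkout). With this ordering, the "1" printed by the
// LOG(FATAL) above maps to Status::kError.
enum class Status {
  kUnknown,   // 0
  kError,     // 1  <-- value printed in the log above
  kPending,   // 2
  kComplete,  // 3
};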

Describe the expected behavior I would expect the software to load the graph into a fresh session and run the computation.

Code to reproduce the issue Provide a reproducible test case that is the bare minimum necessary to generate the problem.

tensorflow::SessionOptions options;
tensorflow::ConfigProto* config = &options.config;
options.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(0.9);
auto* device_count = config->mutable_device_count();
device_count->insert({ "GPU", 1 });
device_count->insert({ "CPU", 1 });
// bytes is read from graph_file_name
graph_def->ParseFromArray(bytes.data(), (int)bytes.size());
session->reset(tensorflow::NewSession(options));
std::cout << "Rotobot: Swapping to model: " << graph_file_name
          << " using a single model per render is more efficient" << std::endl;
// crashes after here
auto status = (*session)->Create(*graph_def);
auto status2 = (*session)->Run(Input_Tensors);
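
For anyone trying to reproduce this outside the plugin, here is a minimal self-contained sketch of the same session setup using the TF 1.x C++ API, with status checks added so failures surface as error messages rather than going unnoticed. The path "model.pb", the feed/fetch names "input"/"output", and the tensor shape are placeholders, not the plugin's real values.

#include <iostream>
#include <memory>
#include <string>
#include <vector>

#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/platform/env.h"
#include "tensorflow/core/public/session.h"

int main() {
  // Placeholder path to a frozen GraphDef; replace with the real model file.
  const std::string graph_path = "model.pb";

  tensorflow::GraphDef graph_def;
  tensorflow::Status status = tensorflow::ReadBinaryProto(
      tensorflow::Env::Default(), graph_path, &graph_def);
  if (!status.ok()) {
    std::cerr << "ReadBinaryProto failed: " << status.ToString() << std::endl;
    return 1;
  }

  // Same GPU option as the snippet above: reserve 90% of device memory.
  tensorflow::SessionOptions options;
  options.config.mutable_gpu_options()->set_per_process_gpu_memory_fraction(0.9);

  std::unique_ptr<tensorflow::Session> session(tensorflow::NewSession(options));
  status = session->Create(graph_def);
  if (!status.ok()) {
    std::cerr << "Session::Create failed: " << status.ToString() << std::endl;
    return 1;
  }

  // Placeholder feed/fetch names and shape; the real graph differs.
  tensorflow::Tensor input_tensor(tensorflow::DT_FLOAT,
                                  tensorflow::TensorShape({1, 224, 224, 3}));
  std::vector<tensorflow::Tensor> outputs;
  status = session->Run({{"input", input_tensor}}, {"output"}, {}, &outputs);
  if (!status.ok()) {
    std::cerr << "Session::Run failed: " << status.ToString() << std::endl;
    return 1;
  }
  std::cout << "Ran OK, outputs: " << outputs.size() << std::endl;
  return 0;
}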

Other info / logs Include any logs or source code that would be helpful to diagnose the problem. If including tracebacks, please include the full traceback. Large logs and files should be attached.

You can download the built software from: https://kognat.com/product/rotobot-openfx-plugin-windows-64-gpu-v1-2-0-rc2-cuda-10/

You will just need an OpenFX host like Natron https://natrongithub.github.io/

This tutorial will give you reproduction steps https://kognat.com/2019/03/28/rotobot-srgb/

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 31 (3 by maintainers)

Most upvoted comments

Memory management in TF is the greatest cause of bugs in my application; any help would be useful for the entire community.

I got a new error report

With error level 3 only

tfSession->Run failed: Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above. [[{{node xception_65/entry_flow/conv1_1/Conv2D}} = Conv2D[T=DT_FLOAT, data_format="NCHW", dilations=[1, 1, 1, 1], padding="VALID", strides=[1, 1, 2, 2], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](xception_65/entry_flow/conv1_1/Conv2D-0-TransposeNHWCToNCHW-LayoutOptimizer, xception_65/entry_flow/conv1_1/weights)]] [[{{node SemanticPredictions/_45}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2428_SemanticPredictions", tensor_type=DT_INT64, _device="/job:localhost/replica:0/task:0/device:CPU:0"]]]
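
Not a fix for the original LOG(FATAL) crash, but for the "Failed to get convolution algorithm" error specifically: that message commonly indicates cuDNN could not allocate its workspace at initialisation. One frequently suggested mitigation, sketched below, is to let the GPU allocator grow on demand instead of pre-reserving 90% of the card. Whether it helps on this particular Quadro / Windows 7 machine is an assumption; the helper function name is made up for illustration.

#include "tensorflow/core/public/session.h"

// Commonly suggested mitigation for cuDNN initialisation failures: allow the
// GPU allocator to grow on demand rather than reserving a fixed fraction of
// device memory up front, so cuDNN can still allocate its workspace.
// Applicability to the machine in this report is an assumption.
tensorflow::SessionOptions MakeSessionOptions() {
  tensorflow::SessionOptions options;
  auto* gpu_options = options.config.mutable_gpu_options();
  gpu_options->set_allow_growth(true);
  // Alternatively, keep a fixed reservation but make it smaller than 0.9:
  // gpu_options->set_per_process_gpu_memory_fraction(0.7);
  return options;
}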