TensorFlow.NET: Segmentation fault in multithread app (v0.11.2)

The app crashes during sess.run. On Docker the message is "Segmentation fault (core dumped)"; on Windows it is "Attempted to read or write protected memory. This is often an indication that other memory is corrupt."

Stack trace on Windows

   at Tensorflow.c_api.TF_SessionRun(IntPtr session, TF_Buffer* run_options, TF_Output[] inputs, IntPtr[] input_values, Int32 ninputs, TF_Output[] outputs, IntPtr[] output_values, Int32 noutputs, IntPtr[] target_opers, Int32 ntargets, IntPtr run_metadata, IntPtr status)
   at Tensorflow.BaseSession._call_tf_sessionrun(KeyValuePair`2[] feed_dict, TF_Output[] fetch_list, List`1 target_list)
   at Tensorflow.BaseSession._do_run(List`1 target_list, List`1 fetch_list, Dictionary`2 feed_dict)
   at Tensorflow.BaseSession._run(Object fetches, FeedItem[] feed_dict)
   at Tensorflow.BaseSession.run(Tensor fetche, FeedItem[] feed_dict)

I created a small example project that reproduces the crash: https://github.com/deadman2000/TensorFlowNetMultithreading

About this issue

State: open
Created 5 years ago
Comments: 22 (11 by maintainers)

Commits related to this issue

MultithreadingTests.cs: Added unit-test for case #380 — committed to SciSharp/TensorFlow.NET by Nucs 5 years ago

Most upvoted comments

I think the issue is caused by the following use of nd.GetData() in Tensor.Creation.cs. My guess is that the tensor ends up pointing to GC-controlled memory, with no guarantee that the data stays at the same address after the GC runs.

        private unsafe IntPtr CreateTensorFromNDArray(NDArray nd, TF_DataType? given_dtype)
        {
            if (nd.typecode == NPTypeCode.String)
                throw new NotImplementedException("Support for NDArray of type string not implemented yet");

>>>         var arraySlice = nd.Unsafe.Storage.Shape.IsContiguous ? nd.GetData() : nd.CloneData();

Changing the line to the following helped (probably at some performance cost, which I didn’t notice because my input dataset is small):

var arraySlice = nd.CloneData();

After this change I can no longer reproduce the crash, but I will keep testing.

tompetk on Apr 7, 2020
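
The hazard described in the comment above is a general .NET interop concern: a raw pointer handed to native code must keep pointing at valid memory for as long as the native side uses it. As a rough illustration only, and not the TensorFlow.NET or NumSharp implementation, the sketch below shows two common ways to get a stable address using standard System.Runtime.InteropServices calls. The class and method names (NativeBufferSketch, CopyToUnmanaged, Pin, ReadBack) are hypothetical and exist only for this example.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration; not part of TensorFlow.NET.
public static class NativeBufferSketch
{
    // Option 1: copy the managed data into unmanaged memory, which the GC
    // never moves or frees. The caller owns the returned pointer and must
    // release it with Marshal.FreeHGlobal once native code is done with it.
    public static IntPtr CopyToUnmanaged(float[] data)
    {
        int bytes = data.Length * sizeof(float);
        IntPtr buffer = Marshal.AllocHGlobal(bytes);
        Marshal.Copy(data, 0, buffer, data.Length);
        return buffer;
    }

    // Option 2: pin the managed array so its address stays fixed.
    // The caller must call Free() on the returned handle afterwards;
    // until then the GC can neither move nor collect the array.
    public static GCHandle Pin(float[] data, out IntPtr address)
    {
        GCHandle handle = GCHandle.Alloc(data, GCHandleType.Pinned);
        address = handle.AddrOfPinnedObject();
        return handle;
    }

    // Reads 'count' floats back from a raw pointer into a new managed array.
    public static float[] ReadBack(IntPtr source, int count)
    {
        var result = new float[count];
        Marshal.Copy(source, result, 0, count);
        return result;
    }
}
```

Copying trades extra memory and time for a simple ownership model, which matches the performance caveat mentioned for CloneData() above. Pinning avoids the copy but requires the handle to outlive every native use of the pointer.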

@Mghobadid the fix in #533 should do the trick. It is not the most efficient way, but it seems to work, at least on CPU (I reproduced exactly the same issue on GPU, so it should behave the same there)…

tompetk on Apr 17, 2020

Once I get my hands on a dump, I’ll research it and let you know.

Nucs on Sep 10, 2019

If you need to write multi-threaded unit tests in the future, you are welcome to use the MultiThreadedUnitTestExecuter I wrote for the library: https://github.com/SciSharp/TensorFlow.NET/blob/master/test/TensorFlowNET.UnitTest/Utilities/MultiThreadedUnitTestExecuter.cs

Usage:

MultiThreadedUnitTestExecuter.Run(threadCount: 8, workload: tid => ...);

Nucs on Sep 9, 2019
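
For readers without access to the linked helper, the general idea of running one workload on several threads at once can be sketched with plain System.Threading types. This is a simplified, hypothetical stand-in (ParallelRunnerSketch is an invented name) and makes no claim to match how MultiThreadedUnitTestExecuter is actually implemented.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper for illustration; not the library's executer.
public static class ParallelRunnerSketch
{
    // Runs workload(threadId) on threadCount threads that are released
    // together, then rethrows any failures as one AggregateException.
    public static void Run(int threadCount, Action<int> workload)
    {
        var errors = new ConcurrentQueue<Exception>();
        using (var startGate = new ManualResetEventSlim(false))
        {
            var threads = new Thread[threadCount];
            for (int i = 0; i < threadCount; i++)
            {
                int tid = i; // local copy so each lambda sees its own id
                threads[i] = new Thread(() =>
                {
                    startGate.Wait(); // block until every thread is created
                    try { workload(tid); }
                    catch (Exception ex) { errors.Enqueue(ex); }
                });
                threads[i].Start();
            }
            startGate.Set(); // release all threads at once
            foreach (var t in threads) t.Join();
        }
        if (!errors.IsEmpty) throw new AggregateException(errors);
    }
}
```

A runner like this only surfaces managed exceptions. A native segmentation fault such as the one reported in this issue terminates the whole process, so it would show up as a crashed test run rather than a caught error.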