tensorflow: Segmentation fault when using cpp custom op in tf.data.Dataset.map in tensorflow2.0

It seems that if I call a C++ custom op inside a Python function and pass that function to tf.data.Dataset.map, it crashes. If I only call the Python function directly, it works fine. I've spent a whole afternoon tracking down this bug, and I'm really frustrated by it.

Have I written custom code (as opposed to using a stock example script provided in TensorFlow): yes
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 18.04
TensorFlow installed from (source or binary): binary
TensorFlow version (use command below): 2.0b1
Python version: 3.6
CUDA/cuDNN version: 10/7.4
GPU model and memory: 7.5/24gb


import tensorflow as tf
import pdb
extr_module = tf.load_op_library('./build/libextr_module.so')
res = extr_module.test_bug() # ok

def aaa(filename):
    res = extr_module.test_bug() # Segmentation fault (core dumped)

    return tf.zeros([1], tf.float32)
    
dataset = tf.data.TextLineDataset(['aaa']).map(aaa)
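One detail worth noting (my reading, not stated in the report): in TF2, `Dataset.map` traces the mapped function into a graph at the time `map()` is called, so the custom op takes the graph execution path there rather than the eager path used by the top-level call. A minimal sketch of the difference, using `tf.executing_eagerly()` instead of the custom op:

```python
import tensorflow as tf

def probe(x):
    # This body runs while Dataset.map traces it into a graph:
    # eager execution is off here, so ops take the graph code path,
    # unlike a top-level call such as extr_module.test_bug() above.
    print('eager inside map?', tf.executing_eagerly())
    return x

print('eager at top level?', tf.executing_eagerly())
ds = tf.data.Dataset.from_tensor_slices([1, 2]).map(probe)
```

This is consistent with the crash happening at the `map()` call itself rather than when the dataset is iterated: the op kernel is already being exercised during tracing.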

#include "tensorflow/core/framework/op_kernel.h"
#include "tensorflow/core/framework/register_types.h"
#include "tensorflow/core/framework/tensor.h"
#include "tensorflow/core/framework/tensor_shape.h"
#include "tensorflow/core/framework/op.h"
#include "tensorflow/core/framework/shape_inference.h"
#include "tensorflow/core/util/work_sharder.h"

#include <iostream>
#include <cmath>

using namespace tensorflow;

REGISTER_OP("TestBug")
    .Output("dummy: float")
    .SetShapeFn([](::tensorflow::shape_inference::InferenceContext* c) {
        c->set_output(0, c->MakeShape({1}));
        return Status::OK();
    });

class TestBugOp : public OpKernel {
 public:
  explicit TestBugOp(OpKernelConstruction* context) : OpKernel(context) {}

  void Compute(OpKernelContext* context) override {
    Tensor* dummy = nullptr;
    OP_REQUIRES_OK(context, context->allocate_output(0, {1}, &dummy));
  }
};

REGISTER_KERNEL_BUILDER(
  Name("TestBug").Device(DEVICE_CPU),
  TestBugOp
);

CMAKE_MINIMUM_REQUIRED(VERSION 2.8)
PROJECT(extr_module)


# compiler flags
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -O2 ${OpenMP_CXX_FLAGS} -Wall -fPIC -D_GLIBCXX_USE_CXX11_ABI=0 -DGOOGLE_CUDA=1")

# TensorFlow dependencies
EXECUTE_PROCESS(COMMAND python3 -c "import os; os.environ['TF_CPP_MIN_LOG_LEVEL']='3'; import tensorflow as tf; print(tf.sysconfig.get_include(), end='', flush=True)"  OUTPUT_VARIABLE TF_INC)

EXECUTE_PROCESS(COMMAND python3 -c "import os; os.environ['TF_CPP_MIN_LOG_LEVEL']='3'; import tensorflow as tf; print(tf.sysconfig.get_lib(), end='', flush=True)"  OUTPUT_VARIABLE TF_LIB)


MESSAGE(STATUS "Found TF_INC: " ${TF_INC})
#MESSAGE(STATUS "Found TF_INC_EXTERNAL: " ${TF_INC}/external/nsync/public)
MESSAGE(STATUS "Found TF_LIB: " ${TF_LIB})


INCLUDE_DIRECTORIES(${TF_INC})
#INCLUDE_DIRECTORIES(${TF_INC}/external/nsync/public)
LINK_DIRECTORIES(${TF_LIB})


ADD_LIBRARY(extr_module SHARED
  testbug.cc
)

TARGET_LINK_LIBRARIES(extr_module tensorflow_framework)
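For what it's worth, the hard-coded `-D_GLIBCXX_USE_CXX11_ABI=0` in the CMakeLists above is a common source of exactly this kind of ABI mismatch. A sketch (assuming a TensorFlow recent enough to ship `tf.sysconfig.get_compile_flags()`/`get_link_flags()`, which 2.0b1 is) of querying the flags the installed wheel was actually built with, which the CMake build could splice in instead of hard-coding them:

```python
import tensorflow as tf

# The installed wheel records the compile/link flags it was built with,
# including its _GLIBCXX_USE_CXX11_ABI setting, so the custom-op build
# can reuse them instead of guessing.
compile_flags = tf.sysconfig.get_compile_flags()
link_flags = tf.sysconfig.get_link_flags()
print(' '.join(compile_flags))
print(' '.join(link_flags))
```

These could be pulled into CMake the same way the `TF_INC`/`TF_LIB` variables already are, via `EXECUTE_PROCESS`.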

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 30 (17 by maintainers)

Most upvoted comments

Problem solved!!! I used g++-4.8 instead!

The problem with these compiler incompatibilities is that they're basically a game of Russian roulette. You never know when they'll hit you.

The solution of downgrading compilers seems a bit crazy and is largely undocumented. While the documentation does nod to the incompatible ABI, it is not at all clear about all the caveats that need to be dealt with.

For example, in our flow we use C++14 features (constexpr) in our custom op implementation, which worked swimmingly with TF 1.13 and g++ 7. Now, because of changes made (and possibly some naughty out-of-scope allocation on our part), we are being forced onto compilers that produce "more appropriate" code. This all seems sort of crazy to me. Without determinism, the programmer is surely destined for insanity.

Do we know if the fix for this, according to the TF gods, has been decreed to be "follow our build flow"?