cudf: [BUG] libcudf must customize the Thrust/CUB namespace

Describe the bug

For header-only libraries like Thrust/CUB, it can be problematic when an application inadvertently ends up with multiple or mismatched versions of the headers.

Thrust/CUB provide a way to universally customize the namespace such that a symbol like thrust::reduce will instead be thrust::CUSTOM::reduce.

This prevents inadvertent symbol collisions and version mismatches by forcing libcudf to use Thrust/CUB symbols from within its custom namespace.

Solution

libcudf should set THRUST_CUB_WRAPPED_NAMESPACE=cudf as part of the cmake configuration step for Thrust/CUB.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 22 (18 by maintainers)

Commits related to this issue

Most upvoted comments

@bdice @allisonvacanti has just created 1.17.2 tag.

@robertmaynard, @jrhemstad, mentioned PRs are merged.

With a solution to NVIDIA/cub#545 being merged, I think the best option for cudf is to patch our version of thrust.

Can’t we just use a version of Thrust that makes the same changes? https://github.com/NVIDIA/thrust/issues/1788

(Thanks for the clarification @robertmaynard / @allisonvacanti! My previous message may have sounded negative on this idea - poor writing on my part.)

it thinks cudf::make_strings_column(cudf::device_span<thrust::pair ...) should exist but the signature is actually cudf::make_strings_column(cudf::device_span<rapids::thrust::pair ...).

The THRUST_NS_QUALIFIER macro may simplify things here:

cudf::make_strings_column(cudf::device_span<THRUST_NS_QUALIFIER::pair ...)

It expands to ::thrust by default, but rapids::thrust when THRUST_CUB_WRAPPED_NAMESPACE=rapids.

I suppose we want to remove Thrust APIs from our public interfaces so perhaps this is solved in that way.

I strongly support this. The namespace macros are only robust when Thrust isn’t exposed publicly.

However, it makes me wonder if this truly solves any problems (compared with no namespace) because some rmm types (for example) will not be compatible with non-RAPIDS libraries.

It does solve a real problem which is that end user programs bring RAPIDS projects together with other C++ libraries that use thrust. Since these projects are built ignorant of each other they use different thrust versions and cause weird runtime failures.

To be clear the proposal for rmm won’t impact the ability for non RAPIDS projects to use it. The rmm headers will adapt based on the value of THRUST_WRAPPED_NAMESPACE. What they lose is the ability to use something like raft + rmm + some thirdparty library ( non-header only ) that uses thrust in the same file ( TU ).

We should just get away from doing this.

I agree. I will look at changing the offending code to not expose thrust

I don’t think we’re ever concerned with ABI breaks in RAPIDS libraries.

What kind of ODR issues do you foresee? If two versions of Thrust are present, each in different namespaces, then they are different symbols so far as the program is concerned and therefore distinct definitions of the symbol won’t be a problem.

This doesn’t stop two different versions of Thrust being compiled using the same namespace. So cudf and a consumer could use differing versions of thrust and place it in the same namespace. This might happen due to a leakage of the THRUST_WRAPPED_NAMESPACE value to the consumer.

Would those types be interchangeable among RAPIDS libraries using rmm headers if the Thrust namespaces (and thus symbols) are different among each RAPIDS library?

The won’t be interchangable. We can see that when looking at the current cudf PR. The STRINGS_TEST fail to link due to the fact that it thinks cudf::make_strings_column(cudf::device_span<thrust::pair ...) should exist but the signature is actually cudf::make_strings_column(cudf::device_span<rapids::thrust::pair ...).

One of the reasons I am proposing a common prefix is that interop between libraries ( cudf / raft / cuml / cugraph ) is possible when they are used within the same TU. If each project goes with a unique thrust namespace, no interop that includes thrust types will be possible,

Yes, we should do this. It was on my list of things to investigate after I upgrade RAPIDS to Thrust 1.17. (Also I think it’s an outer namespace like CUSTOM::thrust::reduce.)