HIPIFY: [HIPIFY] hipify-clang misbehaves in the presence of preprocessor directives

Consider the following CUDA program:

__global__ void axpy_kernel(float a, float* x, float* y) {
    y[threadIdx.x] = a * x[threadIdx.x];
}

void axpy(float a, float* x, float* y) {
    axpy_kernel<<<1, 4>>> (a, x, y);
#ifdef SOME_MACRO
    axpy_kernel<<<1, 4>>> (a, x, y);
#endif
}

The result of hipifying this without passing -DSOME_MACRO is:

#include <hip/hip_runtime.h>
__global__ void axpy_kernel(float a, float* x, float* y) {
    y[hipThreadIdx_x] = a * x[hipThreadIdx_x];
}

void axpy(float a, float* x, float* y) {
    hipLaunchKernelGGL(axpy_kernel, dim3(1), dim3(4), 0, 0, a, x, y);
#ifdef SOME_MACRO
    axpy_kernel<<<1, 4>>> (a, x, y);
#endif

}

Honouring preprocessor conditionals is the wrong behaviour for this sort of mechanised refactoring: what you really want is to walk the entire source, every branch included, and apply the translation regardless of which conditionals happen to be active.

This will be a nasty obstacle for anyone trying to translate a complicated CUDA program that relies on conditional compilation…
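One way to picture "walking every branch" is a purely textual rewrite that never consults the preprocessor. The sketch below is hypothetical (rewriteKernelLaunches is an invented name, not part of hipify-clang) and deliberately narrow: it only handles the two-argument <<<grid, block>>> launch form with a flat argument list, and it emits the same hipLaunchKernelGGL call shape shown in the hipify output above. Because it operates on raw text, it also rewrites the launch inside #ifdef SOME_MACRO.

```cpp
#include <regex>
#include <string>

// Hypothetical text-level rewrite, not hipify-clang's implementation.
// Turns "kernel<<<grid, block>>> (args)" into
// "hipLaunchKernelGGL(kernel, dim3(grid), dim3(block), 0, 0, args)".
// It never evaluates #ifdef, so launches in inactive branches are rewritten too.
// Limitations (assumed acceptable for a sketch): only the two-argument
// launch configuration, and no nested parentheses in the argument list.
std::string rewriteKernelLaunches(const std::string& source) {
    static const std::regex launch(
        R"re((\w+)\s*<<<\s*([^,>]+?)\s*,\s*([^>]+?)\s*>>>\s*\(([^()]*)\))re");
    // $1 = kernel name, $2 = grid, $3 = block, $4 = kernel arguments
    return std::regex_replace(
        source, launch, "hipLaunchKernelGGL($1, dim3($2), dim3($3), 0, 0, $4)");
}
```

Applied to the body of axpy above, both launches come out as hipLaunchKernelGGL(axpy_kernel, dim3(1), dim3(4), 0, 0, a, x, y); and the #ifdef/#endif lines are left untouched. The catch is that a text-level rewrite has no semantic information (types, overloads, what an identifier actually refers to), which is what an AST-based tool provides in the first place.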

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 17 (9 by maintainers)

Most upvoted comments

Which of them?

Both example cases from the original report still fail. As I explained in my pull request and commit messages, my work only fixes the handling of unconditional macros.

You can see examples of the sorts of situations that ROCm-Developer-Tools/HIP#235 fixed here: https://github.com/ROCm-Developer-Tools/HIP/blob/094b2b9b0503c1e2935863a1d596d1045b71e7e4/tests/hipify-clang/axpy.cu#L6-L12

https://github.com/ROCm-Developer-Tools/HIP/blob/094b2b9b0503c1e2935863a1d596d1045b71e7e4/tests/hipify-clang/axpy.cu#L43-L56

This seems like a fine addition as long as we guard it appropriately (so it is only used under C++17). hip_runtime_api.h provides a C API, so it needs to remain compilable with plain C compilers such as gcc.

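A preprocessor check that __cplusplus is >= 201703L seems like it should work on every conforming compiler (MSVC is the usual exception: it reports the real value of __cplusplus only when built with /Zc:__cplusplus). In a C compiler, __cplusplus is undefined, so the guarded block is simply skipped.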
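The guard being discussed might look like the sketch below. The thread does not say what the C++17-only addition is, so an inline variable stands in for it, and every hyp_*/HYP_* name is hypothetical rather than part of HIP. The C declarations stay visible to every compiler, while the C++17 section is only compiled when __cplusplus reports C++17 or later.

```cpp
/* Sketch of a C-compatible header with a C++17-only section.
   All hyp_* / HYP_* names are hypothetical, not part of HIP. */
#ifndef HYP_EXAMPLE_API_H
#define HYP_EXAMPLE_API_H

#ifdef __cplusplus
extern "C" {
#endif

/* Plain C API: must stay valid for C compilers such as gcc. */
int hyp_get_version(void);

#ifdef __cplusplus
} /* extern "C" */
#endif

/* In C, __cplusplus is undefined, so this whole block is skipped. */
#if defined(__cplusplus) && __cplusplus >= 201703L
#define HYP_HAS_CXX17_EXTRAS 1
/* Inline variables require C++17; stands in for the proposed addition. */
inline constexpr int hyp_cxx17_extras_version = 1;
#else
#define HYP_HAS_CXX17_EXTRAS 0
#endif

#endif /* HYP_EXAMPLE_API_H */
```

Exposing a HYP_HAS_CXX17_EXTRAS-style macro is one possible design choice: downstream code can test a single feature macro instead of repeating the __cplusplus comparison.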

Why isn’t it enough to just look at the source code, fix all the macro definitions, and move on?

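Preprocessor directives are not included in the AST. The parser only ever sees the branches that are active for the current set of macro definitions, so code inside an inactive #ifdef never becomes part of the tree that hipify-clang walks.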
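To illustrate why an inactive branch never reaches the AST, here is a deliberately minimal, hypothetical model of conditional compilation. The function activeLines is invented for this illustration and handles only #ifdef, #ifndef, #else and #endif (no #if expressions, no #define); it is not how Clang's preprocessor is implemented. It returns only the lines a parser would receive for a given set of defined macros; the directives themselves are consumed.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified model of conditional compilation: returns the
// lines a parser would see. Directives are consumed; lines in inactive
// #ifdef/#ifndef branches are dropped.
std::vector<std::string> activeLines(const std::string& source,
                                     const std::set<std::string>& defined) {
    std::vector<std::string> kept;
    std::vector<bool> stack;  // active state of each enclosing conditional
    auto enclosingActive = [&stack] {
        for (bool active : stack)
            if (!active) return false;
        return true;
    };

    std::istringstream in(source);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream words(line);
        std::string directive, name;
        words >> directive >> name;  // first two whitespace-separated tokens

        if (directive == "#ifdef" || directive == "#ifndef") {
            const bool isDefined = defined.count(name) > 0;
            stack.push_back(directive == "#ifdef" ? isDefined : !isDefined);
        } else if (directive == "#else" && !stack.empty()) {
            stack.back() = !stack.back();
        } else if (directive == "#endif" && !stack.empty()) {
            stack.pop_back();
        } else if (enclosingActive()) {
            kept.push_back(line);  // only these lines would reach the AST
        }
    }
    return kept;
}
```

Run over the axpy body from the report with no macros defined, the second launch is gone before parsing starts; with SOME_MACRO defined, the first launch is still there. Either way, any tool that sees only the parsed result misses one of the two configurations.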

Can you give an example of a situation when you need to have a macro expanded? When do you need the preprocessed source?

CUDA_8.0/CUDASamples/common/inc/helper_cuda.h:

#define checkCudaErrors(val)           check ( (val), #val, __FILE__, __LINE__ )

CUDA_8.0/include/driver_types.h:

#define cudaEventDisableTiming              0x02  /**< Event will not record timing data */

CUDA_8.0/CUDASamples/6_Advanced/concurrentKernels/concurrentKernels.cu:

#include <helper_cuda.h>
...
cudaEvent_t *kernelEvent;
kernelEvent = (cudaEvent_t *)malloc(nkernels * sizeof(cudaEvent_t));
for (int i = 0; i < nkernels; i++)
{
  checkCudaErrors(cudaEventCreateWithFlags(&(kernelEvent[i]), cudaEventDisableTiming));
}
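What the preprocessor does to that call can be reproduced with hypothetical stand-ins. The sketch below copies the shape of checkCudaErrors, but check, fakeEventCreateWithFlags and fakeEventDisableTiming are invented here and are not the real helper_cuda.h or CUDA definitions. The point is that the operand of # is not macro-expanded: the flag's name survives only inside the string literal, while the code path receives the literal 0x02, and the API call to be renamed exists only as a macro argument.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-ins, NOT the real CUDA or helper_cuda.h definitions.
static std::string lastCheckedExpr;  // stringized argument last seen by check()

template <typename T>
void check(T result, const char* expr, const char* file, int line) {
    lastCheckedExpr = expr;
    if (result != 0) {  // 0 plays the role of cudaSuccess
        std::fprintf(stderr, "%s:%d: %s returned %d\n",
                     file, line, expr, static_cast<int>(result));
    }
}

// Same shape as the checkCudaErrors macro quoted above.
#define checkCudaErrors(val) check((val), #val, __FILE__, __LINE__)

// Hypothetical analogues of cudaEventDisableTiming / cudaEventCreateWithFlags.
#define fakeEventDisableTiming 0x02
int fakeEventCreateWithFlags(int* event, unsigned int flags) {
    *event = static_cast<int>(flags);  // record the flags so they can be inspected
    return 0;
}
```

After expansion, checkCudaErrors(fakeEventCreateWithFlags(&ev, fakeEventDisableTiming)) becomes check((fakeEventCreateWithFlags(&ev, 0x02)), "fakeEventCreateWithFlags(&ev, fakeEventDisableTiming)", __FILE__, __LINE__) with the file name and line filled in. A translator therefore has to rewrite the tokens where they are spelled in the source (the macro argument), not where they end up after expansion.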