clangd: Source file chosen to infer compile commands for a header sometimes has the wrong language

Whether this qualifies as a bug report or a feature request is, I guess, open to discussion.

After reporting https://llvm.discourse.group/t/header-file-heuristics-issue/1749 and investigating it further, I have come to the conclusion that the issue I am describing is probably just a logical consequence of how the clangd heuristics for header files interact with my project structure.

A short summary of the issue:

  • My project contains C and C++ files.
  • I have a compilation database which contains an entry for every C or C++ file, but no entry for header files, which I guess is quite standard.
  • My C++ header files have the extension .hpp.
  • In the clangd log (10.0.0 or 11.0.0 rc1), I can sometimes see some entries of the following type: Updating file /path/to/Header.hpp with command inferred from /other/path/to/arbitrary.c. I.e. clangd picks a C-file instead of a C++ file.

Sometimes, however, clangd will pick a C++ file which does not necessarily include the header file in question. While I understand that picking the right translation unit for a header file is a little arbitrary (there could be several correct answers, with different compiler flags), it feels like picking a C++ file instead of a C file in the case above should be feasible, and could also be a non-negligible improvement for some use cases (like mine, because I would then get the correct flags, like -std=c++17, etc.). It seems like the heuristics use some kind of distance in the file hierarchy between the header file and the candidate C/C++ file as a criterion, which I guess is fine as a general principle, but it would be nice if files of the same language as the header file (if clangd knows the language, which in my case is clear from the extension) were picked with a higher priority.

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 13
  • Comments: 35 (7 by maintainers)

Most upvoted comments

@bradlarsen, thanks, really interesting reads. I was just trying to write down some thinking about header files in compilation databases when you posted your entry about compdb. My thinking was that, given the objectives of compilation databases, it would not have seemed entirely crazy to extend the file format to include translation unit dependencies.

Like:

{ "directory": "/home/user/llvm/build",
    "command": "/usr/bin/clang++ -Irelative -DSOMEDEF=\"With spaces, quotes and \\-es.\" -c -o file.o file.cc",
    "file": "file.cc",
    "deps": ["file.h", "file2.h"] }

or something similar.

That would seem better than making up compilation commands for header files when they do not really exist. And clangd’s (and others’!) job would be much simpler. Because of course, any proper build system has that information, since it needs it to ensure that TUs only get recompiled when something they depend on is updated.
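To illustrate how a tool could consume such an extended database, here is a minimal sketch. It assumes the hypothetical "deps" field proposed above, which is not part of the actual compilation database format, and the helper name entries_including is made up for this example.

```python
import json

def entries_including(database, header):
    """Return the entries whose hypothetical "deps" list names the header."""
    return [entry for entry in database if header in entry.get("deps", [])]

# A tiny database using the proposed (non-standard) "deps" extension.
database = json.loads("""
[
  {"directory": "/home/user/llvm/build",
   "command": "/usr/bin/clang++ -Irelative -c -o file.o file.cc",
   "file": "file.cc",
   "deps": ["file.h", "file2.h"]},
  {"directory": "/home/user/llvm/build",
   "command": "/usr/bin/clang -c -o other.o other.c",
   "file": "other.c",
   "deps": ["other.h"]}
]
""")

# Every translation unit that is known to include file.h.
matches = entries_including(database, "file.h")
print([entry["file"] for entry in matches])
```

With such a lookup, the command for a header would come from a translation unit that really includes it, rather than from a guess based on file names or directory distance.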

I have also found this CMake issue, which I guess is relevant.

I would like to submit an additional use case related to this. I have header files with preprocessor conditions, meaning the active portions of the header files vary depending on which file includes them. I would like to be able to tell clangd: show me the analysis for this header file as seen by this source file or that source file, and be able to switch at runtime.

I imagine a flow like this:

  • Editor opens foo.h
  • Editor asks clangd for the known “views” for foo.h
  • Clangd returns a list of views for foo.h, that is all the times it knows foo.h is included by a compilation unit
  • Editor presents a dropdown to the user, allowing them to switch between views on that header file
  • When selecting a view, editor informs clangd to use the new view

Would that make sense?

While looking at solving this another way, I ran across the -M (--dependencies) clang flag, which dumps the headers used by a particular file. It really seems to me like the right solution is to have clangd infer commands for headers from the source files that use them! Perhaps we could mostly reuse infrastructure from that flag.
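As a rough sketch of that idea (not how clangd works), the Makefile-style rule printed by -M can be parsed into a map from each header to the source files that use it. The function names are made up for this example, the sample text stands in for real compiler output, and the parser is simplified: it does not handle every Makefile escaping rule.

```python
import shlex

def parse_make_deps(text):
    """Parse a `target: dep dep ...` rule, as printed by -M, into a list of paths."""
    joined = text.replace("\\\n", " ")        # undo backslash line continuations
    _target, _, deps = joined.partition(":")  # drop the "file.o:" prefix
    return shlex.split(deps)

def headers_to_sources(deps_by_source):
    """Invert {source: [deps]} into {header: [sources that include it]}."""
    index = {}
    for source, deps in deps_by_source.items():
        for dep in deps:
            if dep != source:                 # the source file lists itself first
                index.setdefault(dep, []).append(source)
    return index

# Stand-in for what running the compile command for file.cc with -M might print.
sample = "file.o: file.cc file.h \\\n  file2.h\n"

index = headers_to_sources({"file.cc": parse_make_deps(sample)})
print(index)
```

Looking up a header in the resulting index yields the translation units whose compile commands are plausible candidates for that header.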

I don’t have any C++ files in my CDB and still Clangd is trying to parse an external header file as a C++ header with fallback flags. Here are parts from the verbose log: […] As you can see with main.c file, Clangd correctly used standard headers from the provided query driver, but with an out-of-tree header file it used C++ flags.

In your case, the issue is not that clangd is picking a wrong-language file from the CDB, but rather that it’s not using the CDB for out-of-tree headers at all.

For each open file, clangd looks for a CDB in the directory containing the file and its ancestor directories. If your CDB is in your workspace root and you open a file outside the workspace root, this algorithm therefore will not find the CDB for such files.

You can force clangd to use a CDB for all files, including out-of-tree files, by passing --compile-commands-dir=<directory> as a command-line argument to clangd, where <directory> is the directory containing your CDB.
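The lookup described above can be sketched roughly as follows. This is a simplified model, not clangd’s actual code: the function name find_cdb_dir and the override_dir parameter (standing in for --compile-commands-dir) are assumptions made for this example.

```python
from pathlib import Path

def find_cdb_dir(file_path, override_dir=None):
    """Return the directory whose compile_commands.json applies to file_path.

    override_dir models --compile-commands-dir: when given, it is used for
    every file and no search takes place.
    """
    if override_dir is not None:
        return Path(override_dir)
    # Walk from the file's own directory up through its ancestors.
    for directory in Path(file_path).resolve().parents:
        if (directory / "compile_commands.json").is_file():
            return directory
    return None  # no CDB found: fallback flags would be used instead
```

A header outside the workspace root has no ancestor containing the CDB, so the search returns None unless the override is given, which matches the behaviour seen in the log for the out-of-tree header.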

Hi guys, is there any news on this issue? I’m trying to use the VSCode extension together with a GCC cross-compiler for an embedded target. My project contains several files with code that relies on the compiler type, and since headers are not present in the compilation database, Clang pretends to be the default compiler of the OS (in my case, MSVC), which leads to intellisense issues. It’s possible to partially overcome the problem with clangd.fallbackFlags, but then Clangd is not able to find standard headers since --query-driver is not taken into account.

When choosing a file to infer the commands from, we always prefer one in the same language, but we only consider some candidates, so if there are no C++ files nearby and no C++ file has an identical name, we’ll choose a C file instead.

https://reviews.llvm.org/D87253 makes it so that if there’s even one C++ file in the project, that one would be preferred. I’m honestly not sure if that’s the right thing to do or not. Let’s see what others think.
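The behaviour described above can be illustrated with a toy model. It is not clangd’s implementation: the Candidate type, the points field (standing in for whatever proximity and name-similarity scoring is used) and pick_proxy are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    path: str
    language: str  # e.g. "c" or "c++"
    points: int    # hypothetical score; 0 means "not considered"

def pick_proxy(header_language, candidates):
    """Pick the file to borrow a compile command from for a header."""
    scored = [c for c in candidates if c.points > 0]
    same_language = [c for c in scored if c.language == header_language]
    # Same-language files win, but only if at least one of them scored.
    pool = same_language or scored
    return max(pool, key=lambda c: c.points, default=None)

nearby_c = Candidate("/proj/a/arbitrary.c", "c", 5)
distant_cpp = Candidate("/proj/far/Impl.cpp", "c++", 0)
scoring_cpp = Candidate("/proj/far/Impl.cpp", "c++", 1)

print(pick_proxy("c++", [nearby_c, distant_cpp]).path)  # the C file wins
print(pick_proxy("c++", [nearby_c, scoring_cpp]).path)  # the C++ file wins
```

In this model, the fix under discussion amounts to making sure a same-language file always ends up in the scored pool when one exists anywhere in the project.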

@cpsauer You’re too kind 😃

why there is a separate -x c++-header

This is a pretty minor distinction.

  • it affects some diagnostics (e.g. some variants of -Wunused apply in header files but not main files, and clang warns on #include_next in the main file)
  • maybe it affects clang’s interpretation of the command line when you’re building a PCH, too?

I’ve landed D116167, so now this bug is just about the original topic: preferring (more) files from the same language.

https://reviews.llvm.org/D116167 should address this recent issue raised by @cpsauer: explicitly enriching the database by simply copying the argv should work.
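A minimal sketch of such enrichment is shown below. The helper name with_header_entry and the example paths are made up, and whether a given clangd version accepts an entry whose argv still names the original source file depends on the change mentioned above.

```python
import copy

def with_header_entry(database, header, source):
    """Return a new database with an entry for header copied from source's entry.

    The argv ("arguments" or "command") is copied verbatim; only "file" changes.
    """
    for entry in database:
        if entry["file"] == source:
            clone = copy.deepcopy(entry)
            clone["file"] = header
            return database + [clone]
    raise KeyError(f"no entry for {source}")

database = [
    {"directory": "/proj/build",
     "arguments": ["clang++", "-std=c++17", "-Iinclude", "-c", "src/Impl.cpp"],
     "file": "src/Impl.cpp"},
]

enriched = with_header_entry(database, "include/Header.hpp", "src/Impl.cpp")
print(enriched[-1]["file"], enriched[-1]["arguments"])
```

The enriched list can then be written back to compile_commands.json with json.dump, so the header gets an explicit entry instead of an inferred one.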

Regarding other ideas discussed here:

  • using the actual include graph to find the file to transfer. This is covered by #123 (Thanks @HighCommander4 for reopening). We already do this for opened files but index-based approaches could cover more.
  • prefer matching-language harder. Currently we prefer it unless no same-language header gets any points. I think this was just an expedience/performance thing originally and we could fix it.
  • the question of whether a CDB should be used for files in other directories is not in scope for this bug
  • there are possibly some other issues mentioned, but without specific examples I can’t understand them

So I think we should maybe-fix the second bullet (to be discussed in review) and then close this bug.

I’m having the same problem but with cross-compilation.

I’m compiling a Yocto project. One of my files includes the C++ “vector” header. In my IDE (vscode), it opens the correct file (the header used by Yocto’s cross-compiler, i.e. “…/<project>/recipe-sysroot/usr/include/…”). If I then try to open one of the files included by “vector”, e.g. “vector.tcc”, the file from my host system is opened instead, i.e. “/usr/include/…”. This becomes really problematic for headers that don’t exist on the host, i.e. most library headers besides STL/C++/C.

Hi @HighCommander4! It’s partially related to this issue. I don’t have any C++ files in my CDB and still Clangd is trying to parse an external header file as a C++ header with fallback flags. Here are parts from the verbose log:

Header file
I[12:54:15.268] <-- textDocument/didOpen
I[12:54:15.270] Failed to find compilation database for c:\packages\framework-stm32cube\f4\Drivers\CMSIS\Include\core_cm4.h
I[12:54:15.270] ASTWorker building file c:\packages\framework-stm32cube\f4\Drivers\CMSIS\Include\core_cm4.h version 35 with command clangd fallback
[c:\packages\framework-stm32cube\f4\Drivers\CMSIS\Include]
E:\Temp\LLVM-11\bin\clang -xobjective-c++-header c:\packages\framework-stm32cube\f4\Drivers\CMSIS\Include\core_cm4.h -mthumb -mcpu=cortex-m4 -DSTM32F401xE -DSTM32F40_41xxx -Iinclude -Isrc -IC:/packages/framework-stm32cube/f4/Drivers/CMSIS/Include -IC:/packages/framework-stm32cube/f4/Drivers/CMSIS/Device/ST/STM32F4xx/Include -IC:/packages/framework-stm32cube/f4/Drivers/STM32F4xx_HAL_Driver/Inc -IC:/packages/framework-stm32cube/f4/Drivers/BSP/Components/Common -IC:/packages/framework-stm32cube/f4/Drivers/BSP/STM32F4xx-Nucleo -Ic:/packages/framework-stm32cube/f4/Drivers/STM32F4xx_HAL_Driver/Inc -fsyntax-only -resource-dir=e:\Temp\LLVM-11\lib\clang\11.0.0
V[12:54:15.288] Ignored diagnostic. argument unused during compilation: '-mthumb'
V[12:54:15.288] Ignored diagnostic. argument unused during compilation: '-mcpu=cortex-m4'
V[12:54:15.288] Driver produced command: cc1 -cc1 -triple x86_64-pc-windows-msvc19.23.28106 -fsyntax-only -disable-free -disable-llvm-verifier -discard-value-names -main-file-name core_cm4.h -mrelocation-model pic -pic-level 2 -mframe-pointer=none -fmath-errno -fno-rounding-math -mconstructor-aliases -munwind-tables -target-cpu x86-64 -resource-dir e:\Temp\LLVM-11\lib\clang\11.0.0 -D STM32F401xE -D STM32F40_41xxx -I include -I src -I C:/packages/framework-stm32cube/f4/Drivers/CMSIS/Include -I C:/packages/framework-stm32cube/f4/Drivers/CMSIS/Device/ST/STM32F4xx/Include -I C:/packages/framework-stm32cube/f4/Drivers/STM32F4xx_HAL_Driver/Inc -I C:/packages/framework-stm32cube/f4/Drivers/BSP/Components/Common -I C:/packages/framework-stm32cube/f4/Drivers/BSP/STM32F4xx-Nucleo -I c:/packages/framework-stm32cube/f4/Drivers/STM32F4xx_HAL_Driver/Inc -internal-isystem e:\Temp\LLVM-11\lib\clang\11.0.0\include -internal-isystem E:\IDEs\MVS\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\include -internal-isystem E:\IDEs\MVS\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.23.28105\atlmfc\include -internal-isystem C:\Program Files (x86)\Windows Kits\8.1\include\shared -internal-isystem C:\Program Files (x86)\Windows Kits\8.1\include\um -internal-isystem C:\Program Files (x86)\Windows Kits\8.1\include\winrt -fdeprecated-macro -fdebug-compilation-dir c:\packages\framework-stm32cube\f4\Drivers\CMSIS\Include -ferror-limit 19 -fno-use-cxa-atexit -fms-extensions -fms-compatibility -fms-compatibility-version=19.23.28106 -std=c++14 -fdelayed-template-parsing -fobjc-runtime=gcc -fobjc-exceptions -fcxx-exceptions -fexceptions -faddrsig -x objective-c++-header c:\packages\framework-stm32cube\f4\Drivers\CMSIS\Include\core_cm4.h
C file
V[12:54:19.494] System include extraction: adding  c:\packages\toolchain-gccarmnoneeabi@1.70201.0\bin\../lib/gcc/arm-none-eabi/7.2.1/include

V[12:54:19.494] System include extraction: adding  c:\packages\toolchain-gccarmnoneeabi@1.70201.0\bin\../lib/gcc/arm-none-eabi/7.2.1/include-fixed

V[12:54:19.494] System include extraction: adding  c:\packages\toolchain-gccarmnoneeabi@1.70201.0\bin\../lib/gcc/arm-none-eabi/7.2.1/../../../../arm-none-eabi/include

I[12:54:19.494] System include extractor: successfully executed C:/packages/toolchain-gccarmnoneeabi@1.70201.0/bin/arm-none-eabi-gcc.exe, got includes: "c:\packages\toolchain-gccarmnoneeabi@1.70201.0\bin\../lib/gcc/arm-none-eabi/7.2.1/include, c:\packages\toolchain-gccarmnoneeabi@1.70201.0\bin\../lib/gcc/arm-none-eabi/7.2.1/include-fixed, c:\packages\toolchain-gccarmnoneeabi@1.70201.0\bin\../lib/gcc/arm-none-eabi/7.2.1/../../../../arm-none-eabi/include"
I[12:54:19.494] ASTWorker building file e:\Projects\stm32cube-hal-blink\src\main.c version 5 with command 
[E:/Projects/stm32cube-hal-blink]
C:\packages\toolchain-gccarmnoneeabi@1.70201.0\bin\arm-none-eabi-gcc.exe -target arm-none-eabi -o .pio/build/nucleo_f401re/src/main.o -c -Os -ffunction-sections -fdata-sections -Wall -mthumb -mcpu=cortex-m4 -nostdlib -DDEFINES_FROM_COMPILE_COMMANDS -DSTM32F4 -DSTM32F401xE -DSTM32F40_41xxx -DF4 -DUSE_HAL_DRIVER -DF_CPU=84000000L -Iinclude -Isrc -IC:/packages/framework-stm32cube/f4/Drivers/CMSIS/Include -IC:/packages/framework-stm32cube/f4/Drivers/CMSIS/Device/ST/STM32F4xx/Include -IC:/packages/framework-stm32cube/f4/Drivers/STM32F4xx_HAL_Driver/Inc -IC:/packages/framework-stm32cube/f4/Drivers/BSP/Components/Common -IC:/packages/framework-stm32cube/f4/Drivers/BSP/STM32F4xx-Nucleo src/main.c -isystem c:\packages\toolchain-gccarmnoneeabi@1.70201.0\bin\../lib/gcc/arm-none-eabi/7.2.1/include -isystem c:\packages\toolchain-gccarmnoneeabi@1.70201.0\bin\../lib/gcc/arm-none-eabi/7.2.1/include-fixed -isystem c:\packages\toolchain-gccarmnoneeabi@1.70201.0\bin\../lib/gcc/arm-none-eabi/7.2.1/../../../../arm-none-eabi/include -D DEFINE_FROM_CLANGD_FILE -fsyntax-only -resource-dir=e:\Temp\LLVM-11\lib\clang\11.0.0
V[12:54:19.495] Driver produced command: cc1 -cc1 -triple thumbv7em-none-unknown-eabi -fsyntax-only -disable-free -disable-llvm-verifier -discard-value-names -main-file-name main.c -mrelocation-model static -mframe-pointer=all -fmath-errno -fno-rounding-math -fno-verbose-asm -mconstructor-aliases -nostdsysteminc -target-cpu cortex-m4 -target-feature +soft-float-abi -target-feature -crc -target-feature -crypto -target-feature -sha2 -target-feature -aes -target-feature -dotprod -target-feature +dsp -target-feature -mve -target-feature -mve.fp -target-feature -fullfp16 -target-feature -ras -target-feature -bf16 -target-feature -sb -target-feature -i8mm -target-feature -lob -target-feature -cdecp0 -target-feature -cdecp1 -target-feature -cdecp2 -target-feature -cdecp3 -target-feature -cdecp4 -target-feature -cdecp5 -target-feature -cdecp6 -target-feature -cdecp7 -target-feature -hwdiv-arm -target-feature +hwdiv -target-feature -fp16fml -target-feature +strict-align -target-abi aapcs -mfloat-abi soft -fallow-half-arguments-and-returns -fno-split-dwarf-inlining -debugger-tuning=gdb -ffunction-sections -fdata-sections -resource-dir e:\Temp\LLVM-11\lib\clang\11.0.0 -isystem c:\packages\toolchain-gccarmnoneeabi@1.70201.0\bin\../lib/gcc/arm-none-eabi/7.2.1/include -isystem c:\packages\toolchain-gccarmnoneeabi@1.70201.0\bin\../lib/gcc/arm-none-eabi/7.2.1/include-fixed -isystem c:\packages\toolchain-gccarmnoneeabi@1.70201.0\bin\../lib/gcc/arm-none-eabi/7.2.1/../../../../arm-none-eabi/include -D DEFINES_FROM_COMPILE_COMMANDS -D STM32F4 -D STM32F401xE -D STM32F40_41xxx -D F4 -D USE_HAL_DRIVER -D F_CPU=84000000L -I include -I src -I C:/packages/framework-stm32cube/f4/Drivers/CMSIS/Include -I C:/packages/framework-stm32cube/f4/Drivers/CMSIS/Device/ST/STM32F4xx/Include -I C:/packages/framework-stm32cube/f4/Drivers/STM32F4xx_HAL_Driver/Inc -I C:/packages/framework-stm32cube/f4/Drivers/BSP/Components/Common -I C:/packages/framework-stm32cube/f4/Drivers/BSP/STM32F4xx-Nucleo -D  
DEFINE_FROM_CLANGD_FILE -internal-isystem e:\Temp\LLVM-11\lib\clang\11.0.0\include -internal-isystem include -Os -Wall -fdebug-compilation-dir E:/Projects/stm32cube-hal-blink -ferror-limit 19 -fno-signed-char -fgnuc-version=4.2.1 -vectorize-loops -vectorize-slp -faddrsig -x c src/main.c

As you can see with main.c file, Clangd correctly used standard headers from the provided query driver, but with an out-of-tree header file it used C++ flags. Please let me know if you would like to see the entire log.

Clangd should still be picking some source file from the compilation database to infer compile commands from for a header.

The issue in this ticket is that it sometimes infers a source file of the wrong language (e.g. a C source file for a C++ header in a project that contains a mix of source files in both languages). Is that the case for your project?

If your issue is that clangd is not inferring a source file at all, that’s a different issue that can potentially be solved on your end. (For example, if the header is outside the directory containing the compilation database, clangd won’t use the CDB for it by default, but it can be made to with the --compile-commands-dir option.)

If you share a clangd log (from VSCode’s Output view, “Clangd Language Server” option in the dropdown) that should shed some light on things.