iree: Bytecode from `iree-import-tflite` causes debug info to be dropped in `translateModuleToLLVMIR`
This continues the investigation from https://github.com/openxla/iree/pull/15756, which is now being reverted.
Summary:
https://github.com/openxla/iree/pull/15756 should have been a 1-line PR, simply adding `-gline-tables-only` to the clang flags that we use to compile ukernels to bitcode. All it does is generate some debug info in the compiled ukernels bitcode.
This runs into a bug discussed on this comment thread: https://github.com/openxla/iree/pull/15756/files#r1411583898
- The bug itself actually makes sense, as noted in this subsequent comment on that thread: https://github.com/openxla/iree/pull/15756/files#r1518132234.
- It’s saying that a callee with debug info (here the ukernel) doesn’t like getting inlined into a caller without debug info (here the dispatch function).
- The mystery is why the dispatch function lacks debug info. We generate debug info by default, and for most models (more on that in the next bullet point) we get debug info and everything works fine.
- It seems that we had a preexisting bug where debug info was sometimes missing from dispatch functions, and we only discovered it now because linking with ukernels that carry debug info turned it into a compilation error.
- The only models affected are some TFLite models. Even then, the problem goes away:
  - If the output of `iree-import-tflite` is run through `iree-opt` with no flags…
  - … or just `mlir-opt`, or even just dumping the IR immediately after loading the input file in `mlir-opt`…
  - … so that it seems that simply disassembling the bytecode back to textual IR is enough to fix it.
  - This is the key problem making this so hard to debug: our normal first debugging step, looking at the text IR, prevents reproducing!
- The problem also goes away:
  - If we build with `--iree-hal-dump-executable-sources-to` (https://github.com/openxla/iree/pull/15756/files#r1518324689). The reason is that this flag causes debug info to be reintroduced after the bug had already caused it to be dropped.
- The dumped `.codegen.bc` actually does have some limited debug info (on each function, but not on the ops inside each function). But the debug info somehow gets dropped inside of `mlir::translateModuleToLLVMIR` (https://github.com/openxla/iree/pull/15756/files#r1518324689). I haven’t been able to debug exactly where, but that’s the closest I’ve been able to get.
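
The "debug info on each function, but not on the ops inside" observation can be checked mechanically on a textual dump. The following is a minimal sketch, assuming the `.codegen.bc` file has first been disassembled to textual LLVM IR (for example with `llvm-dis`). The helper name `debug_info_summary` and the sample IR are hypothetical and not part of IREE. It simply counts `!dbg` attachments on `define` lines versus on instruction lines.

```python
import re

# Matches a debug attachment such as "!dbg !42".
DBG_ATTACHMENT = re.compile(r"!dbg !\d+")


def debug_info_summary(llvm_ir_text):
    """Count !dbg attachments on function definitions and on instructions.

    This is a line-based heuristic over textual LLVM IR, not a real parser.
    """
    functions = 0
    functions_with_dbg = 0
    instructions_with_dbg = 0
    in_function = False
    for line in llvm_ir_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("define "):
            # Function header, e.g. "define void @f() !dbg !3 {"
            functions += 1
            in_function = True
            if DBG_ATTACHMENT.search(stripped):
                functions_with_dbg += 1
        elif stripped == "}":
            in_function = False
        elif in_function and DBG_ATTACHMENT.search(stripped):
            # Instruction carrying a location, e.g. "ret void, !dbg !7"
            instructions_with_dbg += 1
    return {
        "functions": functions,
        "functions_with_dbg": functions_with_dbg,
        "instructions_with_dbg": instructions_with_dbg,
    }


if __name__ == "__main__":
    # Made-up IR resembling the situation described above:
    # the function has debug info, its instructions do not.
    sample = """define void @dispatch_0() !dbg !3 {
  %0 = add i32 1, 2
  ret void
}
"""
    print(debug_info_summary(sample))
```

A result with `functions_with_dbg` equal to `functions` but `instructions_with_dbg` equal to zero would correspond to the limited debug info described above.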
About this issue
- State: open
- Created 4 months ago
- Comments: 27 (18 by maintainers)
Fascinating, I would have gotten lost the moment the bug failed to express as text.
If at the very start of the `serializeExecutable` function I dump `variantOp`, the resulting (text) dump is identical in the bad and good cases.
If I dump the parent `mlir::ModuleOp`, the only difference is in the namespace string; if I replace the name (`sed 's/iree_PoseNet_fp32_tflite_/llvm_module/g'`) then the `ModuleOp` is identical.
I think at this point we have narrowed this bug down to something about bytecode. In the bad case, the module as loaded from bytecode is corrupted in a way that somehow causes debug info to be dropped in `translateModuleToLLVMIR`, but that corruption doesn’t survive text-serialization, so we can’t see it in any kind of textual IR dump.
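
The comparison described in this comment (normalize the namespace string, then check that the two text dumps match) can be sketched as a small script. This is only an illustration under stated assumptions: the helper names `normalize_dump` and `diff_dumps` are hypothetical, and the sample dump strings are made-up stand-ins rather than real IREE output. Only the substitution itself is taken from the `sed` command above.

```python
import difflib
from typing import List

# Namespace substitution taken from the sed command in the comment above.
MODEL_NAMESPACE = "iree_PoseNet_fp32_tflite_"
PLACEHOLDER = "llvm_module"


def normalize_dump(text: str) -> str:
    """Replace the model-specific namespace so two dumps become comparable."""
    return text.replace(MODEL_NAMESPACE, PLACEHOLDER)


def diff_dumps(bad: str, good: str) -> List[str]:
    """Return unified-diff lines; an empty list means the dumps are identical."""
    return list(
        difflib.unified_diff(
            normalize_dump(bad).splitlines(),
            normalize_dump(good).splitlines(),
            fromfile="bad",
            tofile="good",
            lineterm="",
        )
    )


if __name__ == "__main__":
    # Made-up stand-ins for the two ModuleOp dumps.
    bad_dump = "module @iree_PoseNet_fp32_tflite_ {\n  // ops\n}"
    good_dump = "module @llvm_module {\n  // ops\n}"
    remaining = diff_dumps(bad_dump, good_dump)
    print("identical after normalization" if not remaining else "\n".join(remaining))
```

An empty diff here only shows that the text forms agree. As noted above, the suspected corruption does not survive text serialization, so a clean diff does not rule it out.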