iree: Bytecode from `iree-import-tflite` causes debug info to be dropped in `translateModuleToLLVMIR`

This continues the investigation from https://github.com/openxla/iree/pull/15756, which is now being reverted.

Summary:

https://github.com/openxla/iree/pull/15756 should have been a 1-line PR, simply adding -gline-tables-only to the clang flags that we use to compile ukernels to bitcode. All it does is generate some debug info in the compiled ukernels bitcode.

This runs into a bug discussed on this comment thread: https://github.com/openxla/iree/pull/15756/files#r1411583898

  • The bug itself actually makes sense, as noted in this subsequent comment on that thread: https://github.com/openxla/iree/pull/15756/files#r1518132234.
    • It’s saying that a callee with debug info (here the ukernel) doesn’t like getting inlined into a caller without debug info (here the dispatch function).
    • The mystery is why is the dispatch function lacking debug info? We are generating debug info by default, and in most models (more on that in the next bullet point) we are getting debug info and all works fine.
    • It seems we had a preexisting bug where debug info was sometimes missing from dispatch functions, and we only discovered it now because linking with ukernels that have debug info turns it into a compilation error.
  • The only models affected are some TFLite models. Even then, the problem goes away:
    • If the output of iree-import-tflite is run through iree-opt with no flags…
    • … or just mlir-opt, or even just dumping the IR immediately after loading the input file in mlir-opt
    • … so that it seems that simply disassembling bitcode back to textual IR is enough to fix it.
    • This is the key problem making this so hard to debug: our normal first debugging step, looking at the text IR, prevents reproducing!
  • The problem also goes away:
  • The dumped .codegen.bc actually does have some limited debug info (on each function, but not on the ops inside each function). But the debug info somehow gets dropped inside of mlir::translateModuleToLLVMIR (https://github.com/openxla/iree/pull/15756/files#r1518324689). I haven’t been able to debug exactly where, but that’s the closest I’ve been able to get.
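
The "debug info on each function, but not on the ops inside" observation can be checked mechanically on a textual dump. Below is a minimal sketch in Python, assuming the .codegen.bc file has already been disassembled to textual LLVM IR (for example with llvm-dis). The helper name `count_dbg_attachments` and the sample IR are hypothetical and are not taken from the actual failing model.

```python
import re

# Hypothetical helper: counts `!dbg` attachments in textual LLVM IR,
# separately for function definitions and for instructions.
LABEL_RE = re.compile(r"^[\w.$-]+:")        # basic-block labels such as "entry:"
INST_DBG_RE = re.compile(r",\s*!dbg\s+!\d+")  # instruction-level ", !dbg !N"

def count_dbg_attachments(llvm_ir):
    counts = {"functions": 0, "functions_with_dbg": 0,
              "instructions": 0, "instructions_with_dbg": 0}
    in_body = False
    for raw in llvm_ir.splitlines():
        line = raw.strip()
        if line.startswith("define "):
            counts["functions"] += 1
            if "!dbg" in line:
                counts["functions_with_dbg"] += 1
            in_body = line.endswith("{")
            continue
        if not in_body:
            continue
        if line == "}":
            in_body = False
            continue
        # Skip blank lines, comments, and basic-block labels.
        if not line or line.startswith(";") or LABEL_RE.match(line):
            continue
        counts["instructions"] += 1
        if INST_DBG_RE.search(line):
            counts["instructions_with_dbg"] += 1
    return counts

# Made-up IR: the caller has function-level debug info only,
# while the callee also has a location on its instruction.
SAMPLE_IR = """
define void @dispatch_0() !dbg !4 {
entry:
  %0 = add i32 1, 2
  call void @ukernel(i32 %0)
  ret void
}

define void @ukernel(i32 %x) !dbg !8 {
  ret void, !dbg !9
}
"""

if __name__ == "__main__":
    print(count_dbg_attachments(SAMPLE_IR))
```

A dump where `functions_with_dbg` equals `functions` but `instructions_with_dbg` is far below `instructions` would match the situation described above. This is a line-based heuristic, not a real IR parser, so it is only meant for a quick sanity check.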

About this issue

  • State: open
  • Created 4 months ago
  • Comments: 27 (18 by maintainers)

Most upvoted comments

Fascinating, I would have gotten lost the moment the bug failed to express as text.

If at the very start of the serializeExecutable function I dump variantOp, the resulting (text) dump is identical in the bad and good cases.

If I dump the parent mlir::ModuleOp, the only difference is in the namespace string; if I replace the name (sed 's/iree_PoseNet_fp32_tflite_/llvm_module/g') then the ModuleOp is identical.
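
The comparison described above (rename the namespace string, then check that the two dumps are identical) can be scripted so it is easy to repeat at other points in the pipeline. This is a minimal sketch in Python, assuming the good and bad ModuleOp dumps are available as strings. The function names and the placeholder dump text are hypothetical; only the prefix and its replacement come from the sed command above. Both sides are normalized because it is not stated which dump carries which name.

```python
import difflib

def normalize_dump(text, prefix, replacement="llvm_module"):
    # Same effect as: sed 's/<prefix>/<replacement>/g'
    return text.replace(prefix, replacement)

def diff_dumps(good, bad, prefix):
    # Normalize both sides, then return a unified diff.
    # An empty list means the dumps are textually identical.
    good_lines = normalize_dump(good, prefix).splitlines()
    bad_lines = normalize_dump(bad, prefix).splitlines()
    return list(difflib.unified_diff(good_lines, bad_lines,
                                     "good", "bad", lineterm=""))

# Placeholder text standing in for real dumps (not actual IREE output).
PREFIX = "iree_PoseNet_fp32_tflite_"
GOOD_DUMP = "module @llvm_module {\n  op_a\n  op_b\n}"
BAD_DUMP = "module @iree_PoseNet_fp32_tflite_ {\n  op_a\n  op_b\n}"

if __name__ == "__main__":
    delta = diff_dumps(GOOD_DUMP, BAD_DUMP, PREFIX)
    print("identical" if not delta else "\n".join(delta))
```

An empty diff here only shows that the textual form is the same. As the next comment argues, whatever differs between the two cases does not survive text serialization, so an empty diff is consistent with the bug still being present.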

I think at this point we have narrowed this bug down to something about bitcode. In the bad case, the module as loaded from bitcode is corrupted in a way that somehow causes debug info to be dropped in translateModuleToLLVMIR, but that corruption doesn’t survive text-serialization, so we can’t see it in any kind of textual IR dump.