glow: Accessing names of symbols from SymbolTable* inside BundleConfig causes SIGSEGV
Running on Ubuntu 18.04 with:
(base) vagrant@ubuntu-18:~/dts/nn/xgmr/xperi-glow/examples/bundles/x$ clang --version
clang version 7.0.0-3~ubuntu0.18.04.1 (tags/RELEASE_700/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
There are two issues here. First, the names in the SymbolTable inside BundleConfig do not seem to be consistent with those given in the ONNX graph. For instance, consider this graph:
graph(%in : Float(1, 2)
%1 : Float(8, 2)
%2 : Float(8)
%3 : Float(8, 8)
%4 : Float(8)
%5 : Float(8, 8)
%6 : Float(8)
%7 : Float(1, 8)
%8 : Float(1)) {
%9 : Float(1, 8) = onnx::Gemm[alpha=1, beta=1, transB=1](%in, %1, %2), scope: FeedForwardNN/Linear
%10 : Float(1, 8) = onnx::Relu(%9), scope: FeedForwardNN/ReLU
%11 : Float(1, 8) = onnx::Gemm[alpha=1, beta=1, transB=1](%10, %3, %4), scope: FeedForwardNN/ReLU
%12 : Float(1, 8) = onnx::Relu(%11), scope: FeedForwardNN/ReLU
%13 : Float(1, 8) = onnx::Gemm[alpha=1, beta=1, transB=1](%12, %5, %6), scope: FeedForwardNN/ReLU
%14 : Float(1, 8) = onnx::Relu(%13), scope: FeedForwardNN/ReLU
%15 : Float(1, 1) = onnx::Gemm[alpha=1, beta=1, transB=1](%14, %7, %8), scope: FeedForwardNN/ReLU
%out : Float(1, 1) = onnx::Sigmoid(%15), scope: FeedForwardNN/Sigmoid
return (%out);
}
generated by:
torch.onnx.export(ffnn, inp, "ffnn.onnx", input_names=["in"],
                  output_names=["out"], verbose=True, export_params=True)
Now viewing the symbols inside the generated bundle (built with -relocation-model=pic):
(base) vagrant@ubuntu-18:~/dts/nn/xgmr/xperi-glow/build/debug$ readelf -p 5 ./output/x.o
String dump of section '.rodata':
[ 0] save_out
[ 9] in
In other words, Glow seems to prefix the output tensor name with save_. I don’t think this is documented anywhere.
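Until the naming is documented, a caller that wants to match bundle symbols against ONNX tensor names could strip the prefix itself. The following is only a sketch: `onnx_name_from_symbol` is a hypothetical helper, not part of Glow, and it assumes the `save_` prefix is added to output tensors exactly as observed in the dump above.

```c
#include <string.h>

/* Hypothetical helper (not a Glow API): map a bundle symbol name back to
 * the ONNX tensor name by skipping a leading "save_" if present.
 * Assumption: the prefix is only ever added by Glow to output tensors. */
const char *onnx_name_from_symbol(const char *symbol) {
  static const char prefix[] = "save_";
  size_t n = sizeof(prefix) - 1; /* length without the trailing NUL */
  return strncmp(symbol, prefix, n) == 0 ? symbol + n : symbol;
}
```

With the names seen here, "save_out" maps to "out" and "in" is returned unchanged. A tensor whose ONNX name genuinely starts with save_ would be stripped by mistake, so this is a stopgap rather than a reliable mapping.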
Now the real issue. Consider the following innocuous code that simply prints the (bounded) string length of each name in the symbol table:
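#define _POSIX_C_SOURCE 200809L /* for strnlen() */
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One entry of the symbol table emitted into the bundle. */
struct SymbolTableEntry {
  const char *name;
  uint64_t offset;
  uint64_t size;
  char kind;
};

/* Bundle configuration emitted next to the compiled function. */
struct BundleConfig {
  uint64_t cwv_size;
  uint64_t mwv_size;
  uint64_t a_size;
  uint64_t alignment;
  uint64_t nsymbols;
  const struct SymbolTableEntry *symbols;
};

/* Defined in the bundle object file (x.o). */
extern struct BundleConfig x_config;

int main(int argc, char **argv)
{
  (void)argc;
  (void)argv;
  printf("Total number of symbols: %" PRIu64 "\n", x_config.nsymbols);
  /* Walk every entry the config claims to have. */
  for (uint64_t jj = 0; jj < x_config.nsymbols; ++jj) {
    if (x_config.symbols[jj].name != NULL)
      printf("strnlen(sym[%" PRIu64 "].name, 8) == %zu\n", jj,
             strnlen(x_config.symbols[jj].name, 8));
  }
  return 0;
}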
Compiling and running the code gives:
(base) vagrant@ubuntu-18:~/dts/nn/xgmr/xperi-glow/examples/bundles/x$ ./ttt
Total number of symbols: 8
strnlen(sym[0].name, 8) == 8
strnlen(sym[1].name, 8) == 2
Segmentation fault (core dumped)
Let’s look at those symbols inside x.o again:
(base) vagrant@ubuntu-18:~/dts/nn/xgmr/xperi-glow/build/debug$ readelf -s ./output/x.o
Symbol table '.symtab' contains 19 entries:
Num: Value Size Type Bind Vis Ndx Name
0: 0000000000000000 0 NOTYPE LOCAL DEFAULT UND
1: 0000000000000000 0 FILE LOCAL DEFAULT ABS llvm-link
2: 0000000000000000 0 NOTYPE LOCAL DEFAULT 4 .LCPI7_0
3: 0000000000000004 0 NOTYPE LOCAL DEFAULT 4 .LCPI7_1
4: 00000000000000a0 377 FUNC LOCAL DEFAULT 2 libjit_matmul_f_0_special
5: 00000000000002c0 272 FUNC LOCAL DEFAULT 2 libjit_matmul_f_2_special
6: 0000000000000510 195 FUNC LOCAL DEFAULT 2 libjit_matmul_f_5_special
7: 00000000000003d0 159 FUNC LOCAL DEFAULT 2 libjit_stacked_kernel.1_3
8: 0000000000000470 159 FUNC LOCAL DEFAULT 2 libjit_stacked_kernel.2_4
9: 00000000000005e0 62 FUNC LOCAL DEFAULT 2 libjit_stacked_kernel.3_6
10: 0000000000000220 159 FUNC LOCAL DEFAULT 2 libjit_stacked_kernel_1_s
11: 0000000000000000 64 OBJECT LOCAL DEFAULT 6 xSymbolTable
12: 0000000000000000 0 SECTION LOCAL DEFAULT 5
13: 0000000000000000 0 SECTION LOCAL DEFAULT 6
14: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND expf
15: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND free
16: 0000000000000000 0 NOTYPE GLOBAL DEFAULT UND posix_memalign
17: 0000000000000000 158 FUNC GLOBAL DEFAULT 2 x
18: 0000000000000040 48 OBJECT GLOBAL DEFAULT 6 x_config
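One detail in this listing is worth checking against the program's output: xSymbolTable is 64 bytes, while x_config reported 8 symbols. A minimal sketch of the arithmetic follows, assuming the SymbolTableEntry layout declared in the test program and a typical LP64 x86-64 ABI (8-byte pointers, struct padded to 8-byte alignment). `entries_in_table` is a hypothetical helper used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the entry struct in the test program. */
struct SymbolTableEntry {
  const char *name;
  uint64_t offset;
  uint64_t size;
  char kind;
};

/* Hypothetical helper: number of whole entries that fit in a table of
 * table_bytes bytes (the Size column that readelf -s reports). */
size_t entries_in_table(size_t table_bytes) {
  return table_bytes / sizeof(struct SymbolTableEntry);
}
```

Under those assumptions an entry is 32 bytes, so `entries_in_table(64)` yields 2. That matches the two names printed before the crash and suggests that nsymbols, not the table, is what is off.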
Ok, now dump .rodata:
(base) vagrant@ubuntu-18:~/dts/nn/xgmr/xperi-glow/build/debug$ readelf -x .rodata ./output/x.o
Hex dump of section '.rodata':
0x00000000 73617665 5f6f7574 00696e00 save_out.in.
(base) vagrant@ubuntu-18:~/dts/nn/xgmr/xperi-glow/build/debug$
We only see save_out and in. Where are the other names? Compiling with -relocation-model=static gives:
(base) vagrant@ubuntu-18:~/dts/nn/xgmr/xperi-glow/build/debug$ readelf -x .rodata ./output/x.o
Hex dump of section '.rodata':
NOTE: This section has relocations against it, but these have NOT been applied to this dump.
0x00000000 73617665 5f6f7574 00696e00 00000000 save_out.in.....
0x00000010 00000000 00000000 00000000 00000000 ................
0x00000020 01000000 00000000 01000000 00000000 ................
0x00000030 00000000 00000000 40000000 00000000 ........@.......
0x00000040 02000000 00000000 01000000 00000000 ................
0x00000050 80030000 00000000 80000000 00000000 ................
0x00000060 80000000 00000000 40000000 00000000 ........@.......
0x00000070 08000000 00000000 00000000 00000000 ................
The names have just been relocated, that’s all. Now viewing relocations:
(base) vagrant@ubuntu-18:~/dts/nn/xgmr/xperi-glow/build/debug$ readelf -r ./output/x.o
Relocation section '.rela.text' at offset 0x960 contains 9 entries:
Offset Info Type Sym. Value Sym. Name + Addend
0000000000cd 000f00000004 R_X86_64_PLT32 0000000000000000 posix_memalign - 4
00000000020a 000e00000004 R_X86_64_PLT32 0000000000000000 free - 4
0000000002ed 000f00000004 R_X86_64_PLT32 0000000000000000 posix_memalign - 4
0000000003c1 000e00000004 R_X86_64_PLT32 0000000000000000 free - 4
000000000538 000f00000004 R_X86_64_PLT32 0000000000000000 posix_memalign - 4
0000000005c4 000e00000004 R_X86_64_PLT32 0000000000000000 free - 4
0000000005f6 000200000002 R_X86_64_PC32 0000000000000000 .LCPI7_0 - 4
0000000005ff 000d00000004 R_X86_64_PLT32 0000000000000000 expf - 4
000000000607 000300000002 R_X86_64_PC32 0000000000000004 .LCPI7_1 - 4
Relocation section '.rela.rodata' at offset 0xa38 contains 3 entries:
Offset Info Type Sym. Value Sym. Name + Addend
000000000010 000c00000001 R_X86_64_64 0000000000000000 .rodata + 0
000000000030 000c00000001 R_X86_64_64 0000000000000000 .rodata + 9
000000000078 000c00000001 R_X86_64_64 0000000000000000 .rodata + 10
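To make the static dump easier to read, the bytes can be decoded field by field. The sketch below embeds the 128 bytes shown above and reads them as little-endian 64-bit values. The offsets are assumptions inferred from the relocation targets (0x10 and 0x30 for the two name pointers, 0x78 for the symbols pointer) and from the struct declarations in the test program. Nothing here comes from Glow itself.

```c
#include <stddef.h>
#include <stdint.h>

/* The 128 bytes of .rodata from the -relocation-model=static dump.
 * Relocations are not applied, so the three pointer slots read as zero. */
const uint8_t rodata[128] = {
  /* 0x00 */ 0x73,0x61,0x76,0x65,0x5f,0x6f,0x75,0x74, 0x00,0x69,0x6e,0,0,0,0,0,
  /* 0x10 */ 0,0,0,0,0,0,0,0,    0,0,0,0,0,0,0,0,
  /* 0x20 */ 0x01,0,0,0,0,0,0,0, 0x01,0,0,0,0,0,0,0,
  /* 0x30 */ 0,0,0,0,0,0,0,0,    0x40,0,0,0,0,0,0,0,
  /* 0x40 */ 0x02,0,0,0,0,0,0,0, 0x01,0,0,0,0,0,0,0,
  /* 0x50 */ 0x80,0x03,0,0,0,0,0,0, 0x80,0,0,0,0,0,0,0,
  /* 0x60 */ 0x80,0,0,0,0,0,0,0, 0x40,0,0,0,0,0,0,0,
  /* 0x70 */ 0x08,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,
};

/* Read a little-endian 64-bit value at byte offset off. */
uint64_t read_u64_le(const uint8_t *base, size_t off) {
  uint64_t v = 0;
  for (int i = 7; i >= 0; --i)
    v = (v << 8) | base[off + (size_t)i];
  return v;
}

/* Assumed layout, inferred from the relocations and the C structs. */
enum {
  SYMTAB_OFF   = 0x10,              /* first symbol table entry */
  ENTRY_SIZE   = 0x20,              /* sizeof(struct SymbolTableEntry) on x86-64 */
  CONFIG_OFF   = 0x50,              /* x_config */
  NSYMBOLS_OFF = CONFIG_OFF + 4 * 8 /* fifth uint64_t field of x_config */
};

/* Number of whole entries between the table start and x_config. */
uint64_t entries_before_config(void) {
  return (CONFIG_OFF - SYMTAB_OFF) / ENTRY_SIZE;
}
```

Read this way, `read_u64_le(rodata, NSYMBOLS_OFF)` is 8, while `entries_before_config()` is 2. In this layout index 2 would alias x_config itself, whose first field is 0x380: a non-NULL value that passes the NULL check and is then dereferenced by strnlen. That would be consistent with the crash after two names, although the layout of the binary that actually crashed may differ.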
About this issue
- State: closed
- Created 5 years ago
- Comments: 15 (7 by maintainers)
Heh, actually I found this yesterday and was going to put up that fix today.
CC: @gcatron. Function::findPlaceholders()/Function::findConstants() actually does the right thing here, I believe. Probably could have been named better.
Agreed. Let’s keep this issue open until the correct fix is applied. As of right now, I am just going to do this dirty fix on my end to keep things rolling here.
Yep, I can confirm that changing to findPlaceholders() works correctly.