glow: Accessing names of symbols from SymbolTable* inside BundleConfig causes SIGSEGV

Running on Ubuntu 18.04 with:

(base) vagrant@ubuntu-18:~/dts/nn/xgmr/xperi-glow/examples/bundles/x$ clang --version
clang version 7.0.0-3~ubuntu0.18.04.1 (tags/RELEASE_700/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

There are two issues here. First, the names in the SymbolTable inside BundleConfig do not seem to be consistent with those given in the ONNX graph. For instance, consider this graph:

graph(%in : Float(1, 2)
      %1 : Float(8, 2)
      %2 : Float(8)
      %3 : Float(8, 8)
      %4 : Float(8)
      %5 : Float(8, 8)
      %6 : Float(8)
      %7 : Float(1, 8)
      %8 : Float(1)) {
  %9 : Float(1, 8) = onnx::Gemm[alpha=1, beta=1, transB=1](%in, %1, %2), scope: FeedForwardNN/Linear
  %10 : Float(1, 8) = onnx::Relu(%9), scope: FeedForwardNN/ReLU
  %11 : Float(1, 8) = onnx::Gemm[alpha=1, beta=1, transB=1](%10, %3, %4), scope: FeedForwardNN/ReLU
  %12 : Float(1, 8) = onnx::Relu(%11), scope: FeedForwardNN/ReLU
  %13 : Float(1, 8) = onnx::Gemm[alpha=1, beta=1, transB=1](%12, %5, %6), scope: FeedForwardNN/ReLU
  %14 : Float(1, 8) = onnx::Relu(%13), scope: FeedForwardNN/ReLU
  %15 : Float(1, 1) = onnx::Gemm[alpha=1, beta=1, transB=1](%14, %7, %8), scope: FeedForwardNN/ReLU
  %out : Float(1, 1) = onnx::Sigmoid(%15), scope: FeedForwardNN/Sigmoid
  return (%out);
}

This graph was generated by:

torch.onnx.export(ffnn, inp, "ffnn.onnx", input_names=["in"],
                  output_names=["out"], verbose=True, export_params=True)

Now viewing the strings stored in the generated bundle's .rodata section (built with -relocation-model=pic):

(base) vagrant@ubuntu-18:~/dts/nn/xgmr/xperi-glow/build/debug$ readelf -p 5 ./output/x.o
String dump of section '.rodata':
  [     0]  save_out
  [     9]  in

In other words, Glow seems to prefix the output tensor name with save_. I don't think this is documented anywhere.

Now the real issue. Consider the following innocuous code, which simply prints the (bounded) string length of each name in the symbol table:

#define _POSIX_C_SOURCE 200809L  /* strnlen is POSIX.1-2008 */

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct SymbolTableEntry {
    const char *name;
    uint64_t offset;
    uint64_t size;
    char kind;
};

struct BundleConfig {
    uint64_t cwv_size;
    uint64_t mwv_size;
    uint64_t a_size;
    uint64_t alignment;
    uint64_t nsymbols;
    const struct SymbolTableEntry *symbols;
};

/* Defined in the Glow-generated bundle object (x.o). */
extern struct BundleConfig x_config;

int main(void)
{
    printf("Total number of symbols: %" PRIu64 "\n", x_config.nsymbols);

    for (uint64_t jj = 0; jj < x_config.nsymbols; ++jj) {
        if (x_config.symbols[jj].name != NULL)
            printf("strnlen(sym[%" PRIu64 "].name, 8) == %zu\n", jj,
                   strnlen(x_config.symbols[jj].name, 8));
    }

    return 0;
}

Compiling and running the code gives:

(base) vagrant@ubuntu-18:~/dts/nn/xgmr/xperi-glow/examples/bundles/x$ ./ttt
Total number of symbols: 8
strnlen(sym[0].name, 8) == 8
strnlen(sym[1].name, 8) == 2
Segmentation fault (core dumped)

Let’s look at those symbols inside x.o again:

(base) vagrant@ubuntu-18:~/dts/nn/xgmr/xperi-glow/build/debug$ readelf -s ./output/x.o

Symbol table '.symtab' contains 19 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS llvm-link
     2: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT    4 .LCPI7_0
     3: 0000000000000004     0 NOTYPE  LOCAL  DEFAULT    4 .LCPI7_1
     4: 00000000000000a0   377 FUNC    LOCAL  DEFAULT    2 libjit_matmul_f_0_special
     5: 00000000000002c0   272 FUNC    LOCAL  DEFAULT    2 libjit_matmul_f_2_special
     6: 0000000000000510   195 FUNC    LOCAL  DEFAULT    2 libjit_matmul_f_5_special
     7: 00000000000003d0   159 FUNC    LOCAL  DEFAULT    2 libjit_stacked_kernel.1_3
     8: 0000000000000470   159 FUNC    LOCAL  DEFAULT    2 libjit_stacked_kernel.2_4
     9: 00000000000005e0    62 FUNC    LOCAL  DEFAULT    2 libjit_stacked_kernel.3_6
    10: 0000000000000220   159 FUNC    LOCAL  DEFAULT    2 libjit_stacked_kernel_1_s
    11: 0000000000000000    64 OBJECT  LOCAL  DEFAULT    6 xSymbolTable
    12: 0000000000000000     0 SECTION LOCAL  DEFAULT    5 
    13: 0000000000000000     0 SECTION LOCAL  DEFAULT    6 
    14: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND expf
    15: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND free
    16: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT  UND posix_memalign
    17: 0000000000000000   158 FUNC    GLOBAL DEFAULT    2 x
    18: 0000000000000040    48 OBJECT  GLOBAL DEFAULT    6 x_config

Ok, now dump .rodata:

(base) vagrant@ubuntu-18:~/dts/nn/xgmr/xperi-glow/build/debug$ readelf -x .rodata ./output/x.o

Hex dump of section '.rodata':
  0x00000000 73617665 5f6f7574 00696e00          save_out.in.

(base) vagrant@ubuntu-18:~/dts/nn/xgmr/xperi-glow/build/debug$ 

We only see save_out and in. Where are the other names? Compiling with -relocation-model=static gives:

(base) vagrant@ubuntu-18:~/dts/nn/xgmr/xperi-glow/build/debug$ readelf -x .rodata ./output/x.o

Hex dump of section '.rodata':
 NOTE: This section has relocations against it, but these have NOT been applied to this dump.
  0x00000000 73617665 5f6f7574 00696e00 00000000 save_out.in.....
  0x00000010 00000000 00000000 00000000 00000000 ................
  0x00000020 01000000 00000000 01000000 00000000 ................
  0x00000030 00000000 00000000 40000000 00000000 ........@.......
  0x00000040 02000000 00000000 01000000 00000000 ................
  0x00000050 80030000 00000000 80000000 00000000 ................
  0x00000060 80000000 00000000 40000000 00000000 ........@.......
  0x00000070 08000000 00000000 00000000 00000000 ................

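The name pointers are simply filled in by relocations, which this dump does not apply (hence the zeroed words at 0x10 and 0x30). In the PIC build the table and x_config live in section 6 rather than in .rodata (section 5), which is why only the strings showed up earlier. Note, though, that there are still only two strings and two table entries, while the word at 0x70 (nsymbols, if x_config starts right after the table at 0x50) is 8. Now viewing the relocations: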

(base) vagrant@ubuntu-18:~/dts/nn/xgmr/xperi-glow/build/debug$ readelf -r ./output/x.o

Relocation section '.rela.text' at offset 0x960 contains 9 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
0000000000cd  000f00000004 R_X86_64_PLT32    0000000000000000 posix_memalign - 4
00000000020a  000e00000004 R_X86_64_PLT32    0000000000000000 free - 4
0000000002ed  000f00000004 R_X86_64_PLT32    0000000000000000 posix_memalign - 4
0000000003c1  000e00000004 R_X86_64_PLT32    0000000000000000 free - 4
000000000538  000f00000004 R_X86_64_PLT32    0000000000000000 posix_memalign - 4
0000000005c4  000e00000004 R_X86_64_PLT32    0000000000000000 free - 4
0000000005f6  000200000002 R_X86_64_PC32     0000000000000000 .LCPI7_0 - 4
0000000005ff  000d00000004 R_X86_64_PLT32    0000000000000000 expf - 4
000000000607  000300000002 R_X86_64_PC32     0000000000000004 .LCPI7_1 - 4

Relocation section '.rela.rodata' at offset 0xa38 contains 3 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
000000000010  000c00000001 R_X86_64_64       0000000000000000 .rodata + 0
000000000030  000c00000001 R_X86_64_64       0000000000000000 .rodata + 9
000000000078  000c00000001 R_X86_64_64       0000000000000000 .rodata + 10

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 15 (7 by maintainers)

Most upvoted comments

Heh, actually I found this yesterday and was going to put up that fix today.

The thing is that placeholders are defined for the whole module, but some of them may not actually be used in the code, and no WeightVars are allocated for them. A symbol table probably should not contain any entries about such placeholders as they are useless.

CC: @gcatron

Function::findPlaceholders() / Function::findConstants() actually does the right thing here, I believe. Probably could have been named better.

> A symbol table probably should not contain any entries about such placeholders as they are useless.

Agreed. Let’s keep this issue open until the correct fix is applied. As of right now, I am just going to do this dirty fix on my end to keep things rolling here.

Yep, I can confirm that changing to findPlaceholders() works correctly.