telegraf: win_perf_counter plugin does not work on 386

Bug report

Telegraf i386 crashes on Windows:

2017-02-23T15:41:04Z I! Starting Telegraf (version 1.2.1)
2017-02-23T15:41:04Z I! Loaded outputs: file
2017-02-23T15:41:04Z I! Loaded inputs: inputs.win_perf_counters
2017-02-23T15:41:04Z I! Tags enabled: host=MSEDGEWIN10
2017-02-23T15:41:04Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"MSEDGEWIN10", Flush Interval:10s
unexpected fault address 0x5566b687
fatal error: fault
[signal 0xc0000005 code=0x0 addr=0x5566b687 pc=0x48a345]

goroutine 17 [running]:
runtime.throw(0xf13bde, 0x5)
        /usr/local/go/src/runtime/panic.go:566 +0x7f fp=0x12302e60 sp=0x12302e54
runtime.sigpanic()
        /usr/local/go/src/runtime/signal_windows.go:164 +0x116 fp=0x12302e78 sp=0x12302e60
syscall.UTF16ToString(0x5566b687, 0x20000000, 0x20000000, 0x0, 0x0)
        /usr/local/go/src/syscall/syscall_windows.go:51 +0x35 fp=0x12302ea0 sp=0x12302e78
github.com/influxdata/telegraf/plugins/inputs/win_perf_counters.UTF16PtrToString(0x5566b687, 0x0, 0x0)
        /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/inputs/win_perf_counters/pdh.go:418 +0x62 fp=0x12302ec4 sp=0x12302ea0
github.com/influxdata/telegraf/plugins/inputs/win_perf_counters.(*Win_PerfCounters).Gather(0x1265a8e0, 0x14046a0, 0x1265af60, 0x0, 0x0)
        /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/inputs/win_perf_counters/win_perf_counters.go:277 +0x32d fp=0x12302f98 sp=0x12302ec4
github.com/influxdata/telegraf/agent.gatherWithTimeout.func1(0x122a8a00, 0x1265aaa0, 0x1265af60)
        /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:153 +0x4c fp=0x12302fc8 sp=0x12302f98
runtime.goexit()
        /usr/local/go/src/runtime/asm_386.s:1612 +0x1 fp=0x12302fcc sp=0x12302fc8
created by github.com/influxdata/telegraf/agent.gatherWithTimeout
        /home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:154 +0xe5

Full telegraf.conf:

[[outputs.file]]
    files = ["stdout"]

[[inputs.win_perf_counters]]
  [[inputs.win_perf_counters.object]]
    ObjectName = "Processor"
    Instances = ["*"]
    Counters = [
      "% Idle Time",
    ]

System info:

Windows 10 amd64 and Windows 8 i386, with Telegraf 1.2.1 and nightly

Steps to reproduce:

  1. Run telegraf and wait a few seconds (around 20-30 s)

Expected behavior:

No crash

Actual behavior:

Crash 😃

Additional info:

On the same machine, the amd64 version works well. I’ve dug a bit into the probable root cause, and I think the issue is a difference in structure size between Go and the Windows API. PDH_FMT_COUNTERVALUE_ITEM_DOUBLE has a size (according to unsafe.Sizeof, so according to Go) of 24 bytes on amd64 and 16 bytes on i386.

Both seem logical if the structure aligns its fields on the machine word size (8 bytes on amd64; 4 bytes on i386). The expanded structure is

struct {
    SzName *uint16       // machine word size: 4 or 8 bytes
                         // no padding needed to align on word size
    CStatus uint32       // 4 bytes
                         // padding to align on word size: none on i386, 4 bytes on amd64
    DoubleValue float64  // 8 bytes
}
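
The sizes quoted above can be checked with a small program. This is only a sketch: the names counterValueItem, wordSize and nativeLayout are made up for illustration and mirror the fields listed above; they are not the plugin's actual declarations.

```go
package main

import (
	"fmt"
	"unsafe"
)

// counterValueItem mirrors the fields listed above. The name is hypothetical;
// it is not the declaration used by the plugin.
type counterValueItem struct {
	SzName      *uint16
	CStatus     uint32
	DoubleValue float64
}

// wordSize is the pointer size of the build target: 4 on 386, 8 on amd64.
const wordSize = unsafe.Sizeof(uintptr(0))

// nativeLayout reports the size and field offsets the Go compiler picks.
func nativeLayout() (size, statusOff, valueOff uintptr) {
	var it counterValueItem
	return unsafe.Sizeof(it), unsafe.Offsetof(it.CStatus), unsafe.Offsetof(it.DoubleValue)
}

func main() {
	size, statusOff, valueOff := nativeLayout()
	fmt.Printf("word=%d size=%d CStatus@%d DoubleValue@%d\n",
		wordSize, size, statusOff, valueOff)
}
```

Building it once with GOARCH=386 and once with GOARCH=amd64 should report the 16-byte and 24-byte sizes mentioned above, along with where each field starts.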

But I think Windows and C++ align on an 8-byte boundary for both i386 and amd64. I don’t have a C++ compiler on Windows to confirm this hypothesis, but adding a few fmt.Printf calls led me to this idea:

Just before this for loop, I added:

fmt.Printf("ret=%#v, bufSize=%#v, bufCount=%#v\n", ret, bufSize, bufCount)
fmt.Printf("%#v\n", (*[1 << 29]byte)(unsafe.Pointer(&(filledBuf[0])))[:bufSize])

This dumps the number of items and the binary data in the buffer.

Result just before the crash (on the i386 version of telegraf):

ret=0x0, bufSize=0x42, bufCount=0x2
[]byte{
    0x30, 0x10, 0x3c, 0x12,        // this is szName
    0x0, 0x0, 0x0, 0x0,            // this looks like padding to align CStatus on an 8-byte boundary
    0x0, 0x0, 0x0, 0x0,            // this is CStatus (uint32, 4 bytes)
    0x0, 0x0, 0x0, 0x0,            // this looks like padding to align DoubleValue on an 8-byte boundary
    0x11, 0xc0, 0x11, 0x20, 0x2e, 0x83, 0x56, 0x40,  // this is a double value equal to 90.05, which looks good for a CPU idle %

    0x3e, 0x10, 0x3c, 0x12,        // this looks like another szName; the address is close to the first one (0x123c103e vs 0x123c1030)
    0x0, 0x0, 0x0, 0x0,            // padding
    0x0, 0x0, 0x0, 0x0,            // CStatus
    0x0, 0x0, 0x0, 0x0,            // padding
    0x11, 0xc0, 0x11, 0x20, 0x2e, 0x83, 0x56, 0x40,  // double, equal to 90.05 like the first one.
                                                     // This is expected since the machine is single-core: one value is
                                                     // the single core, the other is the total (which on a single core is the same value)

    // I don't know why there is always some additional data; this happens on both i386 and amd64
    0x5f, 0x0, 0x54, 0x0, 0x6f, 0x0, 0x74, 0x0, 0x61, 0x0, 0x6c, 0x0, 0x0, 0x0, 0x30, 0x0, 0x0, 0x0
}
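
As a side note on the trailing 18 bytes: they decode cleanly as two NUL-terminated little-endian UTF-16 strings. The sketch below shows this; decodeStrings is a helper written for this note, not something from the plugin. If the buffer happened to start at 0x123c1000 (an assumption, since the dump does not show the base address), the two szName values 0x123c1030 and 0x123c103e would point at offsets 48 and 62, which is where these two strings begin.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// tail holds the 18 bytes that follow the two items in the dump above.
var tail = []byte{
	0x5f, 0x0, 0x54, 0x0, 0x6f, 0x0, 0x74, 0x0, 0x61, 0x0, 0x6c, 0x0, 0x0, 0x0,
	0x30, 0x0, 0x0, 0x0,
}

// decodeStrings splits b into NUL-terminated little-endian UTF-16 strings.
// It is a hypothetical helper for this illustration only.
func decodeStrings(b []byte) []string {
	var out []string
	var cur []uint16
	for i := 0; i+1 < len(b); i += 2 {
		u := binary.LittleEndian.Uint16(b[i:])
		if u == 0 {
			// A zero code unit terminates the current string.
			out = append(out, string(utf16.Decode(cur)))
			cur = cur[:0]
			continue
		}
		cur = append(cur, u)
	}
	return out
}

func main() {
	fmt.Printf("%q\n", decodeStrings(tail)) // prints ["_Total" "0"]
}
```

So the extra data looks like the instance names ("_Total" and "0") stored right after the item array, which would fit a single-core machine reporting one core plus the total.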

Go, however, assumes alignment on the machine word size, so on i386 it interprets the same bytes as:

[]byte{
    0x30, 0x10, 0x3c, 0x12,        // this is szName, good
    0x0, 0x0, 0x0, 0x0,            // Go uses this as CStatus (4 bytes)... okay, since CStatus seems to always be 0, like the padding
                                   // no padding here: on i386 Go aligns DoubleValue on a 4-byte boundary
    0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,  // Go then assumes this is DoubleValue... equal to 0.0. Unlikely for a CPU idle % (the machine is doing nothing)

    0x11, 0xc0, 0x11, 0x20,  // this should be the next szName... but it causes the unexpected fault address 0x2011c011
    0x2e, 0x83, 0x56, 0x40,  // this should be CStatus, and it is non-zero
    0x3e, 0x10, 0x3c, 0x12, 0x0, 0x0, 0x0, 0x0,  // this should be DoubleValue, equal to 1.511476285e-315

    0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
    0x11, 0xc0, 0x11, 0x20, 0x2e, 0x83, 0x56, 0x40,
    0x5f, 0x0, 0x54, 0x0, 0x6f, 0x0, 0x74, 0x0, 0x61, 0x0, 0x6c, 0x0, 0x0, 0x0, 0x30, 0x0, 0x0, 0x0
}
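
The two readings of the buffer can be reproduced without Windows by decoding the dumped bytes by hand. This is a sketch under the hypothesis above: the 24-byte layout (CStatus at offset 8, DoubleValue at offset 16) is inferred from the dump and not confirmed, and the names layout, goLayout386, paddedLayout and decode are invented for the example.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// dump holds the first 48 bytes of the buffer shown above (the two items).
var dump = []byte{
	0x30, 0x10, 0x3c, 0x12, 0x0, 0x0, 0x0, 0x0,
	0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
	0x11, 0xc0, 0x11, 0x20, 0x2e, 0x83, 0x56, 0x40,
	0x3e, 0x10, 0x3c, 0x12, 0x0, 0x0, 0x0, 0x0,
	0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
	0x11, 0xc0, 0x11, 0x20, 0x2e, 0x83, 0x56, 0x40,
}

// layout says where the fields of one item sit, assuming 32-bit pointers.
type layout struct{ stride, statusOff, valueOff int }

var (
	goLayout386  = layout{stride: 16, statusOff: 4, valueOff: 8}  // what Go assumes on 386
	paddedLayout = layout{stride: 24, statusOff: 8, valueOff: 16} // what the dump suggests
)

// decode reads item i of the dump under layout l.
func decode(l layout, i int) (name, status uint32, value float64) {
	b := dump[i*l.stride:]
	name = binary.LittleEndian.Uint32(b)
	status = binary.LittleEndian.Uint32(b[l.statusOff:])
	value = math.Float64frombits(binary.LittleEndian.Uint64(b[l.valueOff:]))
	return name, status, value
}

func main() {
	for i := 0; i < 2; i++ {
		n, s, v := decode(paddedLayout, i)
		fmt.Printf("padded item %d: szName=%#x CStatus=%#x value=%g\n", i, n, s, v)
		n, s, v = decode(goLayout386, i)
		fmt.Printf("go/386 item %d: szName=%#x CStatus=%#x value=%g\n", i, n, s, v)
	}
}
```

With the padded layout both items come out with CStatus 0 and a value of about 90.05. With the 16-byte layout the second item gets szName=0x2011c011, the bogus pointer mentioned in the annotated dump.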

Proposal:

We should verify that Windows C++ aligns the structure on an 8-byte boundary (does anyone have a C++ compiler on Windows? Just checking sizeof(PDH_FMT_COUNTERVALUE_ITEM) should be enough). If confirmed, we should find out how to tell Go to align on an 8-byte boundary for both i386 and amd64.
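
One possible way to get the same layout on both architectures, if the 8-byte alignment hypothesis is confirmed, is to spell the padding out in the Go struct. The sketch below is not the plugin's code: paddedItem and fixedLayout are hypothetical names, and the 24-byte layout is the one inferred from the dump, not one checked against a C++ compiler.

```go
package main

import (
	"fmt"
	"unsafe"
)

// paddedItem is a hypothetical declaration with explicit padding so that the
// layout is identical on 386 and amd64: 24 bytes in total, CStatus at
// offset 8 and DoubleValue at offset 16.
type paddedItem struct {
	SzName *uint16
	// Pads the pointer up to 8 bytes: 4 bytes on 386, 0 bytes on amd64.
	_       [8 - unsafe.Sizeof(uintptr(0))]byte
	CStatus uint32
	// Keeps DoubleValue at offset 16 even where float64 is only 4-byte aligned.
	_           uint32
	DoubleValue float64
}

// fixedLayout reports the resulting size and field offsets.
func fixedLayout() (size, statusOff, valueOff uintptr) {
	var it paddedItem
	return unsafe.Sizeof(it), unsafe.Offsetof(it.CStatus), unsafe.Offsetof(it.DoubleValue)
}

func main() {
	size, statusOff, valueOff := fixedLayout()
	fmt.Printf("size=%d CStatus@%d DoubleValue@%d\n", size, statusOff, valueOff)
}
```

The array length is a compile-time constant, so no build tags are needed. An alternative would be separate per-architecture files selected by build constraints, which is more verbose but arguably easier to read.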

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 2
  • Comments: 19 (7 by maintainers)

Most upvoted comments

It would be great to fix this issue, but maybe in the meantime the default configuration shipped with the 32-bit Windows build could be changed so that all the [[inputs.win_perf_counters.*]] sections are commented out, and the following sections are uncommented:

[[inputs.cpu]]
[[inputs.disk]]
[[inputs.diskio]]
[[inputs.mem]]
[[inputs.swap]]

This would at least prevent the bad out-of-the-box experience with this 32-bit build.

Yes, I can. I’m actually waiting for approval on my end. Sorry guys, this is taking time. I probably won’t have permission until Monday or Tuesday.

@kmonsoor No one is working on this as far as I know, would you be able to take a look?