node_exporter: fatal error: runtime.unlock: lock count with Go >= 1.10

Host operating system: output of uname -a

Linux endor 4.13.0-37-generic #42~16.04.1-Ubuntu SMP Wed Mar 7 16:03:28 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

node_exporter version: output of node_exporter --version

node_exporter, version 0.16.0-rc.0 (branch: HEAD, revision: 002c1ca02917406cbecc457162e2bdb1f29c2f49)
  build user:       root@5ff5455ac873
  build date:       20180309-15:09:26
  go version:       go1.10

Used the release artifact at: https://github.com/prometheus/node_exporter/releases/download/v0.16.0-rc.0/node_exporter-0.16.0-rc.0.linux-amd64.tar.gz

node_exporter command line flags

None, the defaults for 0.16 match my needs

Are you running node_exporter in Docker?

No

What did you do that produced an error?

Just ran it for a couple of days

What did you expect to see?

It not to crash

What did you see instead?

Mar 28 19:47:54 endor node_exporter[18076]: fatal error: runtime·unlock: lock count
Mar 28 19:48:03 endor systemd[1]: node_exporter.service: Main process exited, code=killed, status=11/SEGV

That fatal error line was printed about 1000 times, all logged at 19:47:54 according to systemd.
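For anyone who wants to put a number on the repeated line, a minimal sketch follows. It assumes the log text is supplied on stdin and that the unit is called `node_exporter.service`, as in the log above; `count_lock_errors` is a hypothetical helper name, not part of node_exporter.

```shell
#!/usr/bin/env bash

# Hypothetical helper: count log lines on stdin that carry the runtime
# "unlock: lock count" fatal error. Prints the count (0 if there are none).
count_lock_errors() {
    grep -c 'unlock: lock count'
}

# Possible usage on a systemd host (assumes the unit name from the log above):
#   journalctl -u node_exporter.service --no-pager | count_lock_errors
```

Matching only on `unlock: lock count` sidesteps the `.` versus `·` difference between the issue title and the actual log line.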

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 88 (67 by maintainers)

Most upvoted comments

Alright. Built one with Go 1.10.1, with HEAD at 768be139beb1522b092a0ec6ab7b7b3047216577.

$ go get github.com/prometheus/node_exporter
$ cd ~/Development/go/src/github.com/prometheus/node_exporter
$ env CGO_ENABLED=0 go build

It’s deployed now, let’s see what happens. It might take some time for the bug to manifest though.

Oh hey, you might remember me from that other time node_exporter found a Golang bug. I just hit this one too:

fatal error: runtime·unlock: lock count
fatal error: runtime·unlock: lock count
fatal error: runtime·unlock: lock count
fatal error: runtime·unlock: lock count
fatal error: runtime·unlock: lock count
fatal error: runtime·unlock: lock count
fatal error: runtime·unlock: lock count
fatal error: runtime·unlock: lock count
fatal error: runtime·unlock: lock count
fatal error: runtime·unlock: lock count
fatal error: runtime·unlock: lock count
fatal error: runtime·unlock: lock count
fatal error: fatal error: runtime·unlock: lock countruntime·lock: lock count

panic during panic
stack trace unavailable

runtime stack:

This is Gentoo again, versions:

node_exporter, version 0.16.0 (branch: non-git, revision: d42bd70)
  build user:       portage@binhost
  build date:       20180619-21:57:10
  go version:       go1.10.1

Unfortunately this just happened on a VM with many siblings running on the same hardware, after days of uptime, and I will only have access to this hardware for a couple more days, so it’s not looking good for reproducibility. However, I’ll be running these VM images at home and see if I can get it to repro on other hardware.

I will try running under the Go 1.11 beta and put our node_exporters to the test. We have had 1-2 crashes per week on ~70 instances. I’ll report back in a week.

The temporary fix is to build with Go 1.9.x.
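To find out which deployed binaries would need such a rebuild, the `go version:` line of `node_exporter --version` can be parsed. This is a minimal sketch that assumes the output format shown in the reports above; `built_with_go` and `is_affected_go` are hypothetical helper names.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the Go version found in `node_exporter --version`
# output supplied on stdin (assumes a "go version:   goX.Y[.Z]" line).
built_with_go() {
    awk '/go version:/ { print $3 }'
}

# Hypothetical helper: succeed if the version belongs to the Go 1.10 series,
# which is the series the crash reports in this thread were built with.
is_affected_go() {
    case "$1" in
        go1.10|go1.10.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible usage (2>&1 because the version text may be written to stderr):
#   v=$(node_exporter --version 2>&1 | built_with_go)
#   is_affected_go "$v" && echo "built with $v: consider rebuilding with Go 1.9.x"
```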

@LaurentDumont Please give rc.3 a try, it’s built with Go 1.9.5, which should be more stable.

I got really annoyed with this bug so I decided to try and bruteforce it:

  • Spun up an Ubuntu 16.04 instance
  • Created the following fs layout:
    • /root/gophers/gopath195
    • /root/gophers/gopath110
    • /root/gophers/builds/195
    • /root/gophers/builds/110
  • Downloaded and extracted Go 1.10.1 and Go 1.9.5 in /root/gophers/<version>
  • Fetched node exporter:
    • GOPATH=/root/gophers/gopath110 /root/gophers/go110/bin/go get github.com/prometheus/node_exporter
    • GOPATH=/root/gophers/gopath195 /root/gophers/go195/bin/go get github.com/prometheus/node_exporter
  • Found all the commit hashes of any commit between 0.15.2 and 0.16.0-rc.2 that touches a .go file: git log --pretty=format:%H v0.15.2..v0.16.0-rc.2 --no-merges -- '*.go' > /root/gophers/commits.txt
  • Built a node exporter for every commit with both versions of Go:
    #!/usr/bin/env bash
    
    filename="/root/gophers/commits.txt"
    declare -i offset
    offset=0
    
    # `git log --pretty=format:` writes no trailing newline, so also process a
    # final line that `read` returns without a terminator.
    while read -r commit || [ -n "$commit" ]; do
            cd /root/gophers/gopath195/src/github.com/prometheus/node_exporter || exit 1
            git checkout "$commit"
            echo "Building $commit with Go 1.9.5"
            GOPATH=/root/gophers/gopath195 /root/gophers/go195/bin/go get ./...
            # The zero-padded offset keeps the builds sorted in commit-list order.
            GOPATH=/root/gophers/gopath195 /root/gophers/go195/bin/go build -o "/root/gophers/builds/195/node_exporter_195_$(printf "%02d" "$offset")_$commit"
            offset+=1
            echo "Done building $commit with Go 1.9.5"
    done < "$filename"
    
  • Repeated the above for Go 1.10.1
  • Created systemd files for all builds, wrote out targets for vegeta and started them:
    #!/usr/bin/env bash
    
    declare -i port
    port=9100
    
    for n in /root/gophers/builds/195/* ; do
            short=$(echo $n | cut -d '/' -f6 | cut -d '_' -f1-4)
            cat <<-EOF > /etc/systemd/system/$short.service
    [Unit]
    Description=Prometheus Node Exporter $port
    
    [Service]
    User=root
    ExecStart=$n --log.level="debug" --web.listen-address="127.0.0.1:$port"
    Restart=no
    EOF
            echo "GET http://127.0.0.1:$port/metrics" >> /root/gophers/targets.txt
            port+=1
    done
    
    systemctl daemon-reload
    
    for n in /etc/systemd/system/node_exporter_195_*; do
      systemctl start $(basename $n)
    done
    
  • Repeated the above for Go 1.10.1
  • Launched vegeta against them: ./vegeta attack -targets=targets.txt -rate=68 > results.bin

And now we wait.
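While waiting, the crashed builds can be picked out by asking systemd which of the generated units have failed. A process killed by SIGSEGV that is not restarted leaves its unit in the failed state. This is a minimal sketch that assumes the directory layout and unit naming from the scripts above; `unit_for_build` and `list_crashed` are hypothetical helper names.

```shell
#!/usr/bin/env bash

# Hypothetical helper: derive the unit name the setup script generates for a
# build path, e.g. .../node_exporter_195_07_<commit> -> node_exporter_195_07
unit_for_build() {
    basename "$1" | cut -d '_' -f1-4
}

# Hypothetical helper: print "<unit> <binary>" for every build under the given
# directory whose systemd unit is in the failed state.
list_crashed() {
    local dir=$1 n unit
    for n in "$dir"/*; do
        [ -e "$n" ] || continue
        unit="$(unit_for_build "$n").service"
        if systemctl is-failed --quiet "$unit"; then
            echo "$unit $n"
        fi
    done
}

# Possible usage:
#   list_crashed /root/gophers/builds/195
#   list_crashed /root/gophers/builds/110
```

The two-digit offset in each unit name is the build's position in commits.txt, so the lowest and highest failing offsets bound the range of commits to look at.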