influxdb: InfluxDB getting restarted frequently on v1.7.9 with segmentation fault

InfluxDB restarts frequently because of a segmentation fault.

Environment info:

  • Linux 5.4.0-1045-aws x86_64
  • 32-core machine with 256 GB RAM

Logs:

influxd[27351]: ts=2024-02-27T03:20:08.429986Z lvl=info msg="Full compaction complete" log_id=0n_OznJG000 index=tsi tsi1_partition=8 trace_id=0nb94CVG000 op_name=tsi1_compact_to_level tsi1_level=2 path=/var/lib/influxdb/data/telegraf/measurements/44101/index/7/L2-00000011.tsi elapsed=880ms bytes=9201143 kb_per_sec=10210
influxd[27351]: ts=2024-02-27T03:20:08.430010Z lvl=info msg="Removing index file" log_id=0n_OznJG000 index=tsi tsi1_partition=8 trace_id=0nb94CVG000 op_name=tsi1_compact_to_level tsi1_level=2 path=/var/lib/influxdb/data/telegraf/measurements/44101/index/7/L1-00000009.tsi
influxd[27351]: ts=2024-02-27T03:20:08.431343Z lvl=info msg="Removing index file" log_id=0n_OznJG000 index=tsi tsi1_partition=8 trace_id=0nb94CVG000 op_name=tsi1_compact_to_level tsi1_level=2 path=/var/lib/influxdb/data/telegraf/measurements/44101/index/7/L1-00000006.tsi
influxd[27351]: ts=2024-02-27T03:20:08.432624Z lvl=info msg="TSI level compaction (end)" log_id=0n_OznJG000 index=tsi tsi1_partition=8 trace_id=0nb94CVG000 op_name=tsi1_compact_to_level tsi1_level=2 op_event=end op_elapsed=882.691ms
influxd[27351]: ts=2024-02-27T03:20:08.433600Z lvl=info msg="Executing query" log_id=0n_OznJG000 service=query query="SELECT sum(value) FROM measurement WHERE <conditions> GROUP BY <column>"
influxd[27351]: ts=2024-02-27T03:20:08.438136Z lvl=info msg="Snapshot for path written" log_id=0n_OznJG000 engine=tsm1 trace_id=0nb947Sl000 op_name=tsm1_cache_snapshot path=/var/lib/influxdb/data/telegraf/measurements/44101 duration=2178.316ms
influxd[27351]: ts=2024-02-27T03:20:08.438168Z lvl=info msg="Cache snapshot (end)" log_id=0n_OznJG000 engine=tsm1 trace_id=0nb947Sl000 op_name=tsm1_cache_snapshot op_event=end op_elapsed=2178.346ms
influxd[27351]: unexpected fault address 0x7f60b5dab63d
influxd[27351]: fatal error: fault
influxd[27351]: [signal SIGSEGV: segmentation violation code=0x1 addr=0x7f60b5dab63d pc=0x9d3f7e]
influxd[27351]: goroutine 5078661668 [running]:
influxd[27351]: runtime.throw(0x151e28d, 0x5)
influxd[27351]: #011/usr/local/go/src/runtime/panic.go:617 +0x72 fp=0xc029af9a70 sp=0xc029af9a40 pc=0x42f482
influxd[27351]: runtime.sigpanic()
influxd[27351]: #011/usr/local/go/src/runtime/signal_unix.go:397 +0x401 fp=0xc029af9aa0 sp=0xc029af9a70 pc=0x444731
influxd[27351]: github.com/influxdata/influxdb/vendor/github.com/influxdata/roaring.(*shortIterator).next(0xc99bca3720, 0x56)
influxd[27351]: #011/go/src/github.com/influxdata/influxdb/vendor/github.com/influxdata/roaring/shortiterator.go:18 +0x1e fp=0xc029af9ab0 sp=0xc029af9aa0 pc=0x9d3f7e
influxd[27351]: github.com/influxdata/influxdb/vendor/github.com/influxdata/roaring.(*intIterator).Next(0xc9eb58ac60, 0x414a01)
influxd[27351]: #011/go/src/github.com/influxdata/influxdb/vendor/github.com/influxdata/roaring/roaring.go:239 +0x34 fp=0xc029af9ad8 sp=0xc029af9ab0 pc=0x9b0ac4
influxd[27351]: github.com/influxdata/influxdb/tsdb.(*seriesIDSetIterator).Next(0xc359829da0, 0xc99bca3701, 0x2c000000000000ff, 0x0, 0x2c1af81b11cf2772, 0xc289867bc0)

About this issue

  • State: open
  • Created 4 months ago
  • Comments: 17 (8 by maintainers)

Most upvoted comments

@satish2007 - Sometimes deleting the _series directories and restarting InfluxDB can help; they are automatically regenerated if missing on startup.
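For anyone trying this, here is a minimal Python sketch of that step. It assumes the default layout where each database keeps its series file at `<data-dir>/<database>/_series`, and it uses the `/var/lib/influxdb/data` path seen in the logs above. It renames the directories instead of deleting them, which is a more cautious variant of the suggestion, not what the maintainer literally described. The function names `find_series_dirs` and `move_aside` are made up for illustration.

```python
from pathlib import Path


def find_series_dirs(data_dir):
    """Return every <data_dir>/<database>/_series directory, sorted by path."""
    root = Path(data_dir)
    return sorted(p for p in root.glob("*/_series") if p.is_dir())


def move_aside(series_dir, suffix=".bak", dry_run=True):
    """Rename a _series directory so it can be restored later.

    Assumption: renaming is used here instead of deleting, so the old
    series file is kept as a fallback. With dry_run=True nothing changes
    on disk and only the would-be target path is returned.
    """
    target = series_dir.with_name(series_dir.name + suffix)
    if not dry_run:
        series_dir.rename(target)
    return target
```

Stop influxd first, call `find_series_dirs("/var/lib/influxdb/data")`, review the list, then call `move_aside(path, dry_run=False)` for each entry and start InfluxDB again. If the regenerated series files look fine, the `.bak` copies can be removed afterwards.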

The fix is also present in the 1.11.5 tag, but you will have to build that from source yourself; we do not have binaries available for it.

@Satish2007 - A longer stack trace from the panic would be the only way to know what is happening. Please post everything printed, starting from the first goroutine.
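If the trace has to be pulled out of syslog, something like the following Python sketch may help collect it in one piece. It assumes each line carries an `influxd[PID]: ` prefix as in the logs above, that the `#011` sequences are syslog's octal escape for a tab, and that the crash output ends when the next structured `ts=...` log line appears. The function name `extract_trace` is hypothetical.

```python
import re

# Syslog prefix as it appears in the pasted logs, e.g. "influxd[27351]: ".
PREFIX = re.compile(r"^influxd\[\d+\]: ")
# Lines that typically open a Go runtime crash report.
START = re.compile(r"^(unexpected fault address|fatal error:|panic:)")


def extract_trace(lines):
    """Return the crash output from the first fault line onward."""
    out = []
    capturing = False
    for raw in lines:
        line = PREFIX.sub("", raw.rstrip("\n"))
        if not capturing:
            if START.match(line):
                capturing = True
            else:
                continue
        elif line.startswith("ts="):
            # Assumption: a structured log line means influxd has restarted.
            break
        # Restore tabs that syslog escaped as "#011".
        out.append(line.replace("#011", "\t"))
    return out
```

Feeding it the lines of the syslog file (or whatever your system uses to collect influxd output) should return the complete dump for the first crash, including every goroutine, ready to paste into the issue.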