bleve: Scorch persister stuck after segment creation fails?
On low-memory devices, we see frequent memory allocation errors when submitting indexing batches to scorch:
2019/11/03 08:18:50 [DVR] Fetching guide data for 262 stations in USA-WI48449-X @ 2019-11-18 2:00PM
2019/11/03 08:19:02 [DVR] indexed 1761 airings (247 channels) [1s fetch, 10s index]
2019/11/03 08:19:03 [DVR] indexed 112 movies (35 channels) [0s fetch, 0s index]
2019/11/03 08:19:03 [DVR] Fetching guide data for 262 stations in USA-WI48449-X @ 2019-11-18 8:00PM
2019/11/03 08:19:07 [DVR] Error indexing airings: error opening new segment at USA-WI48449-X.airings/store/00000000668c.zap, cannot allocate memory
2019/11/03 08:19:08 [DVR] Fetching guide data for 262 stations in USA-WI48449-X @ 2019-11-19 2:00AM
2019/11/03 08:19:12 [DVR] Error indexing airings: cannot allocate memory
2019/11/03 08:19:26 [IDX] Pruned 3839 expired airings from USA-WI48449-X in 10.116005734s.
Then, a few hours later, the process crashes outright. The stack shows the persister sitting inside pausePersisterForMergerCatchUp() while the Batch call is still stuck in prepareSegment():
runtime/cgo: pthread_create failed: Resource temporarily unavailable
SIGABRT: abort
PC=0xb68bd6 m=2 sigcode=4294967290
goroutine 0 [idle]:
runtime: unknown pc 0xb68bd6
stack: frame={sp:0x66c9a87c, fp:0x0} stack=[0x6649b034,0x66c9ac34)
66c9a7fc: 00b6bcf1 00000000 00000000 00000000
66c9a80c: 00000000 00000000 00000000 00000000
66c9a81c: 00000000 00000000 00000000 00000000
66c9a82c: 00000000 00000000 00000000 00000000
66c9a83c: 00000000 00000000 00000000 00000000
66c9a84c: 00000000 00000000 00000000 00000000
66c9a85c: 00000000 00000000 00000000 00000000
66c9a86c: 01afd358 66c9b300 00000001 01afd358
66c9a87c: <01afd358 00b6bc7b 01afd358 66c9b300
66c9a88c: 00b6be9b 00000000 00000000 00000000
66c9a89c: 00000000 66c9a8cd 010d770c 00000000
66c9a8ac: 010cb153 66c90043 00b698ef 010cb120
66c9a8bc: 00000005 4d5f434c 41535345 2f534547
66c9a8cc: 6362696c 006f6d2e <github.com/blevesearch/bleve/index/scorch/segment/zap.mergeStoredAndRemap.func2+802> 00b6989d 00000000
66c9a8dc: 00000000 00000043 00000000 00000000
66c9a8ec: 00000000 00000000 00000000 00000000
runtime: unknown pc 0xb68bd6
stack: frame={sp:0x66c9a87c, fp:0x0} stack=[0x6649b034,0x66c9ac34)
66c9a7fc: 00b6bcf1 00000000 00000000 00000000
66c9a80c: 00000000 00000000 00000000 00000000
66c9a81c: 00000000 00000000 00000000 00000000
66c9a82c: 00000000 00000000 00000000 00000000
66c9a83c: 00000000 00000000 00000000 00000000
66c9a84c: 00000000 00000000 00000000 00000000
66c9a85c: 00000000 00000000 00000000 00000000
66c9a86c: 01afd358 66c9b300 00000001 01afd358
66c9a87c: <01afd358 00b6bc7b 01afd358 66c9b300
66c9a88c: 00b6be9b 00000000 00000000 00000000
66c9a89c: 00000000 66c9a8cd 010d770c 00000000
66c9a8ac: 010cb153 66c90043 00b698ef 010cb120
66c9a8bc: 00000005 4d5f434c 41535345 2f534547
66c9a8cc: 6362696c 006f6d2e <github.com/blevesearch/bleve/index/scorch/segment/zap.mergeStoredAndRemap.func2+802> 00b6989d 00000000
66c9a8dc: 00000000 00000043 00000000 00000000
66c9a8ec: 00000000 00000000 00000000 00000000
goroutine 16 [chan receive, 519 minutes]:
github.com/blevesearch/bleve/index/scorch.(*Scorch).prepareSegment(0x25c4a80, 0x1047268, 0x2bbe500, 0x336e000, 0xff, 0x100, 0x2620e40, 0x0, 0x0, 0x0)
github.com/blevesearch/bleve@v0.8.2-0.20191010234049-157461a2aeb6/index/scorch/scorch.go:425 +0x3e4
github.com/blevesearch/bleve/index/scorch.(*Scorch).Batch(0x25c4a80, 0x30acac0, 0x0, 0x0)
github.com/blevesearch/bleve@v0.8.2-0.20191010234049-157461a2aeb6/index/scorch/scorch.go:361 +0x6ec
github.com/blevesearch/bleve.(*indexImpl).Batch(0x28fcdc0, 0x2620e60, 0x0, 0x0)
github.com/blevesearch/bleve@v0.8.2-0.20191010234049-157461a2aeb6/index_impl.go:310 +0x94
github.com/fancybits/channels-server/dvr.(*Recorder).indexAirings(0x25ce8c0, 0x29e5490, 0xd, 0x30560a0, 0x309fd01, 0x0, 0x0, 0xd55233f0, 0xe, 0x1ae94c0, ...)
goroutine 50 [select, 2006 minutes]:
github.com/blevesearch/bleve/index/scorch.(*Scorch).pausePersisterForMergerCatchUp(0x25c4a80, 0x1310, 0x0, 0x112a, 0x0, 0x0, 0x0, 0x0, 0x2ac21b0, 0x112a, ...)
github.com/blevesearch/bleve@v0.8.2-0.20191010234049-157461a2aeb6/index/scorch/persister.go:295 +0x2d0
github.com/blevesearch/bleve/index/scorch.(*Scorch).persisterLoop(0x25c4a80)
github.com/blevesearch/bleve@v0.8.2-0.20191010234049-157461a2aeb6/index/scorch/persister.go:117 +0x5f4
created by github.com/blevesearch/bleve/index/scorch.(*Scorch).Open
github.com/blevesearch/bleve@v0.8.2-0.20191010234049-157461a2aeb6/index/scorch/scorch.go:170 +0xc8
...
trap 0x6
error 0x0
oldmask 0x0
r0 0x0
r1 0x7fcd
r2 0x6
r3 0x7fcd
r4 0x6
r5 0x66c9b7c0
r6 0x2
r7 0x10c
r8 0x1
r9 0xe0
r10 0x2400540
fp 0x66c9a9bc
ip 0x10c
sp 0x66c9a87c
lr 0xb6bc7b
pc 0xb68bd6
cpsr 0x20000030
fault 0x0
About this issue
- State: open
- Created 5 years ago
- Comments: 15 (2 by maintainers)
Commits related to this issue
- avoid writing nil entry to newSegments on zap.Open failures /cc #1305 — committed to fancybits/bleve by tmm1 5 years ago
I think we had identified some paths that have this exact behavior: some error happens that is essentially unrecoverable by scorch, and scorch will just retry it indefinitely. I don't remember the details, but I think it was slightly less trivial to fix because some of the components are decoupled, which makes the problem harder to detect.