go-ceph: crash when calling Read on a rados object
Hi, I set up a sample Ceph cluster using the ceph-daemon Docker image, then created a bucket and uploaded a large file to it. Now I want to read one of the file's corresponding rados objects; its size is 4 MB:
ioCtx, err := conn.OpenIOContext("default.rgw.buckets.data")
if err != nil {
    panic(err)
}

buffer := make([]byte, 5000000)
readLen, err := ioCtx.Read(objID, buffer, uint64(0))
if err != nil {
    panic(err)
}
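For completeness, the snippet above omits the cluster connection setup. Below is a minimal sketch of the surrounding program; the ReadDefaultConfigFile/Connect calls and the object name are my reconstruction, not the original code. (In general, calling Read on a cluster handle that is not connected, or has already been shut down, can trip the ceph_assert(initialized) shown in the trace below.)

```go
package main

import (
	"fmt"

	"github.com/ceph/go-ceph/rados"
)

func main() {
	conn, err := rados.NewConn()
	if err != nil {
		panic(err)
	}
	// Load /etc/ceph/ceph.conf and connect before opening any IO context.
	if err := conn.ReadDefaultConfigFile(); err != nil {
		panic(err)
	}
	if err := conn.Connect(); err != nil {
		panic(err)
	}
	defer conn.Shutdown()

	ioCtx, err := conn.OpenIOContext("default.rgw.buckets.data")
	if err != nil {
		panic(err)
	}
	defer ioCtx.Destroy()

	objID := "example-object-id" // hypothetical; use a real rados object name
	buffer := make([]byte, 5000000)
	readLen, err := ioCtx.Read(objID, buffer, uint64(0))
	if err != nil {
		panic(err)
	}
	fmt.Println("read", readLen, "bytes")
}
```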
This code sometimes hangs and sometimes crashes with the following assertion failure:
/build/ceph-h4nNuL/ceph-15.2.13/src/osdc/Objecter.cc: In function 'void Objecter::_op_submit_with_budget(Objecter::Op*, Objecter::shunique_lock&, ceph_tid_t*, int*)' thread 7f78cf93b380 time 2021-09-06T19:23:13.418088+0430
/build/ceph-h4nNuL/ceph-15.2.13/src/osdc/Objecter.cc: 2277: FAILED ceph_assert(initialized)
ceph version 15.2.13 (c44bc49e7a57a87d84dfff2a077a2058aa2172e2) octopus (stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x155) [0x7f78d04a1113]
2: (()+0x26531b) [0x7f78d04a131b]
3: (()+0xc23f6) [0x7f78d919d3f6]
4: (()+0xc24b3) [0x7f78d919d4b3]
5: (()+0x88e6b) [0x7f78d9163e6b]
6: (()+0x8b033) [0x7f78d9166033]
7: (rados_read()+0xdf) [0x7f78d9121bff]
8: (_cgo_0db27397a8d2_Cfunc_rados_read()+0x2b) [0x4b88bb]
9: ./go-librados-trining() [0x46ae50]
SIGABRT: abort
PC=0x7f78d8f0c18b m=0 sigcode=18446744073709551610
goroutine 0 [idle]:
runtime: unknown pc 0x7f78d8f0c18b
stack: frame={sp:0x7fff7bb50c80, fp:0x0} stack=[0x7fff7b352a38,0x7fff7bb51a70)
00007fff7bb50b80: 0000000000000000 00007fff7bb50c88
00007fff7bb50b90: 0000000000000051 00007f78d010f6f8
00007fff7bb50ba0: 00007fff7bb50e70 00007f78d0104b39
00007fff7bb50bb0: 00007fff7bb50e60 00007fff7bb50de0
00007fff7bb50bc0: 000000000185e7a0 000000000185e7c7
00007fff7bb50bd0: 0000000000000011 000000000185e7c7
00007fff7bb50be0: 00007fff7bb501e0 0000004e0000000d
00007fff7bb50bf0: 00007fff7bb500a0 0000002700000003
00007fff7bb50c00: 00007fff7bb50200 000000000000000d
00007fff7bb50c10: 0000000000000000 00007f78d019e286
00007fff7bb50c20: 5f5f3a3a68706563 7373615f68706563
00007fff7bb50c30: 6c6961665f747265 6f63207261686328
00007fff7bb50c40: 6863202c2a74736e 74736e6f63207261
00007fff7bb50c50: 202c746e69202c2a 6e6f632072616863
00007fff7bb50c60: 00007f00292a7473 00007f78d06dc301
00007fff7bb50c70: 00007fff7bb50ce0 00007fff7bb50e70
00007fff7bb50c80: <0000000000000000 00007f78d0233d18
00007fff7bb50c90: 000000000184db80 000000000184db80
00007fff7bb50ca0: 000000000184db80 000000000184db70
00007fff7bb50cb0: 0000000000000201 00007f78d019e18d
00007fff7bb50cc0: 00007fff7bb50d30 00007f78d0104b39
00007fff7bb50cd0: 00007fff7bb50d20 00007fff7bb510a8
00007fff7bb50ce0: 0000000000000035 00007f78d0192fd7
00007fff7bb50cf0: 00007fff7bb50d20 00007f78d019bb28
00007fff7bb50d00: fffffffe7fffffff ffffffffffffffff
00007fff7bb50d10: ffffffffffffffff ffffffffffffffff
00007fff7bb50d20: ffffffffffffffff ffffffffffffffff
00007fff7bb50d30: ffffffffffffffff ffffffffffffffff
00007fff7bb50d40: ffffffffffffffff ffffffffffffffff
00007fff7bb50d50: ffffffffffffffff ffffffffffffffff
00007fff7bb50d60: ffffffffffffffff ffffffffffffffff
00007fff7bb50d70: ffffffffffffffff ffffffffffffffff
goroutine 1 [syscall]:
runtime.cgocall(0x4b8890, 0xc0000cfe50, 0x0)
/usr/local/go/src/runtime/cgocall.go:154 +0x5b fp=0xc0000cfe20 sp=0xc0000cfde8 pc=0x408c3b
github.com/ceph/go-ceph/rados._Cfunc_rados_read(0x1770210, 0x1841ba0, 0xc000100000, 0x4c4b40, 0x0, 0x0)
_cgo_gotypes.go:1132 +0x48 fp=0xc0000cfe50 sp=0xc0000cfe20 pc=0x4afc08
github.com/ceph/go-ceph/rados.(*IOContext).Read.func2(0xc0000b6020, 0x1841ba0, 0xc000100000, 0xc000100000, 0x4c4b40, 0x4c4b40, 0x0, 0x0)
/home/mfs/workspace/projects/go-librados-trining/vendor/github.com/ceph/go-ceph/rados/ioctx.go:198 +0x85 fp=0xc0000cfe98 sp=0xc0000cfe50 pc=0x4b0845
github.com/ceph/go-ceph/rados.(*IOContext).Read(0xc0000b6020, 0x4ec0d4, 0x5e, 0xc000100000, 0x4c4b40, 0x4c4b40, 0x0, 0x0, 0x0, 0x0)
/home/mfs/workspace/projects/go-librados-trining/vendor/github.com/ceph/go-ceph/rados/ioctx.go:198 +0xe9 fp=0xc0000cff00 sp=0xc0000cfe98 pc=0x4b01a9
main.main()
/home/mfs/workspace/projects/go-librados-trining/test.go:33 +0x1d1 fp=0xc0000cff88 sp=0xc0000cff00 pc=0x4b7cb1
runtime.main()
/usr/local/go/src/runtime/proc.go:225 +0x256 fp=0xc0000cffe0 sp=0xc0000cff88 pc=0x43bd56
runtime.goexit()
/usr/local/go/src/runtime/asm_amd64.s:1371 +0x1 fp=0xc0000cffe8 sp=0xc0000cffe0 pc=0x46b1e1
rax 0x0
rbx 0x7f78cf93b380
rcx 0x7f78d8f0c18b
rdx 0x0
rdi 0x2
rsi 0x7fff7bb50c80
rbp 0x7fff7bb51220
rsp 0x7fff7bb50c80
r8 0x0
r9 0x7fff7bb50c80
r10 0x8
r11 0x246
r12 0x7fff7bb50ef0
r13 0x7fff7bb50f20
r14 0x7fff7bb510a0
r15 0x7f78d91e5cec
rip 0x7f78d8f0c18b
rflags 0x246
cs 0x33
fs 0x0
gs 0x0
I wrote equivalent code in Python and it works without any issue:
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
ioctx = cluster.open_ioctx("default.rgw.buckets.data")
ret = ioctx.read(
    key=objID,
    length=5000000,
    offset=0,
)
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 34
Yeah, I’m just starting down that path. I’ve confirmed that reverting this commit on top of v0.16.0 also makes the problem go away.
I’m running against a Nautilus cluster on physical hardware (3 physical servers, 10GB network, etc.) and running the crashing Go program on one of those three machines.
I haven’t tried any of your test containers but can look when I next get a chance.
I tried building with ASAN (new in Go 1.18) but it didn’t report anything.
I also tried sprinkling some runtime.LockOSThread calls in places, suspecting there might be some thread-local state in use by the C libraries, but that didn’t help either.