bee: /chunks/xxxxxxx?targets= causes crash

Summary

I am running a v1.0.0 bee node with global-pinning-enable: true. I requested a chunk that the node did not have, specifying targets, with:

curl http://192.168.10.185:8080/chunks/5e7132318b6f5482fb02e2ed48c293800f0774470387412fbd3db1a3366736bd?targets=f99db6

this returned the expected:

{"message":"chunk recovery initiated. retry after sometime.","code":202}

At this point, the bee node consumes copious amounts of CPU and then abruptly terminates with:

[signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0xe012ab]

goroutine 129928953 [running]:
github.com/ethersphere/bee/pkg/pss.(*pss).Send(0xc0006d70e0, 0x18e53e0, 0xc00003e0a0, 0xb848d6c7d44639f3, 0xd02c4e295a5b9716, 0x776eafbe3b931776, 0xcf20d5397b98a205, 0xc005b29f60, 0x20, 0x160, ...)
        github.com/ethersphere/bee/pkg/pss/pss.go:98 +0x14b
github.com/ethersphere/bee/pkg/recovery.NewCallback.func1(0xc005b29f60, 0x20, 0x160, 0xc109039d00, 0x1, 0x1)
        github.com/ethersphere/bee/pkg/recovery/repair.go:38 +0xd1
created by github.com/ethersphere/bee/pkg/netstore.(*store).Get
        github.com/ethersphere/bee/pkg/netstore/netstore.go:56 +0x2ec

Steps to reproduce

Run a bee node with global-pinning-enable: true and request /chunks/xxxxxx?targets=yyyyyy, where yyyyyy must contain an even number of hex digits. Wait for the response, then keep waiting for the crash to occur. I believe the additional delay is caused by pss.Wrap inside pss.Send attempting to target the appropriate neighborhood.
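For anyone scripting the reproduction, here is a minimal Go sketch that builds the same request URL as the curl command and rejects targets that are not an even number of hex digits. The helper name chunkURL is hypothetical and not part of bee. The check relies only on the standard library's hex.DecodeString, which fails on odd-length input.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"net/url"
)

// chunkURL is a hypothetical helper (not part of bee). It builds a
// /chunks/<address>?targets=<prefix> URL and rejects targets that are
// not valid hex with an even number of digits.
func chunkURL(base, chunkAddr, targets string) (string, error) {
	// hex.DecodeString returns an error for odd-length or non-hex input.
	if _, err := hex.DecodeString(targets); err != nil {
		return "", fmt.Errorf("invalid targets %q: %w", targets, err)
	}
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	u.Path = "/chunks/" + chunkAddr
	q := url.Values{}
	q.Set("targets", targets)
	u.RawQuery = q.Encode()
	return u.String(), nil
}

func main() {
	addr := "5e7132318b6f5482fb02e2ed48c293800f0774470387412fbd3db1a3366736bd"
	u, err := chunkURL("http://192.168.10.185:8080", addr, "f99db6")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u)
}
```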

Expected behavior

I would expect the recovery request to be issued successfully via pss.

Actual behavior

The application crashed.

I traced it through the source code: pss.Send now requires a stamper, but the recovery repair callback passes nil. As soon as pss.Send uses the stamper, the application terminates.

https://github.com/ethersphere/bee/blob/6f3a38292e5e0fa3cd4856bf7dc9f5ba27c6c293/pkg/pss/pss.go#L98

https://github.com/ethersphere/bee/blob/6f3a38292e5e0fa3cd4856bf7dc9f5ba27c6c293/pkg/recovery/repair.go#L38

https://github.com/ethersphere/bee/blob/6f3a38292e5e0fa3cd4856bf7dc9f5ba27c6c293/pkg/netstore/netstore.go#L57

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (9 by maintainers)

Most upvoted comments

@ldeffenb this is not so high up on our priority list. There are some usability problems with the current implementation of global pinning that are a bit difficult to solve right now, and after an in-depth look they might even require protocol changes to some degree so that good UX can ensue.

@acud As @ldeffenb mentioned, the recovery.NewCallback function that the /chunks API uses sends a PSS message with the RECOVERY topic to retrieve the chunk from the network. It therefore needs to stamp the payload before sending, but a nil stamper is currently passed in. In order to stamp, we need a valid batch ID.

Is the correct implementation reading the batch ID from the request header?

I will look into the segfault, but please keep in mind that targets specifies the leading address bytes that the node will try to mine. If you specify 1 byte (two hex characters), it will mine a chunk address with proximity order 16 to the specified target prefix; if you specify 2 bytes you’re already at 32, and here you are trying to mine an address with PO 48, i.e. you’re going to have to go through 281474976710656 chunks and hash them in order to get an address at PO 48 (if you’re lucky!)
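The chunk count quoted above is 2^48. As a rough sketch of the arithmetic, assuming each candidate address is effectively uniformly random, matching a given number of leading bits takes about 2^bits attempts on average. The function name expectedAttempts is made up for this example.

```go
package main

import "fmt"

// expectedAttempts returns 2^bits: the average number of uniformly random
// candidates to try before one matches a fixed prefix of the given bit
// length. bits must be below 64 so the result fits in a uint64.
func expectedAttempts(bits uint) uint64 {
	return uint64(1) << bits
}

func main() {
	for _, bits := range []uint{16, 32, 48} {
		fmt.Printf("%d bits: about %d attempts\n", bits, expectedAttempts(bits))
	}
}
```

Every additional 16 bits multiplies the expected work by 65536, which is why longer targets get expensive so quickly.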