celestia-node: shrex-eds/server: writing ODS bytes produces errors

Celestia Node version

https://github.com/celestiaorg/celestia-node/releases/tag/v0.9.0-rc1

OS

ubuntu/docker

Install tools

  • docker pull
  • native installation through Makefile

Others

Reproducible on

  • v0.8.x releases
  • Full Node types are affected, too

Steps to reproduce it

  1. Initialise a bridge node for blockspacerace
  2. Start it (see the command sketch below)
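
For reference, a minimal sketch of the sequence (the network name comes from this report; any other flags are whatever your setup normally uses):

# hedged reproduction sketch; adjust flags and paths to your environment
celestia bridge init --p2p.network blockspacerace
celestia bridge start --p2p.network blockspacerace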

Expected result

  • shrex-eds is working with no errors

Actual result

  • shrex-eds is producing errors during writing events

Relevant log output

2023-04-14T09:18:43.505Z	ERROR	shrex-eds	shrexeds/server.go:106	server: writing ods to stream	{"hash": "7252060965D5491F7D54661EE2B1B05829031F2BBB7A76640B7C995652928BCB", "err": "writing ODS bytes: stream reset"}
2023-04-14T09:18:43.506Z	ERROR	shrex-eds	shrexeds/server.go:93	server: writing status to stream	{"err": "stream reset"}
2023-04-14T09:18:43.506Z	ERROR	shrex-eds	shrexeds/server.go:93	server: writing status to stream	{"err": "stream reset"}
2023-04-14T09:18:43.506Z	ERROR	shrex-eds	shrexeds/server.go:93	server: writing status to stream	{"err": "stream reset"}
2023-04-14T09:18:43.506Z	ERROR	shrex-eds	shrexeds/server.go:93	server: writing status to stream	{"err": "stream reset"}
2023-04-14T09:18:43.506Z	ERROR	shrex-eds	shrexeds/server.go:93	server: writing status to stream	{"err": "stream reset"}
2023-04-14T09:18:43.507Z	ERROR	shrex-eds	shrexeds/server.go:93	server: writing status to stream	{"err": "stream reset"}
2023-04-14T09:18:43.507Z	ERROR	shrex-eds	shrexeds/server.go:93	server: writing status to stream	{"err": "stream reset"}
2023-04-14T09:18:43.507Z	ERROR	shrex-eds	shrexeds/server.go:93	server: writing status to stream	{"err": "stream reset"}
2023-04-14T09:18:43.507Z	ERROR	shrex-eds	shrexeds/server.go:93	server: writing status to stream	{"err": "stream reset"}
2023-04-14T09:18:43.507Z	ERROR	shrex-eds	shrexeds/server.go:93	server: writing status to stream	{"err": "stream reset"}
2023-04-14T09:18:43.507Z	ERROR	shrex-eds	shrexeds/server.go:93	server: writing status to stream	{"err": "stream reset"}
2023-04-14T09:18:43.508Z	ERROR	shrex-eds	shrexeds/server.go:113	server: closing stream	{"err": "stream reset"}
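
For context, a "stream reset" error in libp2p generally means the remote peer (or the stream muxer) reset the stream mid-transfer, so the server-side writes fail. To get a rough sense of how often this happens on a node whose logs land in journald, something like the following can help (the unit name "celestia" is an assumption, adjust to your setup):

journalctl -u celestia --since "1 hour ago" | grep -c "stream reset"   # count resets seen in the last hour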

Notes

No response

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 3
  • Comments: 20 (11 by maintainers)

Most upvoted comments

Our node crashed last night, here are the logs (1.2G) that I got from journalctl before the crash (journalctl --boot=-1 > crash.log): https://transfer.sh/iqs1do/celestia-crash.log

What happened:

  • I was notified that we were losing uptime
  • I could not SSH in or even ping
  • I restarted through the server provider’s panel
  • I was able to ping and SSH in
  • The server came up with 130G of disk space available, so it’s not the same issue as last time
  • It started syncing missing blocks automatically and is up and running

Last lines of the log:

Apr 13 23:51:04 celestia-mamaki-validator01 celestia[17134]: 2023-04-13T23:51:04.623+0200        ERROR        shrex/middleware        p2p/middleware.go:23        server: closing stream        {"err": "stream reset"}
Apr 13 23:51:04 celestia-mamaki-validator01 celestia[17134]: 2023-04-13T23:51:04.624+0200        ERROR        shrex/middleware        p2p/middleware.go:23        server: closing stream        {"err": "stream reset"}
Apr 13 23:51:04 celestia-mamaki-validator01 celestia[17134]: 2023-04-13T23:51:04.624+0200        ERROR        shrex/middleware        p2p/middleware.go:23        server: closing stream        {"err": "stream reset"}
Apr 13 23:51:10 celestia-mamaki-validator01 celestia-appd[906]: 11:51PM INF Timed out dur=9864.526978 height=242627 module=consensus round=0 step=1

Note: this also happens on previous versions and can eventually be seen while running a full node

On my side the crash happened again too. Here’s a new log; I’ll upload it to WeTransfer if @Bidon15 can’t download it: https://transfer.sh/Wi2Xl1/celestia-crash.log

On v0.9.3 I still get kernel freezes of my VM.

Kernel: 5.19.0-41-generic #42-Ubuntu SMP PREEMPT_DYNAMIC

 kernel BUG at net/core/skbuff.c:4472!

This bug is really ugly. Before, when I ran the node in a Linux container (LXC), it killed the entire Proxmox hypervisor kernel. Now in the VM it only affects the VM kernel. Therefore, you currently want to isolate the celestia-node as well as possible.

Well, that’s strange; the download works on my side. I’ll try to upload it to another platform ASAP.

@Wondertan We had the same crash again today. We could not ping or SSH in; we restarted the server through the console, and everything works after the reboot. Here’s the new journal log: https://transfer.sh/m9XMdB/celestia-crash.log

Thanks for the input @pciavald. Can you configure the bridge node to produce fewer logs from shrex? You can do this with the flag --log.level.module "shrex/middleware:fatal,shrex/eds:fatal":

celestia bridge start --p2p.network blockspacerace --log.level.module "shrex/middleware:fatal,shrex/eds:fatal" # + your typical flags

If the panic happens again, we can at least clearly see the message in the stack trace or logs 🤞
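
If the node runs under systemd (as the journalctl output above suggests), a hedged way to apply the extra flags and then watch for the real crash message (the unit name "celestia" is an assumption, adjust to your setup):

sudo systemctl edit celestia        # append the flags above to ExecStart in the drop-in
sudo systemctl restart celestia
journalctl -u celestia -f | grep -Ei "panic|fatal"    # watch for the actual panic/stack trace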