go-spacemesh: Smeshing fails on restarting Node

Description

Smeshing fails after a successful setup on the next Node run.

After smeshing is set up, the Node creates postdata_0.bin and postdata_metadata.json in the directory that the User defined as the smeshing-datadir. On the next run, the Node looks for key.bin in this directory instead of the previously used default location, ~/post/data/key.bin. As a consequence, it can’t find the file and creates a new key.bin there with a different (new) Node ID. Because of the new Node ID, it fails:

failed to complete post setup	{"node_id": "7ca3802629e39fe48e334e104b84590683db25dfe639afd92615719e6084dfd2", "module": "atxBuilder", "errmsg": "`ID` config mismatch; expected: 7ca3802629e39fe48e334e104b84590683db25dfe639afd92615719e6084dfd2, found: 8d46e54777dd4d60beeca9fd02b3ab1a5c2797af43d801844a3b22c71cc075a0, datadir: /some/user/defined/path/for/post/data", "name": "atxBuilder"}

If the User then cleans up the defined post directory (or leaves only key.bin there), it works fine.
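The failure mode described above can be modeled with a minimal Go sketch. It is not the go-spacemesh implementation: `checkPostID` is a hypothetical helper, and the two short byte slices are made-up stand-ins for the ID recorded with the post data and the ID from the freshly created key.bin. Only the shape of the error message is taken from the log above.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
)

// checkPostID is a hypothetical helper: it compares the ID the node is
// currently running with against the ID recorded alongside the post data.
func checkPostID(nodeID, metadataID []byte, dataDir string) error {
	if !bytes.Equal(nodeID, metadataID) {
		return fmt.Errorf("`ID` config mismatch; expected: %s, found: %s, datadir: %s",
			hex.EncodeToString(nodeID), hex.EncodeToString(metadataID), dataDir)
	}
	return nil
}

func main() {
	dataDir := "/some/user/defined/path/for/post/data"

	// Made-up stand-ins: the ID used during post setup and the ID from
	// the key.bin that was regenerated on restart.
	original := []byte{0x8d, 0x46}
	regenerated := []byte{0x7c, 0xa3}

	// Restart with a regenerated key: the check fails.
	fmt.Println(checkPostID(regenerated, original, dataDir))

	// Restart with the original key: the check passes (nil error).
	fmt.Println(checkPostID(original, original, dataDir))
}
```

The sketch only shows why a regenerated key.bin makes existing post data look "corrupted": the data itself is untouched, but the ID it was created for no longer matches the running node.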

Steps to reproduce

  1. Run the Node from scratch using some node-config.json without smeshing props
  2. Run smrepl --server localhost:9092
  3. Run post setup. Follow the steps, specify a non-default directory for the post data.
  4. Add the new smeshing section to the node-config
  5. Restart Node process

Actual Behavior

Node isn’t Smeshing. Post data “corrupted”. The Node log (related parts):

2021-10-04T20:27:01.205+0300	INFO	00000.defaultLogger	App version: v0.2.2-beta.1. Git: 38056f5-dirty - 38056f59d331e35e1e576c12040f522c610aff35 . Go Version: go1.15.13. OS: darwin-amd64 
...
2021-10-04T20:27:02.065+0300	INFO	starting spacemesh	{"data-dir": "/Users/brusher/Library/Application Support/Electron/node-data/205", "post-dir": "/Users/brusher/spacemesh", "hostname": "Kirill-557.local", "name": ""}
2021-10-04T20:27:02.065+0300	INFO	00000.defaultLogger	Looking for identity file at `/Users/brusher/spacemesh/key.bin`
2021-10-04T20:27:02.065+0300	INFO	00000.defaultLogger	Identity file not found. Creating new identity...
2021-10-04T20:27:02.066+0300	INFO	00000.defaultLogger	created new identity	{"public_key": "29137", "name": ""}
...
2021-10-04T20:27:02.399+0300	INFO	29137.clock        	started notifying	{"node_id": "29137508efea26a1777a84c7d13f53f7d287b8cb3cacf789c09c3a3d5dc2dd9e"}
2021-10-04T20:27:02.399+0300	INFO	29137.post         	post setup session starting	{"node_id": "29137508efea26a1777a84c7d13f53f7d287b8cb3cacf789c09c3a3d5dc2dd9e", "module": "post", "data_dir": "/Users/brusher/spacemesh", "num_units": "4", "labels_per_unit": "1024", "bits_per_label": "8", "provider": "1", "name": "post"}
...
2021-10-04T20:27:02.400+0300	INFO	00000.defaultLogger	starting new grpc server on :9092
2021-10-04T20:27:02.400+0300	ERROR	29137.atxBuilder   	failed to complete post setup	{"node_id": "29137508efea26a1777a84c7d13f53f7d287b8cb3cacf789c09c3a3d5dc2dd9e", "module": "atxBuilder", "errmsg": "`ID` config mismatch; expected: 29137508efea26a1777a84c7d13f53f7d287b8cb3cacf789c09c3a3d5dc2dd9e, found: 6859a5fe4f53226bd43c607053dcd62cb3bdf2eda5d9b01b4f1de6d1715d4e05, datadir: /Users/brusher/spacemesh", "name": "atxBuilder"}
...

Expected Behavior

Smeshing works well on the second run without any tricks to make it work.

Here we should think about how it should work:

  • The NodeID / SmesherID should stay the same and should be stored in a default location only (so we don’t need to store it in the post data dir and worry that someone can delete it). If we do not assume that NodeID and reward address should change depending on the open wallet, I suggest storing this file in a single place that is even more secure than ~/post/data/key.bin, e.g. %APPDATA%/spacemesh/key.bin (~/Library/Application Support/spacemesh/key.bin on macOS, ~/AppData/Roaming/spacemesh/key.bin on Windows, ~/.config on Linux). I think this is the best option, but I may be missing something.
  • The key.bin file should be copied to the post data-dir as soon as smeshing is set up.

Environment

macOS 10.13.6, go-spacemesh v0.2.2-beta.1

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 30 (30 by maintainers)

Most upvoted comments

💯

I’ll add creating an issue for go-spacemesh to my todo list.

Cool. Let’s summarize:

  1. Make a temporary kludge in Smapp (urgent: https://github.com/spacemeshos/smapp/issues/823 ):
    • Before starting smeshing Smapp will check for the existence of the key.bin file in the specified directory:
      • If it is not there, just copy it from the default path (~/post/data/key.bin).
      • If it is there, check that it is the same file as in the default path (~/post/data/key.bin) using shasum. If it differs, update the config and restart the node. If it is the same, just call StartSmeshing.
      • If ~/post/data/key.bin is not found, update the config and restart the node as well.
  2. Make a robust fix in go-spacemesh to meet the requirements: https://github.com/spacemeshos/go-spacemesh/issues/2858#issuecomment-934164646

If it sounds good, I’ll create an issue for Smapp and then paste a link here. About the issue related to go-sm, I propose to summarize everything in the new issue, post the link here and close this one 😃

@brusherru I suggested checking first whether the datadir changed…

The “kludge” you suggested is perfect. Then we don’t need a short term fix in the node.

I had a chat with @noamnelke about this. There are multiple considerations here:

  1. High-level and how things should work:
    • SmesherId/NodeId should really be called PostId and should be renamed as such for users in clients and dash/explore. The current name is misleading and was born out of historical misuse of this id (see bullet 2 below).
    • It shouldn’t exist before the user inits a post.
    • It should not change while the node is smeshing across the node’s sessions (as long as the user didn’t delete the post data).
    • It should not change if the user resizes an existing post.
    • It should not change if the user stopped smeshing and started smeshing again later using the same post data.
    • It is the context that post data, atxs and rewards are associated with.
  2. Currently, and for a long time, this id has been misused by the node as a unique node id in logs. There should be another random id, generated in the node’s first runtime session, that identifies the node in logs and metrics independently of PostId.
  3. The issue reported here is due to a bug related to the NodeId that will be patched soon so we can complete gpu-post basic flow for smapp 0.2. There’s also a longer-term solution which involves the creation of an independent node id (bullet 2 above).

@noamnelke - please review this summary