agoric-sdk: Bad address - cannot read snapshot for v10:zoe - KERNEL PANIC

reported in https://discord.com/channels/585576150827532298/819073555446759444/880169455412457602 and https://discord.com/channels/585576150827532298/819073555446759444/880176218673131570

Aug 25 21:18:16 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]: 2021-08-25T19:18:16.617Z launch-chain: Launching SwingSet kernel
Aug 25 21:18:16 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]: Prometheus scrape endpoint: http://0.0.0.0:9464/metrics
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614214]: cannot read snapshot /root/.ag-chain-cosmos/data/ag-cosmos-chain-state/xs-snapshots/8bae75381c20d536812b972f61b52ae4f8ed4a83ad293070bf8a57e7f87d4e0c-load-qSM9Ip.xss: Bad address
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]: 2021-08-25T19:18:40.070Z SwingSet: kernel: ##### KERNEL PANIC: unable to re-create vat v10 #####
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]: portHandler threw (ExitCode#1)
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]: ExitCode#1: v10:zoe exited: I/O error
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]:   at new ErrorCode (packages/xsnap/api.js:49:5)
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]:   at ChildProcess.<anonymous> (packages/xsnap/src/xsnap.js:124:22)
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]:   at ChildProcess.emit (events.js:400:28)
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal ag-chain-cosmos[614134]: Cannot initialize Controller ExitCode: v10:zoe exited: I/O error
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal systemd[1]: ag-chain-cosmos.service: Main process exited, code=exited, status=1/FAILURE
Aug 25 21:18:40 Ubuntu-2004-focal-64-minimal systemd[1]: ag-chain-cosmos.service: Failed with result 'exit-code'.
Aug 25 21:18:43 Ubuntu-2004-focal-64-minimal systemd[1]: ag-chain-cosmos.service: Scheduled restart job, restart counter is at 2.
Aug 25 21:18:43 Ubuntu-2004-focal-64-minimal systemd[1]: Stopped Agoric Cosmos daemon.
Aug 25 21:18:43 Ubuntu-2004-focal-64-minimal systemd[1]: Started Agoric Cosmos daemon.
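
Since several operators report the same symptoms, it may help to pull the failing snapshot path and vat ID out of each journal excerpt so the reports can be compared side by side. The following is a minimal sketch, not part of agoric-sdk: the regular expressions are copied from the log lines quoted above, and the `summarize` helper is a hypothetical name introduced only for illustration.

```python
import re

# Patterns taken from the log lines quoted in this report.
# They are searched anywhere in the text, so they also work when
# several journal entries have been pasted onto a single line.
SNAPSHOT_ERROR = re.compile(r"cannot read snapshot (\S+\.xss):")
KERNEL_PANIC = re.compile(r"KERNEL PANIC: unable to re-create vat (v\d+)")


def summarize(log_text: str) -> dict:
    """Collect snapshot paths and vat IDs mentioned in failure lines."""
    return {
        "snapshots": SNAPSHOT_ERROR.findall(log_text),
        "vats": KERNEL_PANIC.findall(log_text),
    }
```

Feeding it the output of `journalctl` for the `ag-chain-cosmos` unit should yield one snapshot path and one vat ID per failed start, assuming the log format matches the excerpt above.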

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 37 (9 by maintainers)

Most upvoted comments

I can reproduce the symptoms by trying to load the snapshot into one of our tools:

connolly@jambox:~/projects/agoric/agoric-sdk/packages/xsnap$ ./moddable/build/bin/lin/release/xsnap -r ~/Downloads/8bae75381c20d536812b972f61b52ae4f8ed4a83ad293070bf8a57e7f87d4e0c-load-7IUrAb.xss 
cannot read snapshot /home/connolly/Downloads/8bae75381c20d536812b972f61b52ae4f8ed4a83ad293070bf8a57e7f87d4e0c-load-7IUrAb.xss: Bad address

I’m struggling to come up with a more detailed diagnosis. I have reached out to our collaborators at Moddable for help.

p.s. @warner it does not look like a case of deleting a snapshot too early. Both the compressed and the uncompressed snapshot are present in the contributed diagnostic materials.

It’s a little interesting that we don’t delete the uncompressed snapshot in this error case. I don’t think that was by design, but it’s somewhat fortunate in this case.
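
Because the uncompressed file survives, one check that may help separate a damaged file from a loader bug is to hash it and compare the result with the 64-hex-digit prefix of its file name. This is a minimal sketch resting on an assumption that this thread does not confirm: that the prefix is the SHA-256 of the uncompressed snapshot contents. The function names are hypothetical.

```python
import hashlib
import re
from pathlib import Path
from typing import Optional

# Assumption: the leading 64 hex digits of the file name are the SHA-256
# of the uncompressed snapshot contents.
HASH_PREFIX = re.compile(r"^([0-9a-f]{64})")


def expected_hash(path: Path) -> Optional[str]:
    """Return the 64-hex-digit prefix of the file name, or None if absent."""
    match = HASH_PREFIX.match(path.name)
    return match.group(1) if match else None


def actual_hash(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 of the file, reading in chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot_matches_name(path: Path) -> bool:
    """True if the file contents hash to the prefix in the file name."""
    expected = expected_hash(path)
    return expected is not None and expected == actual_hash(path)
```

Under the stated assumption, `snapshot_matches_name(Path(...))` returning `True` for the `.xss` file would point away from on-disk corruption and toward the code that reads the snapshot; `False` would suggest the bytes on disk are not what the name promises.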

Thanks. It looks like I have a couple full node state backups now.

jupyter@slog45nb:~$ ls -lR dx-collect/33-panic/
dx-collect/33-panic/:
total 8
drwxr-xr-x 2 jupyter jupyter 4096 Aug 27 20:00 Syd-ai
drwxr-xr-x 2 jupyter jupyter 4096 Aug 27 18:29 humantraffic

dx-collect/33-panic/Syd-ai:
total 13578524
-rw-r--r-- 1 jupyter jupyter 13904400441 Aug 27 19:58 ag-chain-cosmos-SYD.zip

dx-collect/33-panic/humantraffic:
total 12596552
-rw-r--r-- 1 jupyter jupyter 12898861668 Aug 27 18:15 ag-chain-cosmos.tar.gz

p.s. I think object storage is a better fit for .tar.gz files…

jupyter@slog45nb:~$ gsutil -m rsync -r dx-collect/ gs://slogfile-upload-5/dx-collect/

WARNING: gsutil rsync uses hashes when modification time is not available at
both the source and destination. Your crcmod installation isn't using the
module's C extension, so checksumming will run very slowly. If this is your
first rsync since updating gsutil, this rsync can take significantly longer than
usual. For help installing the extension, please see "gsutil help crcmod".

Building synchronization state...
Starting synchronization...
Copying file://dx-collect/33-panic/Syd-ai/ag-chain-cosmos-SYD.zip [Content-Type=application/zip]...
==> NOTE: You are uploading one or more large file(s), which would run          
significantly faster if you enable parallel composite uploads. This
feature can be enabled by editing the
"parallel_composite_upload_threshold" value in your .boto
configuration file. However, note that if you do this large files will
be uploaded as `composite objects
<https://cloud.google.com/storage/docs/composite-objects>`_,which
means that any user who downloads such objects will need to have a
compiled crcmod installed (see "gsutil help crcmod"). This is because
without a compiled crcmod, computing checksums on composite objects is
so slow that gsutil disables downloads of composite objects.

Copying file://dx-collect/33-panic/humantraffic/ag-chain-cosmos.tar.gz [Content-Type=application/x-tar]...
| [2/2 files][ 25.0 GiB/ 25.0 GiB] 100% Done  81.8 MiB/s ETA 00:00:00           
Operation completed over 2 objects/25.0 GiB. 

Would a few of you please share your whole .ag-chain-cosmos directory?

It would probably save us time in reproducing the problem.

Sorry I didn’t ask for it in the first place.

@humantraffic , @krisboit, @alipostaci2001 @sshamanov @Syd-ai

Here you go

https://drive.google.com/file/d/1QoiLuAvlh9x5prb01KJ6Lk3ARNvJ7lRF/view?usp=sharing

Happy investigation 🙏

yeah, np. https://drive.google.com/file/d/1n_EnE9Juhxq30MLIKpwNd3MENw6uM6CE/view?usp=sharing

Same issue

Aug 26 15:55:26 agoric ag-chain-cosmos[217104]: cannot read snapshot /home/agoric/.ag-chain-cosmos/data/ag-cosmos-chain-state/xs-snapshots/80b929bd4566ec950ee6db3dbb77bf8a6e8cf950285b4bb74928f6e92599b0a7-load-u0gfub.xss: Bad address
Aug 26 15:55:26 agoric ag-chain-cosmos[217041]: 2021-08-26T12:55:26.978Z SwingSet: kernel: ##### KERNEL PANIC: unable to re-create vat v10 #####

https://disk.yandex.ru/d/Y34U2pWR9F3IOg https://disk.yandex.ru/d/fg6799Jdt2JsOA

I have this error too! Here is a link to my xs-snapshots file: https://disk.yandex.com.tr/d/l70acR2IuO2ENw