indexer: Catchpoint Stuck on Phase 1
Subject of the issue
I’m running a mainnet indexer instance and recently upgraded from 2.11 => 2.13. I went through the procedure for updating to the latest catchpoint, but it seems to have gotten stuck after processing the accounts. Here are the logs:
Jul 29 21:32:04 ip-10-0-1-62 algorand-indexer[10434]: {"level":"info","msg":"catchup phase 1 of 4 (Processed Accounts): 14233616 / 14233616","time":"2022-07-29T21:32:04Z"}
Jul 29 21:32:09 ip-10-0-1-62 algorand-indexer[10434]: {"level":"info","msg":"catchup phase 1 of 4 (Processed Accounts): 14233616 / 14233616","time":"2022-07-29T21:32:09Z"}
Jul 29 21:32:14 ip-10-0-1-62 algorand-indexer[10434]: {"event":"ConnectedOut","file":"wsNetwork.go","function":"github.com/algorand/go-algorand/network.(*WebsocketNetwork).tryConnect","level":"info","line":2094,"local":"","msg":"Made outgoing connection to peer relay-mumbai-mai
Jul 29 21:32:14 ip-10-0-1-62 algorand-indexer[10434]: {"level":"info","msg":"catchup phase 1 of 4 (Processed Accounts): 14233616 / 14233616","time":"2022-07-29T21:32:14Z"}
Jul 29 21:32:19 ip-10-0-1-62 algorand-indexer[10434]: {"level":"info","msg":"catchup phase 1 of 4 (Processed Accounts): 14233616 / 14233616","time":"2022-07-29T21:32:19Z"}
Jul 29 21:32:24 ip-10-0-1-62 algorand-indexer[10434]: {"level":"info","msg":"catchup phase 1 of 4 (Processed Accounts): 14233616 / 14233616","time":"2022-07-29T21:32:24Z"}
Jul 29 21:32:29 ip-10-0-1-62 algorand-indexer[10434]: {"level":"info","msg":"catchup phase 1 of 4 (Processed Accounts): 14233616 / 14233616","time":"2022-07-29T21:32:29Z"}
Jul 29 21:32:34 ip-10-0-1-62 algorand-indexer[10434]: {"level":"info","msg":"catchup phase 1 of 4 (Processed Accounts): 14233616 / 14233616","time":"2022-07-29T21:32:34Z"}
Jul 29 21:32:39 ip-10-0-1-62 algorand-indexer[10434]: {"level":"info","msg":"catchup phase 1 of 4 (Processed Accounts): 14233616 / 14233616","time":"2022-07-29T21:32:39Z"}
Everything looks normal except the WebSocket connection message. One thing to note is that my indexer instance is in a VPC on AWS; the firewall rules allow only incoming SSH connections, with no outgoing connections. The indexer has been stuck in this state for an hour now.
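For reference, here is a minimal sketch of how outbound relay connectivity from the VPC can be checked, assuming the standard mainnet bootstrap SRV record and relay port; the `nc` target below is a placeholder for a host/port pair taken from the `dig` output:

```sh
# Resolve the mainnet relay bootstrap SRV record used during catchup.
dig +short SRV _algobootstrap._tcp.mainnet.algorand.network

# Test outbound TCP to one of the returned relays (placeholder host/port;
# substitute a pair from the dig output above).
nc -zv r1.algorand-mainnet.network 4160
```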
Your environment
12885426177
3.8.1.stable [rel/stable] (commit #73615e0b)
go-algorand is licensed with AGPLv3.0
source code available at https://github.com/algorand/go-algorand
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 19 (7 by maintainers)
Had the low IOPS limit increased and got it running. There are other issues besides this one; I will open a new issue.
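For anyone else who lands here, a minimal sketch of raising provisioned IOPS on an attached EBS volume with the AWS CLI; the volume ID and target IOPS are placeholders, and `--iops` only applies to gp3/io1/io2 volumes:

```sh
# Raise the volume's provisioned IOPS in place (no detach required).
aws ec2 modify-volume --volume-id vol-0123456789abcdef0 --iops 10000

# Watch the modification until it reaches the optimizing/completed state.
aws ec2 describe-volumes-modifications --volume-ids vol-0123456789abcdef0
```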
@Blackglade thanks for bearing with me. I ran some tests and took a closer look at your error messages, and I have a few things to share:
Unusual messages
Processed accounts resetting
I was able to reproduce this using an EBS drive once it started throttling the IOPS.
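If you want to check whether the drive itself is the bottleneck, a random-read `fio` benchmark approximates this access pattern; the file path, block size, and runtime below are illustrative choices, not tuned values:

```sh
# 4k random reads with direct I/O (bypassing the page cache); compare the
# reported IOPS against the volume's provisioned or baseline IOPS.
fio --name=randread --filename=/mainnet/fio-test --size=1G \
    --rw=randread --bs=4k --direct=1 --ioengine=libaio \
    --iodepth=16 --numjobs=4 --runtime=60 --time_based --group_reporting

# Clean up the test file afterwards.
rm /mainnet/fio-test
```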
Follow-up question / recommendation
What type of EBS drive are you using, and how many IOPS has it been provisioned with?
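Both can be read straight off the volume with the AWS CLI (the volume ID below is a placeholder):

```sh
aws ec2 describe-volumes --volume-ids vol-0123456789abcdef0 \
    --query 'Volumes[*].{Type:VolumeType,IOPS:Iops,Size:Size}' --output table
```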
New recommendation: deployments should use `NVMe` drives. This is based on testing with `standard (magnetic)` / `gp2` / `io1` / `io2` / `NVMe` drives, and matches the recommendation for algod. It should have been the recommendation for the new version of Indexer from the beginning, so I’m really sorry to have put you through this. Thanks again for the detailed logs that made it very clear that there was a problem.

Furthermore, for this testing I also put together a utility to make it easier to test this process. I hope to make it available in a future release to assist with debugging hardware configurations.
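As a rough sketch of what the NVMe recommendation looks like in practice on AWS, an instance type with local NVMe storage can host the indexer data directory on the instance-store drive; the device name and mount point below are placeholders, so check `lsblk` on your instance:

```sh
# Identify the local NVMe instance-store device (name varies by instance type).
lsblk -o NAME,MODEL,SIZE

# Format and mount it, then point the indexer's data directory at it.
# Note: instance-store data does not survive a stop/terminate; the indexer
# can re-catch up from a catchpoint if the drive is lost.
sudo mkfs.ext4 /dev/nvme1n1
sudo mkdir -p /mainnet
sudo mount /dev/nvme1n1 /mainnet
```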
@Blackglade thanks for the logs. I don’t recall seeing anything like this during testing, so I’ll need to ask around next week.