solana: The solana-web3.js transaction confirmation logic is very broken

The correct approach to confirming a normal transaction is to fetch the last valid block height for the transaction blockhash, then wait until the chain has advanced beyond that block height before declaring the transaction expired.

A transaction that uses a nonce is a little different, since those transactions are technically valid until the nonce is advanced. For nonce-based transactions, the right approach is to check the corresponding nonce account to see if the nonce has been advanced while periodically retrying the transaction (although some form of timeout or mechanism for the caller to abort seems nice)

The big offenders:

1. confirmTransaction

This method simply waits for up to 60 seconds. This is wrong because a transaction’s blockhash may be valid for longer than 60 seconds, so it’s very possible for the transaction to execute after this method returns a failure. The signature of confirmTransaction does not include the blockhash of the transaction so it’s literally impossible to do the right thing without a signature change. This method probably just should be deprecated.

2. sendAndConfirmTransaction and sendAndConfirmRawTransaction

These two just need to be reimplemented to not use confirmTransaction and instead follow the approach outlined at the start of this issue.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 13
  • Comments: 48 (47 by maintainers)

Most upvoted comments

Endgame

The endgame would look like this:

// Recent blockhashes.
const latestBlockhash = await connection.getLatestBlockhash();
const transaction = new Transaction({latestBlockhash}).add(/* ... */);
const signature = connection.sendTransaction(transaction, /* ... */);
await confirmTransaction({
  latestBlockhash,
  signature,
});

// Nonces
const nonceInfo = /* ... */;
const transaction = new Transaction({nonceInfo}).add(/* ... */);
const signature = connection.sendTransaction(transaction, /* ... */);
await confirmTransaction({
  nonceInfo,
  signature,
});

cc/ @jordansexton, @joncinque, @jstarry for a dummy check on these incremental steps.

It would be really important to expand the documentation to show good examples how to handle the different outcomes confirmTransaction can have. Important to consider:

  • http error codes / connection timeouts / parsing errors - loose grouped as “bad endpoint”
  • blockhash expiry - aka “bad tps”
  • confirmation (including commitment level & slot) “success”

A couple other things to consider:

  • When/where do transaction retries occur. We’re trying to get away from RPC-level retries, so this burden will fall on the client.
  • The client should retain their keypair, so signing ideally happens before handing off to sendTransaction/etc for transmission and retries

On the topic of breaking changes

Rather than to patch in lastValidBlockhash here and there, I think it’s reasonable to make the breaking changes that involve adding it to the Transaction and Message constructors but support the old signature for a while. There’s a way to use TypeScript to deprecate the old arguments gracefully, while signalling loudly to devs that the removal is on the way!

  1. Write the code defensively to presume that it’s either going to receive the new-style or the old-style constructor options.
  2. Write a TypeScript definition that permits both values, but shows you that one is deprecated in your IDE. Check it out: (playground link)
  3. Warn, on the console, when people use the old one.
  4. Eventually remove the old one in a future version.
image

On the topic of block height / nonce tracking

I think we’ve come up with some good strategies for tracking the current block height and nonce status, but all of them require a lot of logic on the client and a lot of chatter over the wire.

Crazy idea here: @t-nelson, @mvines, what would you think about adding an argument to the signatureSubscribe API to indicate the block height or nonce upon which the transaction with that signature depends. Essentially add a dimension to SubscriptionsTracker to express ‘these subscribers are interested to know when the block height exceeds X’ and ‘these people are interested to know when these nonces are advanced.’ The subscription would resolve with err when the block height is exceeded or the nonce is advanced.

  1. ✅ The only client change we would need would be to add that arg to the existing signature subscription, here.
  2. ✅ No chatter over the wire; no timers on the client. The client just patiently waits for the subscription to resolve.
  3. ✅ Pushes all of the logic down into the RPC, so that any change to that logic only needs to be made in one place to be deployed to the whole world.

@mvines you also mentioned nonce-based transactions in the issue, but neither of us has any experience with nonce accounts. Would getNonceAndContext give us enough information to determine if the nonce has been advanced?

You can check for reference how the Rust client sends and confirms transactions.

The logic is as follows in psuedocode:

If nonce based transaction {
  use latest processed blockhash
} else {
  use blockhash stated in transaction
}
while isBlockhashValid {
  getSignatureStatus
  If signatureStatus == commitment level {
    return signature
  }
}

Using isBlockhashValid and getSignatureStatus

There is some logic about retries in there that you can add as well.

No, polling getBlockHeight won’t fail if the RPC endpoint backend you hit is behind

Shouldn’t this fail the same way if you hit a bad backend?

There’s also isBlockhashValid

We saw this one too and wondered if it would work in this case.

Not finalized, just included in a block (processed commitment) that’s eventually finalized. Once the blockhash is expired, the transaction can never be included in a later block, the block it’s included in can continue to gain stronger commitment though

Thanks for clarifying this. I’m still trying to better understand what commitment means, so this helps. Still a lot to learn!

it seems like we’re going to need to use getLatestBlockhash

There’s also isBlockhashValid

We’re assuming that lastValidBlockHeight is essentially the cut-off for how long a transaction has to be finalized before it expires (based on the wording in the docs).

Not finalized, just included in a block (processed commitment) that’s eventually finalized. Once the blockhash is expired, the transaction can never be included in a later block, the block it’s included in can continue to gain stronger commitment though

Based on all this, we’re thinking we need to use the methods mentioned above to track blocks rather than using a 60-second timeout.

Precisely. Human time is the enemy

Just note that an abort doesn’t mean that the transaction won’t execute after the front-end aborts it

On the topic of cancellation

Oh, I forgot to address one note above. @mvines mentioned that if confirmTransaction() is going to live even longer than 60s, then there should be some way for an app to abort it.

The way is to make confirmTransaction() accept and make use of an AbortController. Check it out! https://developer.mozilla.org/en-US/docs/Web/API/AbortController

Let’s wait just a bit to hear from the RPC/validator folks. The best code is the code you didn’t have to write 😃

Please note that the Rust client logic is also broken in a similar way, in that it uses isBlockhashValid instead of waiting for the last valid block height

isBlockhashValid is weaker than determining the lastValidBlockHeight and just waiting for that block height. isBlockhashValid may return false if you’re talking to a load balanced server and hit a bad backend, even though the blockhash is still valid (= true). I’ve had this happen to me in the wild.