go: Horizon timeout after transaction is added to ledger

This is reproducible on 4 identical machines on DigitalOcean. The instances are in DO’s FRA1, SFO1, SGP1, and NYC1 regions.

Each node is running Horizon + Core (watcher, non-validating) using stellar/quickstart docker image with this commit: https://github.com/stellar/docker-stellar-core-horizon/commit/aad080792989a8082cbffc0f58bd8e481957af86

A local application submits transactions to the local Horizon on each node.

Here are the Core and Horizon configurations in use:

# stellar/quickstart specific
HTTP_PORT=11626
PUBLIC_HTTP_PORT=true
LOG_FILE_PATH=""
DATABASE="postgresql://dbname=core host=localhost user=stellar password=1"
CATCHUP_RECENT=1024

NETWORK_PASSPHRASE="Public Global Stellar Network ; September 2015"

NODE_NAMES=[
"GC5SXLNAM3C4NMGK2PXK4R34B5GNZ47FYQ24ZIBFDFOCU6D4KBN4POAE  satoshipay1",
"GBJQUIXUO4XSNPAUT6ODLZUJRV2NPXYASKUBY4G5MYP3M47PCVI55MNT  satoshipay2",
"GAK6Z5UVGUVSEK6PEOCAYJISTT5EJBB34PN3NOLEQG2SUKXRVV2F6HZY  satoshipay3",
"GCGWABAQ6OUOVUGWJVPRJ5LWBIWYN3CVOVOZYBNQQGIBRULQHYNGQ7GH  cryptomover1",
"GC7MH45NSXXPBLQJRSEVF2DFUVLGGYOJER5FRUNVCYVMXJYJT5LLQJW5  cryptomover2",
"GAENPO2XRTTMAJXDWM3E3GAALNLG4HVMKJ4QF525TR25RI42YPEDULOW  ibm_uk",
"GBJ7T3BTLX2BP3T5Q4256PUF7JMDAB35LLO32QRDYE67TDDMN7H33GGE  ibm_hong_kong",
"GCGB2S2KGYARPVIA37HYZXVRM2YZUEXA6S33ZU5BUDC6THSB62LZSTYH  sdf1",
"GCM6QMP3DLRPTAZW2UZPCPX2LF3SXWXKPMP3GKFZBDSF3QZGV2G5QSTK  sdf2",
"GABMKJM6I25XI4K7U6XWMULOUQIQ27BCTMLS6BYYSOWKTBUXVRJSXHYQ  sdf3",
]

KNOWN_PEERS=[
"core-live-a.stellar.org:11625",
"core-live-b.stellar.org:11625",
"core-live-c.stellar.org:11625",
"confucius.strllar.org",
#"stellar1.bitventure.co",
"stellar.256kw.com"
]

[QUORUM_SET]
#VALIDATORS=[
#"$eno", "$tempo.eu.com", "$satoshipay", "$cryptomover", "$umbrel", "$exodo", "$ibm", "$sdf_watcher1", "$sdf_watcher2", "$sdf_watcher3"
#]
VALIDATORS=[
"$sdf1", "$sdf2", "$sdf3", "$satoshipay1", "$satoshipay2", "$satoshipay3", "$cryptomover1", "$cryptomover2", "$ibm_uk", "$ibm_hong_kong"
]


[HISTORY.cache]
get="cp /opt/stellar/history-cache/{0} {1}"

[HISTORY.sdf1]
get="curl -sf http://history.stellar.org/prd/core-live/core_live_001/{0} -o {1}"

[HISTORY.sdf2]
get="curl -sf http://history.stellar.org/prd/core-live/core_live_002/{0} -o {1}"

[HISTORY.sdf3]
get="curl -sf http://history.stellar.org/prd/core-live/core_live_003/{0} -o {1}"

#!/bin/bash

export DATABASE_URL="postgres://stellar:1@localhost/horizon"
export STELLAR_CORE_DATABASE_URL="postgres://stellar:1@localhost/core"
export STELLAR_CORE_URL="http://localhost:11626"
export LOG_LEVEL="info"
export INGEST="true"
export PER_HOUR_RATE_LIMIT="72000"
export NETWORK_PASSPHRASE="Public Global Stellar Network ; September 2015"

The configuration looks OK to me, so I have no idea why I’m getting the timeouts.

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 5
  • Comments: 53 (26 by maintainers)

Most upvoted comments

Quick update: I’ve just released Horizon 0.14.2 that fixes many issues in transaction submission, potentially also the one discussed here. Additionally I improved logging so please make sure you save your logs and if you experience this again (after the upgrade to 0.14.2) send them to me.

Here’s the chart with 504 errors in horizon-testnet.stellar.org:

[Screenshot, 2018-09-27 14:20:01: chart of 504 errors on horizon-testnet.stellar.org]

After deploying the code, the number of 504s dropped to 0 (well, almost: one transaction around 9am was included in the ledger only after the request had ended, so resubmitting it would give you its final status).

OK, it’s possible that it’s some obscure sync issue (like https://github.com/stellar/go/pull/603). Will try to debug it again.
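One way to look for a sync problem of this kind is to compare the two ledger numbers that Horizon reports on its root endpoint, `core_latest_ledger` and `history_latest_ledger`. The following is only a minimal Python sketch: the URL `http://localhost:8000` is an assumption (adjust it to wherever your Horizon listens), and the helper names `fetch_root` and `ingestion_lag` are made up for illustration.

```python
import json
from urllib.request import urlopen


def ingestion_lag(root: dict) -> int:
    """Return how many ledgers Horizon's history is behind Core.

    `root` is the parsed JSON document served by Horizon's root endpoint.
    """
    return int(root["core_latest_ledger"]) - int(root["history_latest_ledger"])


def fetch_root(horizon_url: str = "http://localhost:8000") -> dict:
    """Fetch and parse Horizon's root document (the URL is an assumption)."""
    with urlopen(horizon_url, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print("ledgers behind core:", ingestion_lag(fetch_root()))
```

A lag that stays above a few ledgers would suggest that Horizon's ingestion is falling behind Core, which could make a submission time out even though the transaction was included in a ledger.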

Yes @s-a-y, this is not caused by high load: the rate at which I am submitting transactions to the Horizon server is much lower than 50 TPS. Moreover, the transaction gets included in the ledger, but Horizon still times out.

@bartekn, can you check this? I think this issue needs to be reopened.

@gituser The general best practice is as follows:

  1. Set a timebound for your transaction. This guarantees that after that time the transaction won’t be accepted by the network. This protects you from e.g. sending a second duplicate payment.
  2. Set an appropriate fee. If you like, you can get detailed information on the current necessary fees by querying the fee_stats endpoint, however…
  3. …Set the highest maximum fee you are comfortable with. This doesn’t mean you pay that in full on every transaction! You will only pay whatever is necessary to get you into the ledger. So under normal circumstances, even with a higher max fee set, you will pay the standard fee (currently 100 stroops).
  4. Implement a retry loop with increasing delay (e.g. 30s, 60s, 90s). This should only execute once you’ve exceeded the timebound you set.
  5. The timeout error happens because Horizon cannot know immediately if core will include the transaction in a future ledger. Detailed information about this can be found here: https://developers.stellar.org/api/errors/http-status-codes/horizon-specific/timeout/

We’re looking at improving the docs to make all of this clearer. I hope this helps.
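Points 1 and 4 above can be sketched roughly as follows. This is a minimal Python sketch, not Horizon or SDK code. `lookup` is a hypothetical callable that you would implement yourself (for example by asking Horizon for the transaction by its hash); it returns `None` while the transaction is unknown. `max_time` is the upper timebound you set on the transaction, as a Unix timestamp. The default delays mirror the 30s, 60s, 90s example.

```python
import time
from typing import Callable, Optional, Sequence


def await_final_status(
    lookup: Callable[[str], Optional[dict]],   # hypothetical: returns None if unknown
    tx_hash: str,
    max_time: float,                           # upper timebound, Unix seconds
    delays: Sequence[float] = (30, 60, 90),
    now: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """After a 504, poll until the transaction is found or its timebound passes.

    Returns the record if the transaction was included, or None once the
    timebound has passed and one further lookup still found nothing.
    """
    attempt = 0
    while True:
        # Read the clock first so that at least one lookup happens
        # after the timebound has expired.
        expired = now() > max_time
        record = lookup(tx_hash)
        if record is not None:
            return record
        if expired:
            return None
        # Increasing delay; the last value is reused once the list runs out.
        sleep(delays[min(attempt, len(delays) - 1)])
        attempt += 1
```

If the function returns `None`, the timebound has passed without the transaction showing up, so building and submitting a fresh transaction should no longer risk a duplicate payment. In practice you may want to wait slightly past `max_time` before the final lookup to allow for ingestion delay; that margin is left out of the sketch.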

@gituser I don’t know, it may be a coincidence. I followed the suggestion of this guy

@andrenarchy Hm, I just set LOG_LEVEL=debug and after restarting Horizon the 504s disappeared. We keep getting this issue frequently – any idea what we can do next time it happens?

@gituser Try changing this line in horizon.env from export LOG_LEVEL="info" to export LOG_LEVEL="debug"

It worked for me.