teku: Bug: keymanager API not started when engine API not yet up

Description

Teku 23.9.0 does not create validator/key-manager/validator-api-bearer and does not start the keymanager API if the Engine API is not yet up when Teku starts.

This can keep users of Eth Docker from using the keymanager API. This appears to be a fairly recent issue.

Steps to Reproduce (Bug)

Run a fresh install of Teku 23.9.0 with --data-path=/var/lib/teku and validator API enabled:

      - --validator-api-enabled=true
      - --validator-api-interface=0.0.0.0
      - --validator-api-port=${KEY_API_PORT:-7500}
      - --validator-api-host-allowlist=*
      - --validator-api-cors-origins=*
      - --validator-api-keystore-file=/var/lib/teku/teku-keyapi.keystore
      - --validator-api-keystore-password-file=/var/lib/teku/teku-keyapi.password

Observe whether /var/lib/teku/validator/key-manager/validator-api-bearer is created and whether the keymanager API starts.
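
A minimal sketch of that check, assuming Python is available where Teku's data directory and port are reachable. The function names are hypothetical; the path comes from the steps above, and port 7500 is the default from ${KEY_API_PORT:-7500}. A successful TCP connection only shows that something is listening on the port, not that the API answers requests correctly.

```python
import socket
from pathlib import Path

# Path from the reproduction steps; adjust if --data-path differs.
BEARER_PATH = Path("/var/lib/teku/validator/key-manager/validator-api-bearer")


def bearer_token_exists(path: Path = BEARER_PATH) -> bool:
    """Return True if the bearer token file exists and is non-empty."""
    return path.is_file() and path.stat().st_size > 0


def port_is_listening(host: str = "127.0.0.1", port: int = 7500,
                      timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    print("bearer token present:", bearer_token_exists())
    print("keymanager port listening:", port_is_listening())
```

In the failure described here, both checks would be expected to report False on a fresh install.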

Frequency:

Reproducible.

Falling back to Teku v23.8.0 resolves the issue.

Versions (Add all that apply)

  • Software version: teku/v23.9.0/linux-x86_64/-eclipseadoptium-openjdk64bitservervm-java-17

Logs

See below for failure and success logs.

About this issue

  • State: closed
  • Created 10 months ago
  • Comments: 20 (7 by maintainers)

Most upvoted comments

Awesome, let me try the PR and see whether it highlights the cause, since I seem to be able to replicate this.

The interesting thing here for me is that it’s not happening in 23.8, because I’m not sure we’ve really changed anything, but I will try to replicate it so that I can take a closer look.

Regarding the key manager API startup, I see the log: Validator Api Configuration | Listen Address: 0.0.0.0, Port 7500, Allow: [*] in all the log snippets which means that the API has started correctly.

It does not, actually. It has started up successfully only when I see:

eth-docker-devel-consensus-1  | 2023-09-12 12:29:12.023 INFO  - Started ServerConnector@18b3b383{SSL, (ssl, http/1.1)}{0.0.0.0:7500}
eth-docker-devel-consensus-1  | 2023-09-12 12:29:12.024 INFO  - Started Server@55692aa{STARTING}[11.0.15,sto=0] @28122ms
eth-docker-devel-consensus-1  | 2023-09-12 12:29:12.025 INFO  - Listening on http://localhost:7500/

So what I see in the failure case is:

  • Validator Api Configuration | Listen Address: 0.0.0.0, Port 7500, Allow: [*] is logged
  • No Listening on http://localhost:7500/ line is logged
  • The bearer token is missing if it didn’t already exist
  • If the bearer token already existed, API calls fail because the API never started listening
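
The distinction above between "configured" and "actually listening" can be sketched as a small log check. This is only an illustration: the function name and return labels are hypothetical, and the marker strings are taken from the log excerpts in this thread, so other Teku versions may word them differently.

```python
# Marker strings copied from the log excerpts in this thread (assumed stable).
CONFIG_MARKER = "Validator Api Configuration"
LISTENING_MARKER = "Listening on "


def classify_keymanager_startup(log_text: str) -> str:
    """Classify keymanager API startup from captured Teku log output."""
    configured = CONFIG_MARKER in log_text
    listening = LISTENING_MARKER in log_text
    if listening:
        return "started"
    if configured:
        # The configuration line alone does not mean the server is up.
        return "configured-but-not-listening"
    return "not-configured"


if __name__ == "__main__":
    failure_log = (
        "Validator Api Configuration | Listen Address: 0.0.0.0, "
        "Port 7500, Allow: [*]\n"
    )
    success_log = failure_log + "INFO  - Listening on http://localhost:7500/\n"
    print(classify_keymanager_startup(failure_log))
    print(classify_keymanager_startup(success_log))
```

Feeding it the output of the consensus container's logs would flag the failure case as configured-but-not-listening.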