argo-cd: Unable to deploy ArgoCD with HA

Checklist:

  • I’ve searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I’ve included steps to reproduce the bug.
  • I’ve pasted the output of argocd version.

Describe the bug

ArgoCD is unable to deploy correctly with HA. This happens in the namespace of argocd-installation.

To Reproduce

Upgrade from 2.4.6 to 2.5.1 or 2.5.2

Expected behavior

ArgoCD is upgraded/deployed successfully

Version

2.5.2 and 2.5.1 (same issue on both versions)

Logs

HAProxy:

[ALERT]    (1) : Binding [/usr/local/etc/haproxy/haproxy.cfg:9] for proxy health_check_http_url: cannot create receiving socket (Address family not supported by protocol) for [:::8888]
[ALERT]    (1) : Binding [/usr/local/etc/haproxy/haproxy.cfg:56] for frontend ft_redis_master: cannot create receiving socket (Address family not supported by protocol) for [:::6379]
[ALERT]    (1) : [haproxy.main()] Some protocols failed to start their listeners! Exiting.

Redis HA:

21 Nov 2022 16:22:36.369 # Configuration loaded
21 Nov 2022 16:22:36.370 * monotonic clock: POSIX clock_gettime
21 Nov 2022 16:22:36.377 # Warning: Could not create server TCP listening socket ::*:6379: unable to bind socket, errno: 97
21 Nov 2022 16:22:36.378 * Running mode=standalone, port=6379.
21 Nov 2022 16:22:36.378 # Server initialized
21 Nov 2022 16:22:36.379 * Ready to accept connections

repository server:

time="2022-11-21T16:25:46Z" level=info msg="ArgoCD Repository Server is starting" built="2022-11-07T16:42:47Z" commit=148d8da7a996f6c9f4d102fdd8e688c2ff3fd8c7 port=8081 version=v2.5.2+148d8da
time="2022-11-21T16:25:46Z" level=info msg="Generating self-signed TLS certificate for this session"
time="2022-11-21T16:25:46Z" level=info msg="Initializing GnuPG keyring at /app/config/gpg/keys"
time="2022-11-21T16:25:46Z" level=info msg="gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe238040569" dir= execID=9e8d3
time="2022-11-21T16:25:52Z" level=error msg="`gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe238040569` failed exit status 2" execID=9e8d3
time="2022-11-21T16:25:52Z" level=info msg=Trace args="[gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe238040569]" dir= operation_name="exec gpg" time_ms=6031.865355
time="2022-11-21T16:25:52Z" level=fatal msg="`gpg --no-permission-warning --logger-fd 1 --batch --gen-key /tmp/gpg-key-recipe238040569` failed exit status 2"

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 2
  • Comments: 31 (6 by maintainers)

Most upvoted comments

It seems that the network policies argocd-redis-ha-proxy-network-policy and argocd-redis-ha-server-network-policy need to be reviewed. After deleting both policies, everything started to work.

I checked that no other network policy defines ports for DNS; only the two above define port 53, which is incorrect for OpenShift. After changing the UDP/TCP ports to 5353, everything came back to life.

Stopping by to add where my issue with this symptom came from.

It had to do with the Kubernetes networking setup and the HA Redis setup's assumption of IPv4 networking. My cluster was configured in dual-stack mode for IPv4 and IPv6. The IPv6 address range came first in the cluster specification, so it is the IP listed in places that don’t show all IPs. Effectively, if a Service definition does not specify the IP family, it will be single-family and IPv6. This is a problem for the HA setup because it defaults to IPv4 bind addresses throughout the templated configuration files. Switching them all to IPv6, e.g. bind :: for Redis and bind [::]:8888, bind [::]:6379 in HAProxy, resolved the issue.

I suspect changing the ipFamily in the Service definitions to IPv4 would also work.
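
For anyone who wants to try the Service-side alternative, here is an untested sketch of a kustomize patch that pins a Service to IPv4. The Service name argocd-redis-ha-haproxy is an assumption about what the HA manifests call the proxy Service; the other Redis HA Services would presumably need the same patch. The fields used are the current Kubernetes ones (ipFamilyPolicy and ipFamilies) rather than the older singular ipFamily.

```yaml
# Hypothetical patch: force one Service to single-stack IPv4.
# The target name is an assumption; check it against your rendered manifests.
- patch: |-
    - op: add
      path: /spec/ipFamilyPolicy
      value: SingleStack
    - op: add
      path: /spec/ipFamilies
      value:
        - IPv4
  target:
    kind: Service
    name: argocd-redis-ha-haproxy
```

Note that Kubernetes does not allow changing the primary IP family of an existing Service, so a Service that was already created as IPv6 may have to be deleted and recreated for this to take effect.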

Nice find @rimasgo! I verified that changing the DNS ports to 5353 works for our deployment as well, using the following kustomize patches against v2.6.2.

- patch: |-
    - op: replace
      path: /spec/egress/1/ports/0/port
      value: 5353
    - op: replace
      path: /spec/egress/1/ports/1/port
      value: 5353
  target:
    kind: NetworkPolicy
    name: argocd-redis-ha-proxy-network-policy

- patch: |-
    - op: replace
      path: /spec/egress/1/ports/0/port
      value: 5353
    - op: replace
      path: /spec/egress/1/ports/1/port
      value: 5353
  target:
    kind: NetworkPolicy
    name: argocd-redis-ha-server-network-policy
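
For context, entries like the ones above go under the patches field of a kustomization.yaml. Below is a minimal sketch of what the whole file might look like. The namespace and the resource URL are assumptions, not taken from the comment, so adjust them to your own setup. The patch paths assume the DNS rule is the second egress entry (index 1) in both policies, as the paths above imply.

```yaml
# kustomization.yaml (sketch; namespace and resource URL are assumptions)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: argocd
resources:
  - https://raw.githubusercontent.com/argoproj/argo-cd/v2.6.2/manifests/ha/install.yaml
patches:
  # Point the DNS egress rule of the HAProxy policy at port 5353 (OpenShift DNS).
  - patch: |-
      - op: replace
        path: /spec/egress/1/ports/0/port
        value: 5353
      - op: replace
        path: /spec/egress/1/ports/1/port
        value: 5353
    target:
      kind: NetworkPolicy
      name: argocd-redis-ha-proxy-network-policy
  # Same change for the Redis server policy.
  - patch: |-
      - op: replace
        path: /spec/egress/1/ports/0/port
        value: 5353
      - op: replace
        path: /spec/egress/1/ports/1/port
        value: 5353
    target:
      kind: NetworkPolicy
      name: argocd-redis-ha-server-network-policy
```

Rendering it with kustomize build (or kubectl kustomize) before applying should show port 5353 in the DNS egress rule of both policies; if the rule order in your version differs, the egress index in the paths needs to change accordingly.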