charts: [xray] Cluster join failure

Is this a request for help?: yes


Is this a BUG REPORT or FEATURE REQUEST? (choose one): Bug

Version of Helm and Kubernetes: Helm: 3.2.4 Kubernetes: 1.16.7

Which chart: Xray - Chart version: 4.1.3 - appVersion: 3.6.2

What happened: The Xray chart deploys and HA RabbitMQ starts up without issue; however, the logs from the Xray pod show that the router and xray-server containers fail with a Cluster join error.

router:

Cluster join: Retry 5: Service registry ping failed, will retry. Error: Get "<jfrogURL>/access/api/v1/system/ping": unsupported protocol scheme ""

xray-server:

Cluster join: Retry 15: Service registry ping failed, will retry. Error: Error while trying to connect to local router at address 'http://localhost:8046/access': Get http://localhost:8046/access/api/v1/system/ping: dial tcp 127.0.0.1:8046: connect: connection refused

I wondered if I had set the jfrogUrl param incorrectly, so I tried it both with and without the http/https protocol prefix, but the result was the same.
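For reference, the shape of the install we were attempting was roughly the following (a sketch only: the xray.jfrogUrl and xray.joinKey value paths are assumed from the chart's default values layout, and the URL/join key are placeholders). The "unsupported protocol scheme" message above is what the Go HTTP client prints when the target URL has no scheme, so the value should include http:// or https://:

# sketch; value paths are assumptions, URL and join key are placeholders
helm upgrade --install xray jfrog/xray -n xray \
  --set xray.jfrogUrl=https://artifactory.example.com \
  --set xray.joinKey=<join key copied from Artifactory>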

What you expected to happen: The containers to come up healthy and the Xray service to be accessible in the JFrog Platform UI.

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know: Connecting to an external Postgres DB (configured correctly, afaik) and to an Artifactory 7.6.3 HA cluster (which appears to be a compatible version according to the docs).
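For completeness, the external database override looked roughly like this (again a sketch: the postgresql.enabled toggle and database.* value paths are assumed from the chart's values layout, and the hostname/credentials are placeholders):

# sketch; key paths assumed, host and credentials are placeholders
helm upgrade --install xray jfrog/xray -n xray -f values.yaml \
  --set postgresql.enabled=false \
  --set database.url='postgres://xray-db.example.com:5432/xraydb?sslmode=disable' \
  --set database.user=xray \
  --set database.password=<db password>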

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15

Most upvoted comments

We are also facing the same issue while deploying Mission Control as a Helm chart. How did you fix it?

NAME                           READY   STATUS             RESTARTS   AGE
mission-control-0              3/5     CrashLoopBackOff   18         25m
mission-control-postgresql-0   1/1     Running            0          2d2h

kubectl logs -f mission-control-0 -n mission-control -c router
Testing directory /var/opt/jfrog/router has read/write permissions for user id 1050
Permissions for /var/opt/jfrog/router are good
Setting JF_SHARED_NODE_ID to mission-control-0
Setting JF_SHARED_NODE_IP to 172.40.39.4
Setting JF_SHARED_NODE_NAME to mission-control-0
Resolved shared.logging.consoleLog.enabled (false) from /opt/jfrog/router/var/etc/system.yaml
Redirection is set to false. Skipping log redirection
2020-10-29T11:34:02.062Z [jfrou] [INFO ] [554b36001d43c192] [bootstrap.go:72               ] [main                ] - Router (jfrou) service initialization started. Version: 1.4.4 Revision: a99973a659a991f8936f93db2eb3a0f38e5cedda PID: 209 Home: /opt/jfrog/router
2020-10-29T11:34:02.062Z [jfrou] [INFO ] [554b36001d43c192] [bootstrap.go:75               ] [main                ] - JFrog Router IP: 172.40.39.4
2020-10-29T11:34:02.065Z [jfrou] [INFO ] [554b36001d43c192] [bootstrap.go:175              ] [main                ] - System configuration encryption report:
shared.elasticsearch.password: already encrypted
shared.newrelic.licenseKey: does not exist in the config file
shared.security.joinKeyFile: file '/opt/jfrog/router/var/etc/security/join.key' - already encrypted
2020-10-29T11:34:02.065Z [jfrou] [INFO ] [554b36001d43c192] [bootstrap.go:80               ] [main                ] - JFrog Router Service ID: jfrou@01enmn65kr5qdhcnbg5a87rg8e
2020-10-29T11:34:02.065Z [jfrou] [INFO ] [554b36001d43c192] [bootstrap.go:81               ] [main                ] - JFrog Router Node ID: mission-control-0
2020-10-29T11:34:02.102Z [jfrou] [INFO ] [554b36001d43c192] [http_client_holder.go:155     ] [main                ] - System cert pool contents were loaded as trusted CAs for TLS communication
2020-10-29T11:34:03.335Z [jfrou] [INFO ] [554b36001d43c192] [join_executor.go:118          ] [main                ] - Cluster join: Trying to rejoin the cluster
2020-10-29T11:34:06.857Z [jfrou] [FATAL] [554b36001d43c192] [bootstrap.go:101              ] [main                ] - Could not join access, err: Cluster join: Failed joining the cluster; Error: Error response from service registry, status code: 400; message: Could not validate router Check-url: http://172.40.39.4:8082/router/api/v1/system/ping; detail: I/O error on GET request for "http://172.40.39.4:8082/router/api/v1/system/ping": Connect to 172.40.39.4:8082 [/172.40.39.4] failed: connect timed out; nested exception is org.apache.http.conn.ConnectTimeoutException: Connect to 172.40.39.4:8082 [/172.40.39.4] failed: connect timed out

@chukka @nkaplatt @ramesh4karma

We solved this issue in the end: it turns out traffic from our Xray instance to Artifactory was allowed, but traffic coming back was being blocked. The reason it took us so long to figure out is that we interpreted the error as an error on the Xray side (thinking the pod was trying to call itself), when in fact it came from the Artifactory side saying it couldn't reach Xray. The error isn't explicit enough, and the documentation wasn't much help either…
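If anyone else hits this, a quick check that exposes the same return-path problem (the pod, namespace, and IP below are placeholders for your own setup) is to call the router ping endpoint from the Artifactory side, since that is exactly the Check-url the Access service validates during cluster join:

# run from the Artifactory pod; this times out if return traffic to Xray is blocked
kubectl exec -n artifactory artifactory-0 -c artifactory -- \
  curl -sS --max-time 5 http://<xray-pod-ip>:8082/router/api/v1/system/ping

If that times out while the same request from inside the Xray pod succeeds, the problem is on the network path from Artifactory back to Xray rather than in the Xray chart values.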

@nkaplatt Can you share the steps to reproduce (helm commands) along with the values.yaml file?