testkube: Failure to resolve host when cloning a repository from a private GitHub server

Describe the bug
We have a privately hosted GitHub server for our repositories. When using the Cypress executor, we consistently run into an issue where the domain fails to resolve while the executor is checking out a repository with Cypress tests. This can happen 20 or 30+ times in a row before a checkout succeeds.

Initializing...
Fetching test content from git...
could not fetch test content: failed to fetch git: process error: exit status 128
output: Cloning into 'repo'...
fatal: unable to access 'https://<PRIVATE_GITHUB_SERVER>.com/michael-walsh/MyTestsExample.git/': Could not resolve host: <PRIVATE_GITHUB_SERVER>
 
Failed to fetch git: process error: exit status 128
output: Cloning into 'repo'...
fatal: unable to access 'https://<PRIVATE_GITHUB_SERVER>.com/michael-walsh/MyTestsExample.git/': Could not resolve host: <PRIVATE_GITHUB_SERVER>
 
Could not fetch test content: failed to fetch git: process error: exit status 128
output: Cloning into 'repo'...
fatal: unable to access 'https://<PRIVATE_GITHUB_SERVER>.com/michael-walsh/MyTestsExample.git/': Could not resolve host: <PRIVATE_GITHUB_SERVER>

I can run dig and nslookup against this GitHub server from other pods without any issues. Only the Testkube pods seem to have problems with DNS resolution.
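One way to quantify the intermittent failures from inside a pod is to resolve the host repeatedly and record each outcome. The sketch below assumes a Python 3 interpreter is available in a pod scheduled like the executor (for example a debug container); the script name `dns_probe.py` and the helpers `probe`, `summarize`, and `default_resolve` are hypothetical. It uses `socket.getaddrinfo`, which goes through the system resolver that libcurl-based git typically relies on as well.

```python
import socket
import sys
from typing import Callable, List


def default_resolve(host: str) -> object:
    # getaddrinfo asks the system (libc) resolver; port 443 matches the HTTPS clone URL.
    return socket.getaddrinfo(host, 443)


def probe(host: str, attempts: int = 15,
          resolve: Callable[[str], object] = default_resolve) -> List[bool]:
    """Resolve `host` `attempts` times; True marks a successful lookup."""
    results: List[bool] = []
    for _ in range(attempts):
        try:
            resolve(host)
            results.append(True)
        except OSError:  # socket.gaierror (resolution failure) subclasses OSError
            results.append(False)
    return results


def summarize(results: List[bool]) -> str:
    """Render outcomes with the same ✅/❌ notation used in the thread."""
    return " ".join("✅" if ok else "❌" for ok in results)


def main(argv: List[str]) -> None:
    if len(argv) != 2:
        print("usage: python3 dns_probe.py <hostname>")
        return
    results = probe(argv[1])
    print(summarize(results))
    print(f"{results.count(False)}/{len(results)} lookups failed")


if __name__ == "__main__":
    main(sys.argv)
```

Running `python3 dns_probe.py <PRIVATE_GITHUB_SERVER>.com` in both a Testkube pod and one of the pods where dig/nslookup work would give directly comparable failure rates. The resolver is injectable so the loop can be exercised without network access.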

Service mesh: Istio
DNS server: CoreDNS

To Reproduce
Steps to reproduce the behavior:

Check out a code repository from a private GitHub server using the Cypress executor.

Expected behavior
The repository should be checked out successfully every time, instead of failing 20 or 30+ times before one attempt succeeds.

Version / Cluster

Which testkube version? - testkube-1.10.400
What Kubernetes cluster? (e.g. GKE, EKS, Openshift etc, local KinD, local Minikube) - Kubernetes on AWS instances (not managed EKS)
What Kubernetes version? - Client version: v1.22.16, Server version: v1.21.9

Screenshots
N/A

Additional context
We are using Istio as the service mesh in this cluster and CoreDNS as the DNS server.

About this issue

State: closed
Created a year ago
Comments: 16 (9 by maintainers)

Most upvoted comments

I'll close this now, if that's alright, since #3749 has been merged.

walsm232 on May 3, 2023

❌ Failed to fetch git: could not start process with command: git, exited with code:128  error: exit status 128
output: Cloning into 'repo'...
fatal: unable to access 'https://<PRIVATE_SERVER>.com/michael-walsh/MyTestsExample.git/': Could not resolve host: <PRIVATE_SERVER>.com

Ah, I've just started hitting the issue again. I ran it a few more times; here is the sequence of successes and failures: ✅ ✅ ✅ ❌ ❌ ✅ ❌ ❌ ❌ ❌ ❌ ❌ ❌ ✅ ❌ (5 successes out of 15 runs). We're still seeing a large number of failures, but it is significantly better than before the change, when failures were far more frequent.

I found this thread suggesting that DNS resolution issues are fairly common with Alpine images: https://stackoverflow.com/questions/65181012/does-alpine-have-known-dns-issue-within-kubernetes. Does the executor need to be Alpine-based specifically? I can test it with a Debian- or Ubuntu-based image and see whether it improves.

walsm232 on Apr 27, 2023
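Discussions of DNS problems with Alpine images in Kubernetes often come back to search-domain expansion. Pods using the cluster DNS usually get a resolv.conf with several cluster search domains and `options ndots:5`, so a name with fewer than five dots, such as `<PRIVATE_GITHUB_SERVER>.com`, is tried against every search domain before being queried as-is, which multiplies the number of lookups that can fail or time out. The sketch below only models that ordering as described in resolv.conf(5) for glibc; musl, which Alpine uses, differs in some cases, and the helper names are hypothetical.

```python
import sys
from typing import List, Tuple


def parse_resolv_conf(text: str) -> Tuple[List[str], int]:
    """Extract the search list and ndots value from resolv.conf content."""
    search: List[str] = []
    ndots = 1  # default documented in resolv.conf(5)
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith(("#", ";")):
            continue  # blank line or comment
        if fields[0] in ("search", "domain"):
            # "search" and "domain" are mutually exclusive; the last one wins.
            search = [d.rstrip(".") for d in fields[1:]]
        elif fields[0] == "options":
            for opt in fields[1:]:
                if opt.startswith("ndots:"):
                    ndots = min(int(opt.split(":", 1)[1]), 15)  # capped at 15
    return search, ndots


def candidate_queries(name: str, search: List[str], ndots: int) -> List[str]:
    """Order in which a glibc-style stub resolver tries names for `name`."""
    if name.endswith("."):
        return [name]  # already fully qualified: no search-list expansion
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") >= ndots:
        return [absolute] + expanded  # enough dots: try the name as-is first
    return expanded + [absolute]      # too few dots: search domains first


if __name__ == "__main__":
    if len(sys.argv) == 2:
        with open("/etc/resolv.conf") as f:
            search_list, ndots_value = parse_resolv_conf(f.read())
        for query in candidate_queries(sys.argv[1], search_list, ndots_value):
            print(query)
```

Passing the Git server's hostname as the only argument inside an executor pod prints the query order this model predicts for that pod's resolv.conf. Writing the clone URL's host with a trailing dot, or lowering `ndots` through the pod's DNS config, are the usual ways to skip the expansion, if the executor allows either.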

I've had 3 successful Cypress test runs in a row without any DNS resolution errors since upgrading to the new release. I'll try a few more times to confirm it's resolved, and then we should be good to close this one out.

walsm232 on Apr 27, 2023