origin: Origin 3.6 internal registry resolution failure

We have installed an Origin 3.6 instance in our development environment and successfully pushed some images to the internal registry. When we try to start a new deployment with these images, the pod fails to pull from the registry because it can’t resolve the address docker-registry.default.svc.

Failed to pull image "docker-registry.default.svc:5000/test-shared/haproxy@sha256:424a91dde92e2db9b8b9135bcb06e6b1c53645ee7c0ce274287c570e15f1a4b3": rpc error: code = 2 desc = Get https://docker-registry.default.svc:5000/v2/: dial tcp: lookup docker-registry.default.svc on 10.224.20.20:53: no such host

Version
oc v3.6.0+c4dd4cf
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://dev-openshift.test.it:8443
openshift v3.6.0+c4dd4cf
kubernetes v1.6.1+5115d708d7
Steps To Reproduce
  1. From an external server, run oc login and docker push to push the image to the internal registry
  2. From the web UI, add a new item to the project and run the new image with one replica
Current Result

Pod creation fails with the error reported above.

Expected Result

The pod pulls the image from the registry.

Additional Information

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 44 (19 by maintainers)

Most upvoted comments

In my case (OCP 3.9) I had to add this to ‘/etc/dnsmasq.d/node-dnsmasq.conf’ and restart dnsmasq.

server=/default.svc/172.30.0.1
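To make the effect of that line concrete: dnsmasq's `server=/domain/address` form sends queries for names under `domain` to the given upstream instead of the default resolvers. The following is only a rough sketch of that matching rule, not dnsmasq's actual code; the `pick_upstream` helper and the `RULES` table are hypothetical names introduced for illustration.

```python
from typing import Dict, Optional


def pick_upstream(name: str, rules: Dict[str, str]) -> Optional[str]:
    """Return the upstream for the most specific matching domain.

    Returns None when no rule matches, meaning the query would go to
    the default upstream resolvers.
    """
    name = name.rstrip(".").lower()
    best = None
    for domain in rules:
        # A rule matches the domain itself and any name beneath it.
        if name == domain or name.endswith("." + domain):
            if best is None or len(domain) > len(best):
                best = domain
    return rules[best] if best is not None else None


# Hypothetical table mirroring the single rule from the comment above.
RULES = {"default.svc": "172.30.0.1"}

print(pick_upstream("docker-registry.default.svc", RULES))
print(pick_upstream("example.com", RULES))
```

With the rule in place, lookups for docker-registry.default.svc are answered by the cluster DNS at 172.30.0.1 rather than by the external resolver that returned "no such host".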

Hey, sorry for the late reply.

We were able to get things working on 3.11 using this.

[OSEv3:children]
masters
nodes
etcd

# Set variables common for all OSEv3 hosts
[OSEv3:vars]
# SSH user, this user should allow ssh based auth without requiring a password
ansible_ssh_user=root

openshift_deployment_type=openshift-enterprise
oreg_auth_user="{{ os_registry_user }}"
oreg_auth_password="{{ os_registry_pass }}"

openshift_master_default_subdomain=apps.openshift.subdomain
openshift_hosted_registry_routehost=registry.apps.openshift.subdomain

# uncomment the following to enable htpasswd authentication; defaults to DenyAllPasswordIdentityProvider
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]

# host group for masters
[masters]
node01.subdomain openshift_public_hostname=master.openshift.subdomain

# host group for etcd
[etcd]
node01.subdomain openshift_public_hostname=master.openshift.subdomain

# host group for nodes, includes region info
[nodes]
node01.subdomain openshift_public_hostname=node01.openshift.subdomain openshift_node_group_name='node-config-master-infra'
node02.subdomain openshift_public_hostname=node02.openshift.subdomain openshift_node_group_name='node-config-compute'

We had a couple of issues that were causing different problems. There are still a few things we are working out as well. Thanks for the help.

I encountered a similar problem with 3.11. In testing:

dig +showsearch docker-registry.default.svc resolved to 92.242.140.21

dig +showsearch docker-registry.default.svc.cluster.local resolved to 172.30.89.2

As a less than ideal workaround, adding ‘172.30.89.2 docker-registry.default.svc’ to /etc/hosts and restarting the cluster (including docker) worked for me.
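One way to read those two dig results is to check whether each answer falls inside the cluster's service network. The sketch below assumes the service network is 172.30.0.0/16, which is the usual OpenShift 3.x default but may differ on a given cluster; `is_service_ip` is a hypothetical helper name.

```python
import ipaddress

# Assumption: 172.30.0.0/16 is the service network; adjust if the cluster
# was installed with a different portal network.
SERVICE_NETWORK = ipaddress.ip_network("172.30.0.0/16")


def is_service_ip(address: str) -> bool:
    """Return True if the address lies inside the assumed service network."""
    return ipaddress.ip_address(address) in SERVICE_NETWORK


# The two answers reported in the comment above.
for name, address in [
    ("docker-registry.default.svc", "92.242.140.21"),
    ("docker-registry.default.svc.cluster.local", "172.30.89.2"),
]:
    verdict = "service IP" if is_service_ip(address) else "NOT a service IP"
    print(f"{name} -> {address}: {verdict}")
```

An answer outside the service network for the short name suggests the query was answered by an external resolver instead of the cluster DNS.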

Worked for me

@markandrewj What’s in your /etc/resolv.conf on the host? It should have added cluster.local to the search path. What does dig +showsearch docker-registry.default.svc show you?
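The search-path check asked about here can be sketched as follows. This is a minimal parser for illustration only; the sample file contents are hypothetical and `search_domains` is not part of any OpenShift tooling.

```python
from typing import List


def search_domains(resolv_conf_text: str) -> List[str]:
    """Return the domains from the last 'search' line of a resolv.conf."""
    domains: List[str] = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped[0] in "#;":
            continue
        fields = stripped.split()
        if fields[0] == "search":
            domains = fields[1:]  # a later search line replaces earlier ones
    return domains


# Hypothetical contents of /etc/resolv.conf on a node.
SAMPLE = """\
# example only
search cluster.local test.it
nameserver 10.224.20.20
"""

domains = search_domains(SAMPLE)
print("cluster.local in search path:", "cluster.local" in domains)
```

To check a real host, pass the contents of its /etc/resolv.conf to `search_domains`. If cluster.local is missing from the result, the short name docker-registry.default.svc is never expanded to docker-registry.default.svc.cluster.local.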

This happened again:

++ git --work-tree /data/src/github.com/openshift/origin describe --long --tags --abbrev=7 --match 'v[0-9]*' '5c49449^{commit}'
+ OS_GIT_VERSION=v3.9.0-alpha.4-367-g5c49449
...
2018-02-16T04:57:32.868379745Z Pushing image docker-registry.default.svc:5000/extended-test-dancer-repo-test-l6shc-lz94v/dancer-example:latest ...
2018-02-16T04:57:33.777538638Z Registry server Address: 
2018-02-16T04:57:33.777569802Z Registry server User Name: serviceaccount
2018-02-16T04:57:33.777574483Z Registry server Email: serviceaccount@example.org
2018-02-16T04:57:33.777578126Z Registry server Password: <<non-empty>>
2018-02-16T04:57:33.797309512Z error: build error: Failed to push image: Get https://docker-registry.default.svc:5000/v1/_ping: dial tcp: lookup docker-registry.default.svc on 127.0.0.1:53: no such host

https://ci.openshift.redhat.com/jenkins/job/test_branch_origin_extended_image_ecosystem/388/

Thanks for the follow-up. I’m going to say this was addressed by https://github.com/openshift/openshift-ansible/pull/5145, then.