origin: Timeout when pulling Docker images taking more than 1 minute to extract
Version
$ oc version
oc v1.4.1+3f9807a
kubernetes v1.4.0+776c994
OpenShift/Kubernetes fails to pull images whose layers take more than one minute to extract.
$ oc get events -w
Pod Normal Scheduled {default-scheduler } Successfully assigned gitlab-ee-1-3jso0 to oonodedev-001
Pod spec.containers{gitlab-ee} Normal Pulling {kubelet oonodedev-001} pulling image "gitlab/gitlab-ee@sha256:fa58a6765b5431f716ba82f5002a81041224e7430ef2c29b7fdea993a4a96aff"
Pod Warning FailedSync {kubelet oonodedev-001} Error syncing pod, skipping: failed to "StartContainer" for "gitlab-ee" with ErrImagePull: "net/http: request canceled"
Pod spec.containers{gitlab-ee} Warning Failed {kubelet oonodedev-001} Failed to pull image "gitlab/gitlab-ee@sha256:fa58a6765b5431f716ba82f5002a81041224e7430ef2c29b7fdea993a4a96aff": net/http: request canceled
and in the Origin logs:
Feb 24 15:21:45 oonodedev-001 origin-node[20126] kube_docker_client.go:313] Cancel pulling image "gitlab/gitlab-ee@sha256:fa58a6765b5431f716ba82f5002a81041224e7430ef2c29b7fdea993a4a96aff" because of no progress for 1m0s, latest progress: "ac990a380700: Extracting [==================================================>] 288.7 MB/288.7 MB"
The last layer of this particular image (i.e. gitlab/gitlab-ee:8.16.4-ee.0) takes several minutes to extract, so with the default timeout of 1 minute the pull never completes. A normal docker pull works.
The one-minute value seems to come from defaultImagePullingStuckTimeout (ref. https://github.com/kubernetes/kubernetes/blob/v1.4.0/pkg/kubelet/dockertools/kube_docker_client.go#L81), which is hardcoded and can’t be changed. I also see that this was changed in Kubernetes 1.6, where the value appears to be configurable.
Could you suggest a possible workaround for the time being? If not, could we increase the default timeout (to something like 10 minutes) and backport it to Origin 1.4 and Origin 1.5?
About this issue
- State: closed
- Created 7 years ago
- Reactions: 9
- Comments: 23 (7 by maintainers)
@alikhajeh1 @bbrfkr @rickbliss @yanhongwang
For Origin 3.6 you can set `image-pull-progress-deadline` to a meaningful value (e.g. 10m) in the `kubeletArguments` section of the `node-config.yaml` of all your nodes. This is working for us.
Actually, I am happy to close the issue now that this is configurable in Origin 3.6.
@xqianwang Yes. We can set the parameter `image-pull-progress-deadline` in `/etc/origin/node/node-config.yaml`. This setting works fine in my OpenShift Origin environment.
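The snippet from the comment above is not preserved here. As a minimal sketch, assuming the usual `kubeletArguments` layout in which each kubelet flag maps to a list of string values, the setting could look like this (the `10m` value is the example mentioned earlier in this thread, not a recommendation):

```yaml
# /etc/origin/node/node-config.yaml (excerpt)
# Assumption: each kubelet flag is a key under kubeletArguments,
# and its value is given as a list of strings.
kubeletArguments:
  image-pull-progress-deadline:
  - "10m"   # cancel a pull only after 10 minutes without progress
```

The node service (origin-node in the logs above) most likely needs to be restarted on each node before the new value takes effect.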
Changing the EC2 nodes from t2.medium to m3.large fixed the problem.
+1, this happens to me too, on 1.5.7 using kops. I am getting
`ErrImagePull: "net/http: request canceled"` when it tries to pull the image from AWS ECR. Any ideas, guys?