rancher: Can't Install Heapster on Kubernetes

Rancher Version: 1.0.1
Docker Version: 1.9.1
Kubectl Version: 1.2.0

I tried to install Heapster with an InfluxDB backend and Grafana. I followed the official example, with some changes to the Heapster controller:

apiVersion: v1
kind: ReplicationController
metadata:
  labels:
    k8s-app: heapster
    name: heapster
    version: v6
  name: heapster
  namespace: kube-system
spec:
  replicas: 1
  selector:
    k8s-app: heapster
    version: v6
  template:
    metadata:
      labels:
        k8s-app: heapster
        version: v6
    spec:
      containers:
      - name: heapster
        image: kubernetes/heapster:canary
        imagePullPolicy: Always
        command:
        - /heapster
        # Changed from the official example: point at the apiserver via the
        # service env var, skip in-cluster auth, and trust its certificate.
        - --source=kubernetes:https://$KUBERNETES_SERVICE_HOST?inClusterConfig=false&insecure=true
        - --sink=influxdb:http://monitoring-influxdb:8086

The inClusterConfig=false argument is there to run Heapster without the in-cluster authentication, and insecure=true is there to trust the Kubernetes API server certificate without verification.
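
For reference, the $KUBERNETES_SERVICE_HOST part gets expanded inside the pod. A quick, rough way to confirm what it resolves to (the pod name below is just a placeholder) is something like:

kubectl --namespace=kube-system exec <heapster-pod-name> -- env | grep KUBERNETES_SERVICE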

Results:

The first problem was with Grafana, which reported missing assets when I accessed its UI:

2016/04/12 19:24:00 [I] Completed /api/v1/proxy/namespaces/kube-system/services/monitoring-grafana/app/app.3c38f44f.js 404 Not Found in 1.42773ms
2016/04/12 19:33:12 [I] Completed /api/v1/proxy/namespaces/kube-system/services/monitoring-grafana/css/grafana.dark.min.4efc02b6.css 404 Not Found in 2.045719ms
2016/04/12 19:33:12 [I] Completed /api/v1/proxy/namespaces/kube-system/services/monitoring-grafana/app/app.3c38f44f.js 404 Not Found in 826.323µs

I believe the URLs are being rewritten, and the prefix

/api/v1/proxy/namespaces/kube-system/services/monitoring-grafana/

is being prepended to the asset URLs.
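
If the proxy prefix really is the cause, the usual fix is to tell Grafana its external root URL so it generates asset links under that prefix. A minimal sketch of the relevant part of the Grafana controller, with an illustrative image tag (Grafana reads GF_SERVER_ROOT_URL as its server.root_url setting):

      containers:
      - name: grafana
        image: gcr.io/google_containers/heapster_grafana:v2.6.0-2
        env:
        # Make Grafana build asset URLs under the apiserver proxy prefix.
        - name: GF_SERVER_ROOT_URL
          value: /api/v1/proxy/namespaces/kube-system/services/monitoring-grafana/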

The second problem was with Heapster itself:

I0412 22:05:08.156515       1 heapster.go:60] /heapster --source=kubernetes:https://$KUBERNETES_SERVICE_HOST?inClusterConfig=false&insecure=true --sink=influxdb:http://monitoring-influxdb:8086
I0412 22:05:08.156635       1 heapster.go:61] Heapster version 1.1.0-beta1
I0412 22:05:08.156672       1 configs.go:60] Using Kubernetes client with master "https://10.43.0.1" and version "v1"
I0412 22:05:08.156684       1 configs.go:61] Using kubelet port 10255
I0412 22:05:08.264233       1 influxdb.go:199] created influxdb sink with options: host:monitoring-influxdb:8086 user:root db:k8s
I0412 22:05:08.264273       1 heapster.go:87] Starting with InfluxDB Sink
I0412 22:05:08.264285       1 heapster.go:87] Starting with Metric Sink
I0412 22:05:08.296204       1 heapster.go:166] Starting heapster on port 8082
I0412 22:05:35.000277       1 manager.go:79] Scraping metrics start: 2016-04-12 22:05:00 +0000 UTC, end: 2016-04-12 22:05:30 +0000 UTC
E0412 22:05:35.000484       1 kubelet.go:279] Node node has no valid hostname and/or IP address: node
I0412 22:05:35.000542       1 manager.go:152] ScrapeMetrics: time: 4.946µs size: 0
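
The "no valid hostname and/or IP address" error suggests the node object is registered without usable addresses. A rough way to check what the API server actually has for the node:

kubectl get nodes -o yaml   # check that status.addresses lists a hostname/IP for each node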

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 20 (7 by maintainers)

Most upvoted comments

To be more explicit:

The libc that alpine uses by default, musl, has historically had a lot of problems with DNS and continues to have some today. Historically, musl didn’t implement all of the features that k8s expected out of a DNS resolver, so it was pretty much totally broken. That’s fixed now, but I’m not sure if it’s fixed in the version of Alpine that Heapster is built against.

The other problem, which persists today, is the nonstandard resolution strategy that musl employs for looking up domains. Most DNS resolvers go down the list of DNS hosts from resolv.conf in-order, trying each one until a result is found. In order to speed things up, musl tries all the resolvers from the list in parallel and uses the first result. This is cool because it speeds things up, but the problem is that if the resolver that happens to reply first doesn’t have a record for the hostname, musl will fast-fail the lookup. While the parallelism is really cool, this behavior fundamentally breaks a lot of assumptions about how DNS lookup will work. For example, in k8s, the resolv.conf files are strategically defined to check cluster DNS first (in order to be able to resolve services) and then to try outside hosts. So if the external DNS service comes back first and says “Nope, service-name.default isn’t a valid hostname”, the DNS lookup will fail.
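
For concreteness, a pod's resolv.conf in such a cluster typically looks roughly like this (addresses here are purely illustrative):

# cluster DNS (e.g. skydns), the only resolver that knows about service names
nameserver 10.43.0.10
# upstream resolver; if musl's parallel query gets an NXDOMAIN from here first,
# the lookup for a service name like monitoring-influxdb fails
nameserver 8.8.8.8
search kube-system.svc.cluster.local svc.cluster.local cluster.local
options ndots:5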

Anyway, I know that the Dockerfile for heapster appears to be trying to put glibc into place, but I tried actually running two nslookup commands in containers in my k8s cluster: one within a busybox image and the other within the heapster one. The busybox-based lookup was able to resolve names, but the heapster-based one was not.
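
Roughly, that comparison looked like the following (flags and names are approximate for the kubectl version in use, and it assumes the heapster image still ships busybox's nslookup):

kubectl run dns-busybox -i --tty --restart=Never --image=busybox \
  -- nslookup monitoring-influxdb.kube-system
kubectl run dns-heapster -i --tty --restart=Never --image=kubernetes/heapster:canary \
  --command -- nslookup monitoring-influxdb.kube-system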