opencost: GCE metadata "instance/attributes/cluster-name" not defined

I am having an issue deploying kubecost from the Helm chart in a private GKE cluster.

I have the following logs:

I0322 07:41:17.915375       1 router.go:1319] Starting cost-model (git commit "1.91.0")
I0322 07:41:17.915486       1 router.go:1342] Prometheus/Thanos Client Max Concurrency set to 5
I0322 07:41:17.945162       1 router.go:1383] Success: retrieved the 'up' query against prometheus at: http://copa-metrics-prometheus.metrics.svc.cluster.local:9090
I0322 07:41:17.967272       1 router.go:1391] Retrieved a prometheus config file from: http://copa-metrics-prometheus.metrics.svc.cluster.local:9090
I0322 07:41:18.023159       1 router.go:1400] Using scrape interval of 60.000000
I0322 07:41:18.024295       1 clustercache.go:114] NAMESPACE: kubecost
I0322 07:41:21.513314       1 clustercache.go:161] Done waiting
I0322 07:41:21.514253       1 watchcontroller.go:195] Starting *v1.Namespace controller
I0322 07:41:21.514362       1 watchcontroller.go:195] Starting *v1.Node controller
I0322 07:41:21.514399       1 watchcontroller.go:195] Starting *v1.Pod controller
I0322 07:41:21.514810       1 watchcontroller.go:195] Starting *v1.Service controller
I0322 07:41:21.514964       1 watchcontroller.go:195] Starting *v1.StorageClass controller
I0322 07:41:21.514968       1 watchcontroller.go:195] Starting *v1.PersistentVolumeClaim controller
I0322 07:41:21.515002       1 watchcontroller.go:195] Starting *v1.Job controller
I0322 07:41:21.515102       1 watchcontroller.go:195] Starting *v1.ConfigMap controller
I0322 07:41:21.515138       1 watchcontroller.go:195] Starting *v2beta1.HorizontalPodAutoscaler controller
I0322 07:41:21.515144       1 watchcontroller.go:195] Starting *v1.DaemonSet controller
I0322 07:41:21.515174       1 watchcontroller.go:195] Starting *v1.Deployment controller
I0322 07:41:21.515193       1 watchcontroller.go:195] Starting *v1beta1.PodDisruptionBudget controller
I0322 07:41:21.515206       1 watchcontroller.go:195] Starting *v1.ReplicationController controller
I0322 07:41:21.515291       1 watchcontroller.go:195] Starting *v1.StatefulSet controller
I0322 07:41:21.515329       1 watchcontroller.go:195] Starting *v1.ReplicaSet controller
I0322 07:41:21.515581       1 watchcontroller.go:195] Starting *v1.PersistentVolume controller
I0322 07:41:21.613423       1 provider.go:440] metadata reports we are in GCE
I0322 07:41:21.618760       1 router.go:1452] No app-configs configmap found at install time, using existing configs: configmaps "app-configs" not found
I0322 07:41:21.621698       1 router.go:1452] No product-configs configmap found at install time, using existing configs: configmaps "product-configs" not found
I0322 07:41:21.623920       1 router.go:1452] No alert-configs configmap found at install time, using existing configs: configmaps "alert-configs" not found
I0322 07:41:21.626181       1 router.go:1452] No saved-report-configs configmap found at install time, using existing configs: configmaps "saved-report-configs" not found
I0322 07:41:21.628533       1 router.go:1452] No asset-report-configs configmap found at install time, using existing configs: configmaps "asset-report-configs" not found
I0322 07:41:21.630486       1 router.go:1452] No group-filters configmap found at install time, using existing configs: configmaps "group-filters" not found
I0322 07:41:21.633090       1 router.go:1454] Found configmap pricing-configs, watching...
I0322 07:41:21.647877       1 router.go:1452] No metrics-config configmap found at install time, using existing configs: configmaps "metrics-config" not found
I0322 07:41:22.205672       1 router.go:1500] Success: retrieved the 'up' query against Thanos at: http://thanos-app-query-frontend.monitoring.svc.cluster.local:9090
I0322 07:41:22.206060       1 gcpprovider.go:190] GCP Auth Key already exists, no need to load from secret
I0322 07:41:22.336843       1 gcpprovider.go:303] Error loading metadata cluster-name: metadata: GCE metadata "instance/attributes/cluster-name" not defined
I0322 07:41:22.737429       1 gcpprovider.go:878] Found 0 reserved instances
I0322 07:41:22.737460       1 gcpprovider.go:756] Fetch GCP Billing Data from URL: https://cloudbilling.googleapis.com/v1/services/6F81-5844-456A/skus?key=AIzaSyDXQPG_MHUEy9neR7stolq6l0ujXmjJlvk&currencyCode=USD
I0322 07:51:22.513334       1 gcpprovider.go:303] Error loading metadata cluster-name: metadata: GCE metadata "instance/attributes/cluster-name" not defined

Although I have restrictive network policy in place, I am pretty sure that this is NOT the root cause. I have followed the docs here: https://guide.kubecost.com/hc/en-us/articles/4407601830679-Troubleshoot-Install#metadata

Also, I ran a curl command from the cost-analyzer-frontend container, which runs in the same pod as the cost-model container, and it works:

└─$ kubectl -n kubecost exec kubecost-app-costanalyzer-6656d79f5c-4dnp9 -c cost-analyzer-frontend -- curl -H 'Metadata-Flavor: Google' http://metadata.google.internal/computeMetadata/v1/instance/attributes/cluster-name
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    37  100    37    0     0   2493      0 --:--:-- --:--:-- --:--:--  2642
non-prod-management-plane-gke-cluster
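
For reference, here is a minimal Python sketch of the same check that separates "attribute not defined" from "metadata server unreachable". As far as I know, the Go metadata client that produces the `not defined` message in the logs reports it when the metadata server answers with HTTP 404, so the distinction matters when a network policy is suspected. The function name `fetch_instance_attribute` and its return labels are made up for illustration and are not part of kubecost; the URL and header are the ones used in the curl command above.

```python
import urllib.error
import urllib.request

# Same endpoint as the curl command above.
METADATA_BASE = "http://metadata.google.internal/computeMetadata/v1"


def fetch_instance_attribute(name, base_url=METADATA_BASE, timeout=2.0):
    """Return (state, detail) for instance/attributes/<name>.

    state is one of "ok", "not_defined", "http_error", "unreachable".
    (Hypothetical helper for illustration only.)
    """
    url = f"{base_url}/instance/attributes/{name}"
    request = urllib.request.Request(url, headers={"Metadata-Flavor": "Google"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return "ok", response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # The server answered, but the attribute does not exist (404)
        # or the request was rejected for another reason.
        if err.code == 404:
            return "not_defined", url
        return "http_error", str(err.code)
    except (urllib.error.URLError, OSError) as err:
        # DNS failure, refused connection, or timeout: a network-level problem.
        return "unreachable", str(err)
```

Calling `fetch_instance_attribute("cluster-name")` from inside the pod should return `("ok", "<cluster name>")` when the attribute is visible. Note that the curl above ran in the cost-analyzer-frontend container, so it may be worth repeating the check from the cost-model container itself if that image has a suitable tool available.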

Any clue?

gz#1544

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 21 (11 by maintainers)

Most upvoted comments

Hi, sorry for the late feedback. The root cause was that we have VPC Service Controls enabled and use restricted Google access, which only allows Google services that support VPC SC. Full explanation here: https://cloud.google.com/vpc-service-controls/docs/private-connectivity#:~:text=Private Google Access offers private,control access to protected resources.

I worked with our network team and found a workaround: a DNS policy that makes the Cloud Billing API (cloudbilling.googleapis.com) use private Google access. That fixes the issue, and I am now able to deploy the cost-analyzer. Thanks to the whole team for your support.
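
To check which path a Google API hostname takes after such a DNS change, a small Python sketch like the following can help. It assumes the virtual IP ranges documented for Private Google Access (199.36.153.4/30 for restricted.googleapis.com and 199.36.153.8/30 for private.googleapis.com), and it assumes that "private Google access" in the workaround means the private.googleapis.com range; neither detail is confirmed in this thread. The helper names are made up for illustration.

```python
import ipaddress
import socket

# Assumed ranges, taken from Google's Private Google Access documentation.
RESTRICTED_VIP = ipaddress.ip_network("199.36.153.4/30")  # restricted.googleapis.com
PRIVATE_VIP = ipaddress.ip_network("199.36.153.8/30")     # private.googleapis.com


def classify_google_api_ip(ip):
    """Label an IP as 'restricted', 'private', or 'other' (hypothetical helper)."""
    addr = ipaddress.ip_address(ip)
    if addr in RESTRICTED_VIP:
        return "restricted"
    if addr in PRIVATE_VIP:
        return "private"
    return "other"


def classify_host(hostname):
    """Resolve a hostname and classify each IPv4 address it maps to (needs DNS)."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    return {info[4][0]: classify_google_api_ip(info[4][0]) for info in infos}
```

Running `classify_host("cloudbilling.googleapis.com")` from a pod in the cluster would show whether the billing endpoint still resolves into the restricted range. An "other" result means the name resolves to public addresses, which may or may not be reachable from a private cluster.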

Hey folks, fixing that infinite loop error makes sense and seems pretty straightforward. @nealormsbee, can you pull his notes from here: https://github.com/kubecost/cost-model/issues/1125#issuecomment-1100968126 into a PR?