helm: Unable to debug "lookup" function, as it's disabled with `helm template`

Output of helm version:

version.BuildInfo{Version:"v3.2.1", GitCommit:"fe51cd1e31e6a202cba7dead9552a6d418ded79a", GitTreeState:"clean", GoVersion:"go1.13.10"}

Output of kubectl version:

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"13+", GitVersion:"v1.13.4+7bd2e5b", GitCommit:"7bd2e5b", GitTreeState:"clean", BuildDate:"2019-05-19T23:52:43Z", GoVersion:"go1.11.5", Compiler:"gc", Platform:"linux/amd64"}

Cloud Provider/Platform (AKS, GKE, Minikube etc.): all

I am trying to use the “lookup” function, which is resulting in problems with my yaml formatting:

$ helm install --namespace myapp api ./charts/deploy/
Error: unable to build kubernetes objects from release manifest: error validating "": error validating data: ValidationError(Secret.metadata): unknown field "type" in io.k8s.apimachinery.pkg.apis.meta.v1.ObjectMeta

The template file that I believe the error is coming from looks like this:

---
apiVersion: v1
data:
  auth: {{ (lookup "v1" "Secret" "app2" "api-k8s-htpasswd").data.auth }}
kind: Secret
metadata:
  labels:
    k8s-app: myap-api
  name: htpasswd-secret
type: Opaque

From what I understand, the lookup function is disabled when you run helm template, as trying to process this chart results in a nil pointer error:

Error: template: deploy/templates/htpasswd-secret.yaml:4:56: executing "deploy/templates/htpasswd-secret.yaml" at <"api-k8s-htpasswd">: nil pointer evaluating interface {}.auth

I would like to have a discussion as to why I can’t use helm template to debug my chart, as that, in my opinion, should be the primary purpose of having a template command.

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 57
  • Comments: 75 (23 by maintainers)

Most upvoted comments

Perhaps a way to make everyone happy is to add a new CLI flag, something like --insecure-display. The effect of the flag would be to let the lookup function (and other similar capabilities) run in a template or dry-run scenario as well. In this case the user would essentially be stating that they accept the risk that some information may be leaked.

I don’t think we need to add a new flag here. The --validate flag already has documentation that it will reach out to a cluster. This really tripped me up because the lookup function is completely useless without some way to debug it without writing to a cluster.

> helm version
version.BuildInfo{Version:"v3.2.4", GitCommit:"0ad800ef43d3b826f31a5ad8dfbb4fe05d143688", GitTreeState:"dirty", GoVersion:"go1.14.3"}
> helm template --help | grep '\--validate'
      --validate                     validate your manifests against the Kubernetes cluster you are currently pointing at. This is the same validation performed on an install

I think it’s a fair argument to disable lookup in template and lint by default, because template and lint don’t normally reach out to a cluster. A user would have a reasonable expectation that both could be run without needing a kubeconfig or access to any clusters as part of some validation step. What I don’t understand is why the lookup functionality has been removed even when a user explicitly wants it. Similarly (as stated above), I don’t understand why a --dry-run command would be prohibited from read-only access to a cluster. The security disclosure provides this as a justification:

A malicious chart author could inject a lookup into a chart that, when rendered through helm template, performs unannounced lookups against the cluster a user's KUBECONFIG file points to. This information can then be disclosed via the output of helm template.

I fail to understand even a theoretical attack vector for this. If an attacker has access to the helm cli and a valid kubeconfig, they could do much more than lookup data. Again, I could understand nervousness around unannounced and unexpected api calls from a subchart or public chart, but if a user explicitly uses the --validate flag, doesn’t that mean the user now expects an api call to a cluster?

Copying my impression from #8436

Keep in mind that Helm is not supposed to contact the Kubernetes API Server during a helm template or a helm install|update|delete|rollback --dry-run, so the lookup function will return nil in such a case.

Is there a way someone can test the lookup function prior to deploying a chart? If not, is it really necessary to forbid helm from contacting the API server during a dry run? My expectation of a --dry-run in other software is that no state is modified but that I can see exactly what would happen during a full run. This is not the case if your chart uses lookup to fill in template values.

I would expect that during a dry run helm allows GET and HEAD requests to API server resources but not mutating methods like PUT, PATCH, and POST. If there’s a need to have a network-free flow as exists today, perhaps helm template could elide the network calls while helm install|upgrade|delete|rollback --dry-run would perform the required lookups.

Tested and confirmed with the following:

$ helm version
version.BuildInfo{Version:"v3.2.1", GitCommit:"fe51cd1e31e6a202cba7dead9552a6d418ded79a", GitTreeState:"clean", GoVersion:"go1.13.10"}
$ kubectl run nginx --image nginx
$ helm create test-lookup
$ cd test-lookup
$ cat templates/deployment.yaml | head -n 6
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "test-lookup.fullname" . }}
  labels:
    {{ with lookup "v1" "Pod" "default" "nginx" }}testing: {{ quote .metadata.name }}{{ end }}

helm template --validate and helm install --dry-run both show empty output. helm install shows the correct output, of course. Perhaps #7969 unintentionally disabled the lookup function for cases where we do have cluster access.

What are your thoughts on this, @technosophos?

I am the original contributor of this feature and it pains me to see that it’s still not usable in the “helm template” command. Lately I’ve been doing a lot of gitops with argocd and argocd uses the “helm template” command when dealing with helm charts. So, basically this means that if you use argocd you can’t use a chart that makes use of the “lookup” function. In other words the ecosystem of tools around helm is at times hurt by this choice of disabling the “lookup” function for the “helm template” command

– ciao/bye Raffaele

Neither helm template --validate nor helm install --dry-run helps in debugging a function that is essentially skipped.

The problem is that the lookup function should not simply be “skipped” if using helm template. Instead, it should handle the case where there is no active connection, and return a nil value so that those who wish to use the function still can. I shouldn’t have to make changes to my chart based on whether I’m going to use helm template or helm install to process it.

Please re-open this issue.

ofc this is no direct “problem of helm” but we’d be really happy to have dry-run work with lookups

I disagree, this is a problem of helm. (or we wouldn’t be talking on github/helm) They provide the feature, they have a responsibility to make it work

In the end, this is a user experience (UX) concern, which is completely foreign to most devops tool teams, but there are common golden rules and practices to follow, like consistency in features (if it looks the same, it behaves the same), brevity (the most common use case should take the least amount of input) or guidance (through feedback in the GUI for example, or more broadly, the ability of the tool to suggest its use to its users). Most of these should account for at least the few most common types of users (sometimes called personas), which include users accustomed to the tool, users accustomed to similar tools but not this one, total newcomers, maintainers, and so on.

Dry run is a common name for an option that makes a command output to the console instead of wherever it normally would, so users accustomed to it in other tools come with expectations about its behavior. Building on that kind of expectation is a great way to reduce learning curves by a lot, and I’m assuming that was the case before the opening of this thread?

By having helm dry run behave the way it does today, helm makes it harder for everyone to use it. To tie it back to the golden rules :

  • this dry run fails at guidance: (at least some) users will believe that a dry run behaves exactly like the real command, except that it outputs to the console. It doesn’t, hence the issue this thread is addressing
  • this dry run fails at consistency: most of the time it behaves as intended, but if you try using lookup, it won’t anymore. This is particularly insane since, if not for lookup, everything works fine, reinforcing the idea in every user that this dry run is like every other dry run, which only makes it harder to debug when you eventually have to face the issue.

I could add from your examples that the toolchain breaking because of lookup is also a consistency issue.

And I’m pretty sure everyone here lost at least an hour of their lives trying to understand why lookup was only getting empty results before realizing that it’s actually intended behavior, and finding this thread.

A simple “WARNING: dry run fakes all API calls, this can lead to behavior inconsistent with an actual run of the command”, printed when running the command, maybe only when API calls are actually being faked, would save all of us hours of lost time… How am I the first one to think of that? I mean, it’s not a fix, it’s a quick fix, but damn…

The most ironic of all is that UX is actually what drove users to this tool (like most successful devops tool) in the first place : a tool that works and does what i need in a few inputs? nice

It’s been a few months, any updates on getting the ability to render charts with lookup functions without applying them?

Since you can’t use values from a ConfigMap or Secret in k8s objects like Ingress, Helm’s lookup ability is pretty key in creating environment driven configuration for app deploys. The problem is the risk of upgrading live environments without the ability to test for unexpected changes (helm-diff).

Lookup is intentionally disabled for security reasons. There is no plan to change this behavior.

@technosophos I reviewed the security advisory. My take is that the documentation of helm’s behavior simply needs/needed to change. The security advisory fails to provide an actual description or example of how lookup can be used to disclose information the user didn’t already have access to with whatever credentials are configured on their system. Indeed, if helm template succeeds in contacting the cluster, the “attacker” could just dump the entire thing with kubectl or do any number of nefarious things. The vulnerability doesn’t make sense and the original behavior of lookup was correct.

Pretty sure the industry has moved on from helm by now XD

As a workaround you can debug the lookup function with the NOTES.txt file.

E.g.:

Lookup {{ .Values.namespace | quote }}:
{{ lookup "v1" "Namespace" "" ( .Values.namespace ) | toYaml }}

Then install the chart without --dry-run to get the output of the NOTES.txt file.
You can immediately delete the chart again after the installation like so:

helm install --debug lookup-test . -f /tmp/values.yaml && helm delete lookup-test

Repeat the process until your lookup call is working as expected.

If someone wants to PR support for --dry-run, we can review that.

Please let the issue drop about “what exactly can be done by an attacker.” We have disclosed as much as we are going to disclose.

This would be really helpful. Please support this.

@technosophos what exactly is the security concern with the lookup plugin? I could see there being a concern if we were talking about helm v2, where tiller was still involved, but with helm 3 being all in the client side, running a lookup through helm template doesn’t expose anything that the user couldn’t already see with kubectl get.

I second/echo @joejulian 's opinion on mimicking kubectl’s client/server dry-run behavior (comment on https://github.com/helm/helm/pull/9426 here)

In order to not change behavior, we need --dry-run by default to not contact any cluster. And following on from the security advisory/model GHSA-q8q8-93cv-v6h8, changing the behavior of --dry-run=true means e.g. users could suddenly get secrets logged in a CI system (see: https://github.com/helm/helm/issues/7275).

I would propose:

helm install|upgrade|delete|rollback --dry-run='none'|'client'|'server'|'true'|'false'

Where:

--dry-run=   Description
none         No dry-run, perform action (default)
client       Don’t attempt to contact any cluster, same behavior as today
server       Allow contacting the cluster for e.g. lookup
true         Alias for client, for compatibility (to be removed helm v4)
false        Alias for server, for compatibility (to be removed helm v4)

For helm template: currently the --dry-run flag appears to be ignored, and the code forces the value to True (ref). So similarly, for helm template the implementation would need to default to the equivalent of --dry-run=client. helm template --dry-run=server would enable contacting the cluster (and helm template --dry-run=false|none would continue to not make sense).

The implementation of the --dry-run flag would need to change from bool to string (it would need to be backwards compatible in the API too). And docs would need to be updated. Another advantage of the --dry-run='none'|'client'|'server' model (over mimicking kubectl) is that in the future, perhaps helm install --dry-run=server could do the same server-side validations that are available to kubectl users today.

reading this thread, it’s pretty clear to me helm core devs don’t give a f* about their users’ experience of helm, or even security tbh

It seems they acknowledge the use case, which is literally “i don’t wanna code in production”, tho, tbh, I’m sure it’s a new one to no one. It seems they agree there’s no actual security concern covered by having --dry-run not contact the kubernetes api (from a quick read of this thread at least, correct me if i’m wrong). It seems they are aware the use case drives people to “unsafe” practices, as mentioned in multiple answers here: either roll back to a previous helm version without that “security feature”, or actually code in production, which makes full contact with kube’s api… #safetyFirstLol

This thread has been up for more than 2 years now. How can such a “simple rollback” take 2 years?

Honestly, this is a common pattern in the devops industry now, one that i’m sure we have all faced multiple times: a common use case prevented by overreaching / misplaced security concerns, which inevitably leads users to “unsafe” practices. If security is a concern, why is it so common to drive your users to unsafe (by your standard) practices? Why is this still a thing?

Would there be opposition to letting template --validate use the lookup function? I would be willing to work on this, but I do not want to invest the time if it would be rejected once a pull request is ready.

I think that template needs a way to enable the lookup function, so we can develop charts using functions like genSignedCert properly.

I tried my hand at enabling this functionality without changing helm template. I used the fact the APIVersions will only not be nil for helm template. Unless there’s an edge case I am not thinking of, I believe this should work. I took this approach to minimize the number of files and lines that had to be changed.

For those wanting a workaround, kubectl --context cluster-name get kind object -o jsonpath='{.metadata.name}' which feeds into a --set or values.yaml etc., should do you, but nowhere near as simple as just using lookup…

@bacongobbler I get what I have to do to handle the lookup in my template. What I’m saying is that when I run helm install with my template and lookup actually returns something, I have no way of seeing the content that is generated and therefore can’t troubleshoot the formatting issues, and helm install --dry-run gives me no extra information.

Normally, I would run helm template to see what’s generated, but in this case, I can’t.

What you can do is:

{{- lookup "v1" "pods" "" "" | toString | fail}}

Which should fail with a message containing the result of the lookup.

They provide the feature, they have a responsibility to make it work.

Helm, like most open source tools, is a community of users, developers, volunteers, etc. You are part of this community and as such when you say “they” that them includes you. Please consider spending some time on this. If you don’t have the experience to contribute code, consider hiring someone who does and donating. Maybe trade in-kind your skills for someone with the skills and time to generate the feature you’re interested in. We’re all in this together.

LoL. Read the thread. People have tried all that. Even proposed patch sets. Maintainers don’t care. Thats who they is.

How about using helm install --dry-run=server-lookup-only for current behavior and reserve helm install --dry-run=server for future “all server-side validations and magic”?

This was discussed for two years before being merged. Let’s just celebrate that something was agreed upon 😀

As an update in this thread, I updated my existing PR (https://github.com/helm/helm/pull/9426) to follow the above talked about pattern. With this implementation it should be backwards compatible. Any feedback or next steps to get the issue merged would be great.

Maybe --dry-run could be kept the same, and a new option added, something like --allow-lookups?

I agree that is frustrating. The best I can offer is having the contributor come to the community developer meeting and ask for reviews. It’s far from perfect and I, as a volunteer who’s trying to help the community in the only way I have time for, feel helpless about it.

That said, it’s still a “we”. Come join the meetings. Participate in the mailing list or slack channel. Test PRs and offer reviews. The more evidence we can supply a core maintainer, the less time they need to spend on a review and the more they can get done. Every core maintainer has a job and none of those jobs pay for folks to develop helm full-time. Come meet us. We’re nice folks that are trying our best to help.

I can only second that - we’re on the verge of dropping ArgoCD just because we’re not fully compatible with helm because of the missing lookup function. A deeply concerning problem is that many users probably start with GitOps & ArgoCD and are happy at the beginning but later on will face a helm chart using lookups and then have to change their whole toolchain again. Even worse when some vendor adds lookups at later versions of a helm chart and the existing installation cannot be updated anymore. ofc this is no direct “problem of helm” but we’d be really happy to have dry-run work with lookups - this way we could default to the standard “helm template” in Argo and switch to “helm install --dry-run” just for the few charts requiring lookups.

I gave an answer: If someone wants to add support for --dry-run, they are free to do so. If this issue goes stale and falls off the radar with no PR… then that’s fine. But I’m not marking it “keep open” if nobody is going to work on it.

@technosophos can you please take a look at PR #9426 that was opened in Feb to address the --dry-run lookup issue?

As requested work was done on this issue. Will someone have a look at this PR to get it in?

dry-run is not expected to interact with the cluster? I’m sorry, but I looked really hard in the docs and I just couldn’t find what you stated anywhere. If anything, I found this:

helm install --dry-run --debug or helm template --debug: We've seen this trick already. It's a great way to have the server render your templates, then return the resulting manifest file.

If you could give me a reference for what you are saying, that would be awesome.

I even went further and tried it, to make sure I’m not confused:

$ helm3 install prom-push-gw . -f values.yaml --dry-run
Error: Kubernetes cluster unreachable: Get https://some-cluster/version?timeout=32s: dial tcp: lookup some-cluster on 8.8.8.8:53: no such host

So either you are wrong here, this is another bug in helm, or maybe you thought I was talking about --dry-run with the helm template command?

And even if you are right about it, I still don’t see why it shouldn’t work with the helm template --validate option.

@gjenkins8 what do you think about this modification to be more consistent :

string param

Using a string option ‘client|server’, as kubectl does

option               Description
[none]               No dry-run, perform action (default)
--dry-run='none'     No dry-run, perform action (default)
--dry-run='false'    No dry-run, perform action (default)
--dry-run            Don’t attempt to contact any cluster, same behavior as today
--dry-run='true'     Don’t attempt to contact any cluster, same behavior as today
--dry-run='client'   Don’t attempt to contact any cluster, same behavior as today
--dry-run='server'   Allow contacting the cluster for e.g. lookup

As I said before, I think it will confuse new users, as kubectl --dry-run does not do the same.

two options

Adding another option to connect to the cluster, like --cluster-validation or --allow-cluster-connection.

dry-run option   other option           Description
[none]           [none]                 No dry-run, perform action (default)
--dry-run        [none]                 Don’t attempt to contact any cluster, same behavior as today
[none]           --cluster-validation   Error : --cluster-validation flag can only be used with the --dry-run flag
--dry-run        --cluster-validation   Allow contacting the cluster for e.g. lookup

This can be a viable option, as it just adds a new option and does not break anything.

After thinking about it, this diverges from the main issue subject… If you want, @technosophos, I can create a new issue where I describe the actual --dry-run issues more properly and where we can find a reliable, safe and user-friendly behavior.

I follow @gjenkins8’s comment: dry-run comes from the kubectl command, which allows us to run some commands as a dry run. From the kubectl docs:

APIServer dry-run was implemented to address these two problems:

  • it allows individual requests to the apiserver to be marked as “dry-run”
  • the apiserver guarantees that dry-run requests won’t be persisted to storage
  • the request is still processed as typical request: the fields are defaulted, the object is validated, it goes through the validation admission chain, and through the mutating admission chain, and then the final object is returned to the user as it normally would, without being persisted

Source

Any dev or ops person who touches helm comes from kubectl. When we see something that claims to “Intelligently manage your Kubernetes manifest files” and uses an option with the same name as the official one (kubernetes/kubectl), we naturally assume (before reading the docs) that it will do the same thing, or pass the dry-run params to kubectl… This is not the actual behavior, and the biggest problem is that it is not well documented: with only “simulate an install” as the help text, we are even more tempted to believe that the implementation comes down to being a wrapper around the kubectl --dry-run command.

Good or bad news: the bare --dry-run param is deprecated in kubectl:

$ kubectl create ... --dry-run
W1009 17:00:14.998329  323115 helpers.go:553] --dry-run is deprecated and can be replaced with --dry-run=client.

It has been added in the 1.18 version of kubectl, with #85652 on June 16, 2021.

The official docs for --dry-run now say:

Must be “none”, “server”, or “client”. If client strategy, only print the object that would be sent, without sending it. If server strategy, submit server-side request without persisting the resource.

Source

If we choose to implement @gjenkins8’s solution, we will still diverge from kubectl. This is not a bad thing by design, but it must be explicitly explained with a message in the helm help. Something like:

-       --dry-run                                    simulate an install
+       --dry-run                                    simulate an install. NOT EQUIVALENT TO KUBECTL --DRY-RUN, READ https://...

In my mind, we must follow the kubectl CLI workflow if we use the same option name. Today the dry-run option does not help me at all. The elephant in the room is obviously the lookup function, which does not work as expected, but consider what happens if you add some custom validators, like refusing to run a pod as the root user or forcing resource limits: all of your CI runs with --dry-run will pass, and only at the prod deployment will the chart be refused by your cluster, because those rules were never checked without a cluster connection… Where lookup is more of a UX problem, this kind of failure can cause production errors, since there is no way to check for it other than doing a helm upgrade and crossing your fingers.

I totally understand @technosophos’s security POV, and yes, connecting to the cluster can sometimes pose problems. But as security best practices warn (I’m French, so I’m referring to the ANSSI), forcing users into best practices (like changing passwords every 3 months) sometimes leads to less security (everyone puts a post-it under their keyboard). Here we see a lot of workarounds, like shell scripts that do the lookup first, put the results in the values file, and then run helm upgrade --dry-run. This is bad, yes, because such a shell script can do a lot more (like adding or deleting stuff). But as long as lookup cannot be used to test the chart, we will see this type of behavior.

Finally, I think implementing a kubectl-like behavior is the best solution:

  • It reduces the friction to adopt helm
  • We will instantly benefit from new kube implementations / upgrades of the dry-run behavior
  • It will lead to fewer bad practices from users who want lookup
  • It allows us to validate the chart against the kube API server before release

If I missed something and you want to add some info, I’m open to any suggestions. I’m not really confident in Go, but I will be happy to contribute if a PR is opened for this. I’ve seen #9797 and #9426, but they are more code examples than an approved behavior.

This is blocking deployments for me also since upgrading argo which uses the newer versions of helm. I still don’t understand how using lookup is any different from using kubectl get as it exposes the same amount of information. If you have a valid kubeconfig or credentials the ability to get to sensitive data is the same.

I came across another interesting side effect earlier this week: we’ve been exploring https://tilt.dev/ which is a tool that augments traditional Kubernetes deployments to provide an optimized developer experience. It supports deploying software via helm charts, but needs to apply its own modifications to the rendered config prior to installation in the cluster. So it uses helm template to render the config then proceeds to modify and apply it. Because we use lookup, Tilt does not work with our helm charts. Main point here is we’ve come across a flow where it’s not just about an inability to debug a chart. This issue prevents other tools that process rendered charts from working with helm charts that call lookup.