kube2iam: First run during deployment produces a 404 error
Kube2iam version: 0.8.1
Kubernetes version: 1.8.4
CNI: weave 2.0.5
Only during deployment do we experience an error retrieving AWS credentials with boto: the pod crashes and is restarted, then works perfectly.
This is the log line from kube2iam during the error:
GET /latest/meta-data/iam/security-credentials/ (404) took 3116998023 ns
App error log:
botocore.exceptions.NoCredentialsError: Unable to locate credentials
About this issue
- State: closed
- Created 6 years ago
- Comments: 15 (1 by maintainers)
We wrote a shell script that runs as part of our run command and waits for kube2iam to respond. Once kube2iam returns 200 OK, the rest of the run command proceeds. Note that this has implications for local development, since presumably you won’t have kube2iam running on developers’ laptops when they run these containers.
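The script itself is not reproduced in this thread. As a rough illustration of the same idea, here is a minimal Python sketch that polls the credentials path seen in the kube2iam log until it answers 200. The function names, the retry count, and the delay are hypothetical, and the sketch assumes kube2iam is intercepting the standard EC2 metadata address.

```python
import time
import urllib.error
import urllib.request

# Assumption: kube2iam intercepts the EC2 metadata address and serves this path.
CREDENTIALS_URL = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"


def http_status(url, timeout=2.0):
    """Return the HTTP status code for url, or None if the request fails outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # for example, the 404 seen in the kube2iam log
    except (urllib.error.URLError, OSError):
        return None  # connection refused, timeout, no route


def wait_for_credentials(url=CREDENTIALS_URL, attempts=30, delay=1.0,
                         get_status=http_status, sleep=time.sleep):
    """Poll until the endpoint answers 200.

    Returns True on success and False once all attempts are used up.
    get_status and sleep are injectable (hypothetical hooks) so the loop
    can be exercised without a network.
    """
    for attempt in range(attempts):
        if get_status(url) == 200:
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no sleep after the final failed attempt
    return False
```

An application could call `wait_for_credentials()` before creating its first boto client and exit with an error if it returns False. For local development, where kube2iam is absent, the call would have to be skipped or given a very small `attempts` value.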
But the real solution is to switch to kiam, which does not have this kube2iam race condition.
With kiam you have to run some dedicated machines (more than one, in order to avoid a single point of failure) just to run the kiam server component, so it’s not a drop-in replacement for kube2iam: kiam requires more infrastructure.
This does seem to be the same type of timing issue that we have seen before. You probably pieced this together, but in your 2nd example, the Kube2IAM watch does not receive notification that the Pod has become running (which is when we get the IP and cache the `IP -> Role` mapping) until after you have made two failing calls.
There are a few events that Kube2IAM is registering in your example, but notice that `pod.status.ip` is nil until the `OnUpdate` event at time="2018-05-31T15:08:59Z". Your calls to Kube2IAM will not return anything until then.
I don’t know how, in its current form, we could give people both things they need:
A much bigger solution that has floated around would be to augment the “Pod based” IAM approach with a “Service Account” based IAM approach. This would potentially solve part of this problem, because we can see the Pod’s service account before it ever gets an IP, so we could start the STS handshake and the caching of credentials even if we hadn’t gotten the `OnUpdate` event with `Phase == Running`.
I don’t want to “fork” the repo, because I feel like @jtblin deserves to get credit for the good work he’s put into this and the very crafty idea, but if there aren’t any updates on #132 I will probably just start working on and publishing something myself.
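The timing issue described in this thread can be pictured with a toy model. The sketch below is not kube2iam’s code: the class, the method names, the pod fields, the IP, and the role name are all hypothetical. It only illustrates why a request that arrives before the watch delivers the pod’s IP gets a 404, and why the same request succeeds afterwards.

```python
class RoleCache:
    """Toy model of an IP -> role cache fed by pod watch events (not kube2iam's code)."""

    def __init__(self):
        self._role_by_ip = {}

    def on_update(self, pod):
        # A pod only becomes resolvable once an update event carries its IP.
        ip = pod.get("ip")
        if ip:
            self._role_by_ip[ip] = pod["role"]

    def handle_request(self, source_ip):
        """Return (status, body) for a credentials request from source_ip."""
        role = self._role_by_ip.get(source_ip)
        if role is None:
            return 404, ""  # caller's IP is not cached yet
        return 200, role


cache = RoleCache()
pod = {"name": "app", "role": "app-role", "ip": None}

cache.on_update(pod)                        # event seen, but no IP in the pod status yet
early = cache.handle_request("10.32.0.5")   # the app is already calling: not resolvable

pod["ip"] = "10.32.0.5"
cache.on_update(pod)                        # the update that finally carries the IP
late = cache.handle_request("10.32.0.5")    # now resolvable

print(early, late)
```

In this model, a cache keyed by service account rather than IP could be filled on the first event, which is the motivation for the “Service Account” based approach mentioned above.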