kubernetes: kubelet's MASQUERADE support is insufficient
Today kubelet sets up an iptables MASQUERADE rule for any traffic destined for anything except 10.0.0.0/8. This is close, but not even correct on GCE, and certainly not right elsewhere.
First, GCE. We probably want something like:
iptables -t nat -N KUBE-IPMASQ
iptables -t nat -A KUBE-IPMASQ -d 10.0.0.0/8 -j RETURN
iptables -t nat -A KUBE-IPMASQ -d 172.16.0.0/12 -j RETURN
iptables -t nat -A KUBE-IPMASQ -d 192.168.0.0/16 -j RETURN
iptables -t nat -A KUBE-IPMASQ -j MASQUERADE
iptables -t nat -I POSTROUTING -j KUBE-IPMASQ
This exempts all traffic destined for RFC1918 ranges and masquerades everything else. We can probably optimize with CONNMARK or something so we only consider packets from containers. This is probably still imperfect, but better, for lack of project-wide NAT for egress.
For other environments, we really have no idea what the correct policy for this is. It is closer to “your nodes must handle this” than “we can handle this for you”. It’s debatable whether we should even try.
Either:
a) We teach kubelet a lot more and let people pass flags to configure this nearly arbitrarily
b) We tell people to configure this as part of their node setup
This popped up when I realized GKE allows users to set up 172.* clusters - any traffic between containers in one of these will get masqueraded - not correct behavior!! This is not a huge deal right now because kube-proxy has the same effect when traversing services. As we fix kube-proxy in the wake of 1.0, masquerade will be a bigger deal, especially for micro-segmenting.
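To make the 172.* case concrete, here is a minimal sketch that models the verdict of a "RETURN for exempt CIDRs, otherwise MASQUERADE" chain in plain shell, without touching iptables. The helper names (ip_to_int, in_cidr, verdict) and the address 172.16.5.4 are made up for illustration; they are not part of kubelet.

```shell
#!/bin/sh
# Illustrative model only: it mimics the rule order of the chain, it does not
# program iptables. All function names here are hypothetical.

# ip_to_int A.B.C.D -> prints the address as a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the dotted quad into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_cidr IP CIDR -> exit status 0 if IP falls inside CIDR.
in_cidr() {
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    # Netmask with the top $bits bits set.
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
    [ $(( $ip & $mask )) -eq $(( $net & $mask )) ]
}

# verdict DEST CIDR... -> prints RETURN if DEST matches any exempt CIDR,
# otherwise MASQUERADE (the final catch-all rule).
verdict() {
    dest=$1
    shift
    for cidr in "$@"; do
        if in_cidr "$dest" "$cidr"; then
            echo RETURN
            return 0
        fi
    done
    echo MASQUERADE
}

# Pod-to-pod traffic inside a 172.16.0.0/12 cluster (example address).
echo "10/8-only exemption: $(verdict 172.16.5.4 10.0.0.0/8)"
echo "RFC1918 exemption:   $(verdict 172.16.5.4 10.0.0.0/8 172.16.0.0/12 192.168.0.0/16)"
```

With only 10.0.0.0/8 exempt, the 172.16.5.4 destination falls through to MASQUERADE; with all three RFC1918 ranges exempt it hits a RETURN first.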
Additional considerations: VPNs have bizarre and very custom needs. Every such thing has an as yet unmeasured perf implication. This also pops up in our GCE firewall thing that @ArtfulCoder is working on.
About this issue
- State: closed
- Created 9 years ago
- Comments: 22 (17 by maintainers)
Commits related to this issue
- Allow non-masquerade-cidr to be passed to the kubelet Removing the hard-coding of 10.0.0.0/8 Issue #11204 — committed to justinsb/kubernetes by justinsb 8 years ago
- Merge pull request #46473 from thockin/enable-masq-agent-gce Automatic merge from submit-queue (batch tested with PRs 46501, 45944, 46473) Enable the ip-masq-agent on GCE installs Setting this will... — committed to kubernetes/kubernetes by deleted user 7 years ago
On reflection, I like the daemonset best
@thockin please let us know if you are working on a proposal or any working solution
Hmm, this is still a hard one to express properly. In a way, I’d most like to get rid of this rule entirely and let people run whatever they need to set up the rules. In practice, though, this would end up with just about every node running a daemonset that simply programs iptables. It would, however, allow people to opt out totally, or do any sort of mad thing. Thoughts?
Kube 1.4 introduces a config-map for every kubelet to read params. Maybe we could latch onto that to deprecate the old flag and replace it with a new (compatible) flag that takes a list of destination CIDRs to not masquerade, and make disabling masquerade entirely an explicit behavior. Can someone step up as a champion for this and draft a spec and proof-of-concept implementation? @mtaufen
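As a rough illustration of what "a list of destination CIDRs" could translate to, the sketch below prints (but does not run) the iptables commands for a comma-separated list. The function name print_ipmasq_rules and the "off" value for disabling masquerade are assumptions invented for this example, not an existing kubelet flag format.

```shell
#!/bin/sh
# Hypothetical sketch: print the iptables commands for a comma-separated list
# of destination CIDRs that should NOT be masqueraded. Nothing is executed.

# print_ipmasq_rules LIST
#   LIST = "off"          -> print nothing (masquerade disabled; made-up convention)
#   LIST = "a/8,b/12,..." -> RETURN for each CIDR, MASQUERADE everything else
#   LIST = ""             -> no exemptions, so everything is masqueraded
print_ipmasq_rules() {
    list=$1
    if [ "$list" = "off" ]; then
        return 0
    fi
    echo "iptables -t nat -N KUBE-IPMASQ"
    old_ifs=$IFS
    IFS=,
    for cidr in $list; do
        echo "iptables -t nat -A KUBE-IPMASQ -d $cidr -j RETURN"
    done
    IFS=$old_ifs
    echo "iptables -t nat -A KUBE-IPMASQ -j MASQUERADE"
    echo "iptables -t nat -I POSTROUTING -j KUBE-IPMASQ"
}

# The RFC1918 list reproduces the rule set proposed in the issue body.
print_ipmasq_rules "10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
```

Printing instead of executing keeps the sketch safe to run without root; a real implementation would also need to handle an already-existing chain and rule ordering in POSTROUTING.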
@jnardiello I don’t know what would have changed the flag for you - this logic should only respond to the --non-masquerade-cidr flag.