My web application is running as a Kubernetes pod behind an nginx reverse proxy for SSL. Both the proxy and my application use Kubernetes services for load balancing (as described here).
The problem is that all of my HTTP request logs only show the internal cluster IP addresses instead of the addresses of the actual HTTP clients. Is there a way to make Kubernetes services pass this information to my app servers?
You can get kube-proxy out of the loop entirely in 2 ways:
Use an Ingress to configure your nginx to balance based on source IP and send traffic straight to your endpoints (https://github.com/kubernetes/contrib/tree/master/ingress/controllers#ingress-controllers). A hedged sketch follows after these two options.
Deploy the haproxy service loadbalancer (https://github.com/kubernetes/contrib/blob/master/service-loadbalancer/service_loadbalancer.go#L51) and set the balance annotation on the service so it uses "source".
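For the Ingress route, a minimal sketch of what that manifest could look like is below. Treat it as assumption-laden: the upstream-hash-by annotation comes from the newer nginx ingress controller (ingress-nginx) and may not apply to the contrib-era controller, and the name, host, Service, and port are placeholders.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: my-app                      # placeholder name
  annotations:
    # Assumed annotation: asks the nginx ingress controller to pick
    # upstream pods by hashing the client source IP
    nginx.ingress.kubernetes.io/upstream-hash-by: "$binary_remote_addr"
spec:
  rules:
  - host: myapp.example.com         # placeholder host
    http:
      paths:
      - path: /
        backend:
          serviceName: my-app       # placeholder Service
          servicePort: 80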
As of 1.5, if you are running in GCE (by extension GKE) or AWS, you simply need to add an annotation to your Service to make HTTP source preservation work.
...
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/external-traffic: OnlyLocal
...
It basically exposes the service directly via NodePorts instead of going through the proxy. By exposing a health probe on each node, the load balancer can determine which nodes to route traffic to.
In 1.7, this config became GA, so you can set "externalTrafficPolicy": "Local" on your Service spec.
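For example, a complete Service manifest using the GA field might look like this (the name, selector, and ports are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: my-app                   # placeholder name
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local   # preserve the client source IP
  selector:
    app: my-app                  # placeholder selector
  ports:
  - port: 80
    targetPort: 8080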
Right now, no.
Services use kube-proxy to distribute traffic to their backends. Kube-proxy uses iptables to route the service IP to a local port where it is listening, and then opens up a new connection to one of the backends. The internal IP you are seeing is the IP:port of kube-proxy running on one of your nodes.
An iptables-only kube-proxy is in the works; that would preserve the original source IP.
As of Kubernetes 1.1, there is an iptables-based kube-proxy that fixes this issue in some cases. It's disabled by default; see this post for instructions on how to enable it. In summary, do:
for node in $(kubectl get nodes -o name); do kubectl annotate $node net.beta.kubernetes.io/proxy-mode=iptables; done
In the case of Pod-to-Pod traffic, with the iptables kube-proxy you will now see the true source-IP at the destination pod.
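One way to verify this is a sketch following the pattern in the Kubernetes source-IP tutorial (relying on the deployment-creating behavior of older kubectl run; the names and the echoserver image are placeholders borrowed from that tutorial): run an echo server and call it from another pod, then check the client_address it reports.

# Run an echo server that reports the client address it sees
kubectl run source-ip-app --image=k8s.gcr.io/echoserver:1.4
kubectl expose deployment source-ip-app --name=clusterip --port=80 --target-port=8080
# From a throwaway pod, hit the service and inspect client_address in the output
kubectl run busybox -it --image=busybox --restart=Never --rm -- sh
wget -qO - http://clusterip
# With the iptables proxy you should see the busybox pod's IP, not a node IP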
However, if your Service is forwarding traffic from outside the cluster (e.g. a NodePort or LoadBalancer service), then we still have to replace (SNAT) the source IP. This is because we are doing DNAT on the incoming traffic to route it to the service Pod (potentially on another Node), so the DNATing Node needs to insert itself in the return path to be able to un-DNAT the response.
For non-HTTP requests (HTTPS, gRPC, etc.), this is scheduled to be supported in Kubernetes 1.4. See: https://github.com/kubernetes/features/issues/27
externalTrafficPolicy: Local
is a setting you can specify in the YAML of Kubernetes Services of type LoadBalancer or type NodePort. (Ingress Controllers usually include YAML to provision LB Services.)
externalTrafficPolicy: Local
does 3 things:
1. Disables SNAT, so that instead of the ingress controller pod seeing the source IP as the IP of a Kubernetes node, it sees the real source IP.
2. Gets rid of an extra network hop by adding 2 rules:
- if traffic lands on the NodePort of a node with no ingress pods, it's dropped.
- if traffic lands on the NodePort of a node with ingress pods, it's forwarded to a pod on the same node.
3. Updates the Cloud Load Balancer's health check with a /healthz endpoint that's supposed to make it so the LB won't forward to nodes where traffic would have been dropped, and only forwards to nodes with ingress pods.
(Rephrasing for clarity's sake: by default, aka "externalTrafficPolicy: Cluster", traffic gets load balanced between the NodePorts of every worker node. "externalTrafficPolicy: Local" allows traffic to only be sent to the subset of nodes that have Ingress Controller Pods running on them. So if you have a 100-node cluster, instead of the cloud load balancer sending traffic to 97 nodes, it'll only send it to the ~3-5 nodes that are running Ingress Controller Pods. A sketch of how to inspect this on a live cluster follows below.)
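One hedged way to see points 2 and 3 in action (the Service name is a placeholder) is to look at the healthCheckNodePort that Kubernetes allocates when externalTrafficPolicy: Local is set on a type LoadBalancer Service, and probe it on a couple of nodes; kube-proxy serves /healthz on that port and only reports healthy on nodes that actually host a backing pod:

# Find the allocated health check node port (placeholder Service name)
kubectl get svc my-ingress-controller -o jsonpath='{.spec.healthCheckNodePort}{"\n"}'
# Probe it on a node: healthy only where an ingress pod is actually running
curl http://<node-ip>:<health-check-node-port>/healthz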
Important Note!:
"externalTrafficPolicy: Local" is not supported on AWS.
(It supposedly works fine on GCP and Azure. That said, I also recall reading that a regression broke it in a minor version of Kubernetes 1.14, and that there were some versions of the Cilium CNI where it breaks as well, so be aware that the default externalTrafficPolicy: Cluster is rock-solid stable and should usually be preferred if you don't need the functionality. Also be aware that if you have a WAF as a Service in front of it anyway, you may be able to leverage that to see where client traffic is coming from.)
(It causes issues with kops and EKS; other distros running on AWS might actually be unaffected, more on that below.)
"externalTrafficPolicy: Local" not being supported on AWS is an issue that's known by the Kubernetes maintainers but not well documented. Another annoying thing is that if you try it, it'll appear to be partially working, which tricks enough people into thinking it works.
externalTrafficPolicy: Local is broken in 2 ways on AWS, and both breaks have workarounds to force it to work:
1st break + workaround: the /healthz endpoint's initial creation is flaky and the reconciliation loop logic is broken.
Upon initial apply it'll work for some nodes but not others, and then it never gets updated.
https://github.com/kubernetes/kubernetes/issues/80579
^describes the problem in more detail.
https://github.com/kubernetes/kubernetes/issues/61486
^describes a workaround to force it to work using a kops hook.
(When you solve the /healthz endpoint reconciliation loop logic, you unlock benefits #2 and #3, the removed hop and the LB only sending traffic to the subset of worker nodes; but benefit #1, the true source IP, still won't be right.)
2nd break + 2 workaround options:
The desired end result is the ingress pod seeing the true client IP.
But what really happens is that the ingress pod shifts from seeing the source IP of a k8s node to seeing the source IP of the Classic ELB.
workaround option 1.)
Switch to a network LB (an L4 NLB) that works more like the Azure LB. This comes at the cost of not being able to use ACM (AWS Certificate Manager) to terminate TLS at the AWS LB / handle TLS cert provisioning and rotation for you.
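A sketch of option 1, assuming the in-tree AWS cloud provider's NLB annotation (the Service name, selector, and ports are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: my-ingress-controller       # placeholder name
  annotations:
    # Ask the AWS cloud provider for an NLB instead of a Classic ELB
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
spec:
  type: LoadBalancer
  externalTrafficPolicy: Local
  selector:
    app: my-ingress-controller      # placeholder selector
  ports:
  - port: 443
    targetPort: 443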
workaround option 2.)
Keep using the AWS Classic ELB (and you get to keep using ACM); you'll just need to add configuration to both the Classic ELB (in the form of annotations on the LB Service) and to the ingress controller, so that both use the PROXY protocol or X-Forwarded-For headers. I recall another Stack Overflow post covering this, so I won't repeat it here.
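As a rough sketch of what option 2 tends to look like with the in-tree AWS provider plus the nginx ingress controller (the annotation and ConfigMap key shown are the commonly used ones; verify them against the versions you actually run):

# On the LB Service: have the Classic ELB speak PROXY protocol to the backends
kind: Service
metadata:
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-proxy-protocol: "*"
...
# On the nginx ingress controller's ConfigMap: accept PROXY protocol
kind: ConfigMap
data:
  use-proxy-protocol: "true"
...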
For Kubernetes 1.7+, setting service.spec.externalTrafficPolicy to Local will resolve it.
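For an existing Service you can also set it with a patch (the Service name is a placeholder):

kubectl patch svc my-service -p '{"spec":{"externalTrafficPolicy":"Local"}}'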
More information here: Kubernetes Docs