Question
You have been asked to explain how an external request for a website is routed and ultimately fulfilled by a pod in Kubernetes via an ingress controller. When a web browser downloads a website via HTTP and the website is running from a Kubernetes cluster, how does an individual container provide the HTML and/or data? How is external traffic routed to an underlying pod in a Kubernetes Cluster?
Answer
Assuming that external traffic can reach an ingress controller in Kubernetes, the short version of the answer may be the way ingress controllers are configured. (The long answer, or any answer, should account for variation in the process as there is not one way traffic is routed to a pod in a Kubernetes cluster.)
There are three types of HTTP routing: host-based, path-based and header-based (according to https://dzone.com/articles/the-three-http-routing-patterns-you-should-know). We think header-based routing with Kubernetes is exceptionally rare based on this SO posting. If you use an ALB with Kubernetes, you can use host-based routing (according to this Amazon posting).
A browser will make a GET request for a URL. A DNS server will resolve the domain name (starting from the top-level domain, e.g., the .com of the URL) so the request can be routed to the appropriate domain (e.g., continualintegration). The next intermediate step would typically be to land on a load balancer or reverse proxy.*
External traffic will then go to either an Ingress or a Service of Kubernetes. Services pass the traffic to an Endpoints resource -- not to pods directly (according to page 325 of Kubernetes in Action). (Technically the name of the Endpoints resource is plural.) Endpoints are separate resources (as opposed to being subcomponents of Services) according to page 133 of Kubernetes in Action.**
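As a sketch of this relationship, here is a hypothetical Service manifest; when a Service is created with a label selector, Kubernetes automatically creates a matching Endpoints resource listing the IPs of the pods that match the selector. (The name service1 and the label app: web are assumptions for illustration.)

```yaml
# Hypothetical Service; the name, label, and ports are illustrative assumptions.
apiVersion: v1
kind: Service
metadata:
  name: service1
spec:
  selector:
    app: web         # pods with this label become the Service's Endpoints
  ports:
  - protocol: TCP
    port: 80         # port the Service exposes inside the cluster
    targetPort: 8080 # port the container actually listens on
```

Kubernetes would then maintain an Endpoints resource (also named service1) whose addresses are the IPs of the matching pods; you can inspect it with kubectl get endpoints service1.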
Ingress resources may use Services without forwarding the traffic to the Service itself; ingress controllers select pods to fulfill HTTP requests (according to page 145 of Kubernetes in Action). Ingress controllers can expose underlying Kubernetes Services to external traffic, or they can send external traffic directly to pods and bypass Services (pages 143 through 146 of Kubernetes in Action and this). (An ambassador can also send traffic directly to pods and bypass kube-proxy on a worker node according to this external site.) Kubernetes Services can be made available to external traffic without an Ingress forwarding such traffic (page 140 of Kubernetes in Action). To read about the advantages of a Service over an Ingress resource, you may want to view this posting.
Ingress controllers can be externally accessible with a public IP address (without a load balancer or reverse proxy). DNS can facilitate the resolution of an FQDN. During a request to a website over HTTP, once the request gets to the ingress controller, the traffic will be directed to a backend Service, which in turn selects pods via a label selector or via manually configured IP addresses.
Assuming ingress controllers are used in Kubernetes, and assuming YAML was used to create the Ingress resources, the "rules" stanzas in the "spec" section of the YAML will designate "host" stanzas (for routing to domains or subdomains) and corresponding paths that map to a backend "service" (from a list of one or more Services) that will fulfill the request.
Here is a YAML example for an Ingress resource with host rules that will route traffic destined for certain hosts (either continualintegration.foobar.com or weird.continualintegration.com) to corresponding Services (either service1 or service2).
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-host
spec:
  rules:
  - host: "continualintegration.foobar.com"
    http:
      paths:
      - pathType: Prefix
        path: "/bar"
        backend:
          service:
            name: service1
            port:
              number: 80
  - host: "weird.continualintegration.com"
    http:
      paths:
      - pathType: Prefix
        path: "/foo"
        backend:
          service:
            name: service2
            port:
              number: 80
Services can select pods either via label selectors or, indirectly, via manually specified IP addresses. It is possible to create a Service with no valid Endpoints resource (as explained here), but it would likely not be useful. Endpoints are separate Kubernetes resources (according to page 133 of Kubernetes in Action). Endpoints can be created manually if you create Services without label selectors. Routing to Endpoints can be done via IP addresses or via the label selectors that a Kubernetes Service is configured with (pages 102 and 105 of Kubernetes Patterns by Bilgin Ibryam and Roland Huß (O'Reilly). Copyright 2019 Bilgin Ibryam and Roland Huß, 978-1-492-05028-5). Endpoints define what will be exposed in a pod (according to the front cover of Kubernetes in Action).
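To illustrate the manual case, here is a hedged sketch of a Service created without a label selector, together with a manually created Endpoints resource pointing at an arbitrary IP address. (The name external-service and the IP address are assumptions for illustration; the Endpoints resource must share the Service's name.)

```yaml
# Hypothetical selector-less Service; all names and IPs are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: external-service
spec:
  ports:
  - port: 80
---
# Manually created Endpoints; the name must match the Service above.
apiVersion: v1
kind: Endpoints
metadata:
  name: external-service
subsets:
- addresses:
  - ip: 203.0.113.10   # example IP from a documentation address range
  ports:
  - port: 80
```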
Internal networking in a Kubernetes cluster (e.g., among the worker nodes) is done via a Container Network Interface plugin (as opposed to Network Address Translation).
Load balancers can send traffic to different endpoints; traditional load balancers are implemented in hardware (according to page 114 of Kubernetes Patterns by Bilgin Ibryam and Roland Huß (O'Reilly). Copyright 2019 Bilgin Ibryam and Roland Huß, 978-1-492-05028-5). Pods can become available or unavailable based on readiness probes. kube-dns and kube-proxy are components that play a role in routing traffic inside a Kubernetes cluster. kube-proxy on each worker node helps balance the traffic load among the containers providing a given service (page 21 of Kubernetes in Action).
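As a sketch of how readiness affects routing: a pod whose readiness probe fails is removed from the Service's Endpoints, so kube-proxy stops sending it traffic. (The pod name, label, image, and health-check path below are assumptions for illustration.)

```yaml
# Hypothetical pod with a readiness probe; names and paths are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: web-pod
  labels:
    app: web            # label a Service's selector could match
spec:
  containers:
  - name: web
    image: nginx        # illustrative image
    readinessProbe:
      httpGet:
        path: /healthz  # hypothetical health endpoint
        port: 80
      periodSeconds: 5  # probe every 5 seconds
```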
To read more about networking and Kubernetes, see this external page or this Dzone page. To learn more about how internal Kubernetes traffic is routed, see this posting. For networking specific to GKE, see this Google documentation posting.
Further reading:
If Kubernetes is using a service mesh, then many aspects of the routing may be very different; see this posting for more details. You may also want to see these postings: Itnext.io or blog.getambassador.io.
To learn more about how a workstation connects to a website so a user can browse it, see this posting.
You may want to read these external pages for greater knowledge:
- https://www.cisco.com/c/en/us/products/collateral/cloud-systems-management/intersight/comp-guide-kubernetes-networking-wp.html
- https://kodekloud.com/certified-kubernetes-administrator-exam-series-part-8-networking/
If you are asked in an interview "how does Kubernetes networking work?", we think you should mention three things: 1) kube-proxy on the worker nodes plays a big role in load distribution and/or routing traffic to pods, 2) the YAML for an Ingress resource has "rules" stanzas in its "spec" section with a "paths:" section (for text like /foo or /foo/bar), which ensures that different URL paths route to different backend Services, and 3) a CNI plugin handles internal networking for the cluster.
* If no FQDN is used, there will be no DNS resolution, but the web browser may still download a page. If an IP address is used in the address bar of the web browser, the IP address will be looked up via routers' routing tables. Normally a web request involves typing a URL (or IP address) into a web browser. This request will be resolved starting with the TLD's name servers (e.g., if the URL had a .com, .org, or .gov in it). Once resolved, the HTTP request from the originating web browser is routed to a website (possibly a reverse proxy). The resolution could happen via Route 53 (if the DNS name was registered in AWS) to an ELB. The reverse proxy or ELB in the previous sentences could direct the traffic to an ingress controller created in Kubernetes. Ultimately the packets from an underlying pod would be sent back to the requesting web browser, where they are assembled into the page. When refreshing a web browser (as opposed to issuing a curl), you will normally get the web page from the same exact pod every time because "Services work at the connection level" (according to page 140 of Kubernetes in Action).
** One type of Service that routes traffic is NodePort. This method bypasses some security features of Kubernetes (according to the itnext.io website). NodePort is best for non-HTTP traffic.
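For reference, here is a minimal sketch of a NodePort Service (the names, label, and nodePort value are assumptions for illustration); traffic to any worker node's IP on port 30080 would be forwarded to the matching pods.

```yaml
# Hypothetical NodePort Service; names, labels, and ports are illustrative.
apiVersion: v1
kind: Service
metadata:
  name: service1-nodeport
spec:
  type: NodePort
  selector:
    app: web
  ports:
  - port: 80          # cluster-internal port
    targetPort: 8080  # container port
    nodePort: 30080   # port opened on every worker node (30000-32767 range)
```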