Problem scenario
You have a Pod that you want to run on a worker node. Its status is "Pending." What can you do to get the Pod to proceed (and what are some common root causes for this problem)?
Background
A common root cause is that the nodes have insufficient resources for the Pod (e.g., insufficient CPU or memory).
Possible Solution #1
Wait. The Pod may eventually be scheduled once a worker node with sufficient resources becomes available.
Possible Solution #2
Can you change or remove a taint on a node? Can you add a toleration to the relevant YAML file for deploying this Pod? A taint on a node prevents Pods without a matching toleration from being scheduled there. If the Pod is part of a StatefulSet, it may need a certain type of disk (e.g., a solid-state disk); therefore, if the nodes only have older hard drives, they won't be available for scheduling. Is this possibly your problem? If a PersistentVolumeClaim is involved, the PVC may need to be deleted (according to this external posting).
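To illustrate, here is a minimal sketch of a toleration stanza that could be added to the Pod spec (the taint key "dedicated" and value "experimental" are hypothetical; match them to the actual taint shown by "kubectl describe node"):

tolerations:
- key: "dedicated"
  operator: "Equal"
  value: "experimental"
  effect: "NoSchedule"

Alternatively, a taint with that hypothetical key could be removed from a node named "foo" with "kubectl taint nodes foo dedicated=experimental:NoSchedule-" (the trailing hyphen removes the taint).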
Possible Solution #3
Can your other deployments/jobs use lower "requests" values for CPU and memory? Can they use lower "limits" values for CPU and memory?
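As an illustrative sketch only (the numbers are arbitrary), the "resources" stanza of a container in another Deployment could be lowered to free capacity for the Pending Pod:

resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

Remember that the scheduler places Pods based on "requests," so lowering requests is what frees schedulable capacity.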
Possible Solution #4
Are you using hostPort? Stackify.com says that this can contribute to a Pod remaining in a "Pending" state. They say "[b]inding a pod to hostPort means limited areas for scheduling. However, it's pointless using a service object to expose the pod."
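For illustration, a "ports" stanza like this one (the port number 8080 is arbitrary) restricts the Pod to nodes where that host port is free; removing the "hostPort" line and exposing the Pod via a Service avoids this constraint:

ports:
- containerPort: 8080
  hostPort: 8080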
Possible Solution #5
Can you add a server as a worker node to the cluster? Or can you add CPU (see this posting to resize a VM in the public cloud) or RAM (see this posting) to the node?
Possible Solution #6
Run this command:
kubectl get pods --field-selector=status.phase=Pending
Can you follow that up with a "kubectl describe pod foobar" (where "foobar" is the name of the Pod you want to investigate)? (This solution was adapted from this posting on the Datadog website.)
Possible Solution #7
Can you look in Splunk, Datadog, or your centralized logging tool to see if there is a "FailedScheduling" event that may provide details, hints, or clues as to the problem?
Idea adapted from the Datadog website.
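For example, assuming you have kubectl access to the cluster, this command filters the events down to that reason:

kubectl get events --field-selector reason=FailedScheduling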
Possible Solution #8
When you "specify nodeSelector and no node matches label then pods are in pending state as they don't find node with matching label." (This quote was taken from https://www.bigbinary.com/blog/scheduling-pods-on-nodes-in-kubernetes-using-labels.)
You can remove the "nodeSelector" stanza (and its "nodePool" line) from the YAML file, or you can label nodes with "kubectl label nodes foo nodePool=barcluster" (where "foo" is the node's name and the nodePool is named "barcluster"). This solution was adapted from StackOverflow.
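To illustrate the first option, a stanza like this in the Pod spec (the "nodePool: barcluster" label is hypothetical) will keep the Pod Pending until at least one node carries that label:

nodeSelector:
  nodePool: barcluster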
Possible Solution #9
Are there affinity rules in place?
Anti-affinity rules can cause this problem because Pods will avoid nodes that already run certain other Pods; to learn more about one such situation, see this posting: https://github.com/hashicorp/consul-helm/issues/243
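As a sketch, an anti-affinity rule like this one in a Pod spec (the "app: foobar" label is hypothetical) prevents two such Pods from sharing a node; if there are fewer eligible nodes than replicas, the extra Pods will stay Pending:

affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: foobar
      topologyKey: kubernetes.io/hostname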
Possible Solution #10
Could one of your nodes be in a "NotReady" state? Can you run "kubectl get nodes"? If some are not ready, see this posting.
Possible Solution #11
Is a deployment or rolling update happening? If one is and a new PodDisruptionBudget was implemented, that may constrain the resources available for scheduling. To learn more, you may want to see this:
https://kubernetes.io/docs/tasks/run-application/configure-pdb/#specifying-a-poddisruptionbudget
If you know the name of the deployment, run this command to see its status and details about how it is progressing (but substitute "foobar" with the name of the deployment in question):
kubectl describe deployment foobar
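For reference, a minimal PodDisruptionBudget looks like this (the name "foobar-pdb" and the "app: foobar" selector are hypothetical); a "minAvailable" that is set too high can stall node drains and rolling updates, leaving replacement Pods Pending:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: foobar-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: foobar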
Possible Solution #12
Can you run "kubectl get events" or "kubectl logs foobar" (where "foobar" is the name of the Pod)? These may help you determine the root cause.
Possible Solution #13
Are you 100% sure you are looking at the right Pod? If you have multiple namespaces, the Pod could be in a namespace other than the one you are querying. Have you run "kubectl get pods --all-namespaces" to be sure you are looking at the correct Pod?