How Do You Troubleshoot OOM Errors with Restarting Pods in Kubernetes?

Problem scenario
You find out-of-memory (aka OOM) errors in Kubernetes. The node seems healthy. The pods keep restarting. What should you do?
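Before trying the solutions below, it may help to confirm which containers were actually OOM-killed. The following is a minimal sketch, assuming you have a Pod object as JSON (for example from `kubectl get pod <name> -o json`). The function name `find_oom_killed` and the sample Pod are hypothetical and only for illustration.

```python
import json


def find_oom_killed(pod):
    """Return one record per container whose most recent termination was an OOM kill."""
    hits = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for cs in statuses:
        # A container that has already been restarted keeps its previous
        # termination under lastState; one that is still down has it under state.
        for key in ("lastState", "state"):
            terminated = cs.get(key, {}).get("terminated")
            if terminated and terminated.get("reason") == "OOMKilled":
                hits.append({
                    "container": cs.get("name"),
                    "restarts": cs.get("restartCount", 0),
                    "exit_code": terminated.get("exitCode"),
                })
                break
    return hits


# Hypothetical, trimmed-down Pod object; a real one has many more fields.
sample_pod = json.loads("""
{
  "status": {
    "containerStatuses": [
      {"name": "app", "restartCount": 7,
       "state": {"running": {}},
       "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
      {"name": "sidecar", "restartCount": 0,
       "state": {"running": {}},
       "lastState": {}}
    ]
  }
}
""")

print(find_oom_killed(sample_pod))
```

A container that shows up here with a growing restart count is the one to look at first; containers that do not appear were probably restarted for some other reason.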

Possible Solution #1
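Are the restarting Pods in the Burstable quality of service (aka QoS) class? Kubernetes assigns the QoS class from each container's resource requests and limits; you do not set it directly. When a node runs short of memory, BestEffort Pods and Burstable Pods that use more memory than they requested are killed or evicted before Guaranteed Pods. The cause may therefore lie outside the restarting Pods: other Pods with a higher QoS class may be holding the node's memory, leaving the Burstable Pods starved. Changing the requests and limits (and therefore the QoS class) of either the other Pods or the Pods being restarted may help.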
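To see which QoS class a given set of requests and limits leads to, here is a rough sketch of the classification rules. It assumes each container is described only by its `resources` dict, compares quantities as plain strings (so "1" and "1000m" would wrongly count as different), and ignores init containers. The function name `qos_class` is hypothetical; the cluster's own answer is in the Pod's `status.qosClass` field.

```python
def qos_class(containers):
    """Approximate the QoS class Kubernetes would assign to a Pod.

    `containers` is a list of dicts shaped like the `resources` field
    of a container spec, e.g. {"requests": {...}, "limits": {...}}.
    """
    any_set = False
    guaranteed = True
    for res in containers:
        limits = res.get("limits", {})
        # When a request is omitted but a limit is given, the request
        # defaults to the limit, so start from the limits and overlay requests.
        requests = {**limits, **res.get("requests", {})}
        for name in ("cpu", "memory"):
            if name in limits or name in requests:
                any_set = True
            # Guaranteed needs a limit for both resources, equal to the request.
            if name not in limits or requests[name] != limits[name]:
                guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if guaranteed else "Burstable"


print(qos_class([{"requests": {"memory": "128Mi"}}]))
print(qos_class([{"limits": {"cpu": "500m", "memory": "256Mi"}}]))
```

Under these rules, setting the CPU and memory limits equal to the requests on every container is what moves a Pod from Burstable to Guaranteed.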

Possible Solution #2
Is the node actually healthy? A node can appear fine while its memory has a hardware problem. The chance is small, but it is worth ruling out before you assume the node is not the cause.
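One way to double-check what the node itself reports is to look at its conditions. This is a minimal sketch, assuming a Node object as JSON (for example from `kubectl get node <name> -o json`). The names `EXPECTED` and `unhealthy_conditions` and the sample node are hypothetical. Note that a hardware memory fault may not appear in these conditions at all, so a clean result does not prove the hardware is good.

```python
# Status each standard node condition should have on a healthy node.
EXPECTED = {
    "Ready": "True",
    "MemoryPressure": "False",
    "DiskPressure": "False",
    "PIDPressure": "False",
    "NetworkUnavailable": "False",
}


def unhealthy_conditions(node):
    """Return (type, status, reason) for every condition that is off its healthy value."""
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        expected = EXPECTED.get(cond.get("type"))
        if expected is not None and cond.get("status") != expected:
            problems.append((cond["type"], cond.get("status"), cond.get("reason", "")))
    return problems


# Hypothetical, trimmed-down Node object for illustration.
sample_node = {
    "status": {
        "conditions": [
            {"type": "MemoryPressure", "status": "True",
             "reason": "KubeletHasInsufficientMemory"},
            {"type": "DiskPressure", "status": "False",
             "reason": "KubeletHasNoDiskPressure"},
            {"type": "Ready", "status": "True", "reason": "KubeletReady"},
        ]
    }
}

print(unhealthy_conditions(sample_node))
```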

Possible Solution #3
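Can the restart behavior of the Pod be changed? Kubernetes has no restart-rate setting as such. The Pod's restartPolicy (Always, OnFailure, or Never) decides whether a failed container is started again, and the kubelet spaces out repeated restarts with an increasing delay. With a restartPolicy of Always, which is the default, a container that is OOM-killed is started again every time, so the restarts continue until the memory problem itself is fixed. Frequent restarts could also be a contributing factor, because each new start uses memory again.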
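To get a feel for how quickly a crashing container is retried, here is a small sketch of an exponential back-off schedule. It assumes the delays described in the Kubernetes documentation (starting at 10 seconds, doubling, capped at 300 seconds); your cluster version or configuration may differ. The function name `backoff_delays` and its parameters are hypothetical.

```python
def backoff_delays(restarts, initial=10, cap=300):
    """Return the delay in seconds before each of the first `restarts` restarts."""
    delays = []
    delay = initial
    for _ in range(restarts):
        delays.append(delay)
        # Double the wait each time, but never exceed the cap.
        delay = min(delay * 2, cap)
    return delays


print(backoff_delays(7))
```

Under these assumptions, a container that keeps getting OOM-killed settles at roughly one restart attempt every five minutes, which is why the restart count climbs steadily rather than rapidly.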
