How Do You Troubleshoot “Expected x selector labels” with Kubernetes?

Problem scenario
You run a kubectl command against a YAML (.yml) file, but you get the error “Expected X selector labels”. What should you do?

Solution
Go to the “labels” and “matchLabels” sections of the corresponding YAML files. Do you see keys such as “app:” or “environment:”? Are the same keys and values used consistently in both sections and across your YAML files? This error can happen when the “app:” and “environment:” entries under “matchLabels” do not agree with the entries under “labels”.
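
The consistency check described above can be sketched in a few lines of Python. This is a minimal illustration, not part of kubectl: the function name unmatched_selector_labels and the sample label values are hypothetical, and the two dictionaries stand in for the “matchLabels” and “labels” sections after the YAML has been loaded.

```python
def unmatched_selector_labels(match_labels, labels):
    """Return selector entries that are missing from, or differ in, the labels.

    An empty result means every key/value pair in matchLabels also
    appears in labels.
    """
    problems = {}
    for key, wanted in match_labels.items():
        found = labels.get(key)  # None when the key is absent
        if found != wanted:
            problems[key] = {"selector": wanted, "labels": found}
    return problems


# Hypothetical values, for illustration only.
selector = {"app": "web", "environment": "prod"}
pod_labels = {"app": "web", "environment": "production"}

print(unmatched_selector_labels(selector, pod_labels))
```

With these sample values the function reports the “environment” entry, because the selector says prod while the label says production. Extra labels that the selector does not mention are not treated as a problem.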

How Do You Troubleshoot “timed out waiting for condition” when Running kubectl Commands?

Problem scenario
You ran a kubectl command, but you got the error “timed out waiting for condition”. The command either never worked or suddenly stopped working. What should you do?

Possible Solution #1
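Rerun the command with the -v=9 flag. This raises the verbosity of kubectl's output to a very high level, including details of the HTTP requests it sends to the API server, which can show where the command is waiting.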

Possible Solution #2
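If the Pod uses a Persistent Volume Claim, find out why the Pod is not being created. The Events section of kubectl describe pod for the affected Pod and the status column of kubectl get pvc usually show whether the claim is still Pending or failed to bind.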

How Do You Troubleshoot the error “operator-sdk: cannot find module for path embed”?

Problem scenario
You run “make install” and get an error like this:

build github.com/operator-framework/operator-sdk/cmd/operator-sdk: cannot find module for path embed
make: *** [Makefile:65: install] Error 1

What should you do?

Solution
1. Edit the go.mod file. Replace this line:

module github.com/operator-framework/operator-sdk/

with this line:

module github.com/operator-framework/operator-sdk/tree/master/ …

How Do You Fix “Failed to start docker.service. Unit docker.socket is masked”?

Problem scenario
You try to start the Docker service (e.g., with sudo systemctl start docker), but you get this error: “Failed to start docker.service. Unit docker.socket is masked”.

What should you do?

Possible Solution #1
Unmask the Docker socket and service units, then try starting the service again:

sudo systemctl unmask docker.socket
sudo systemctl unmask docker.service

This solution was adapted from this blog.

Possible Solution #2
Uninstall Docker.

How Do You Troubleshoot the Kubernetes Pods Error “CrashLoopBackOff”?

Problem scenario
You have a problem with Pods not starting up. They have a status of “CrashLoopBackOff”. What should you do?

Possible Solution #1
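Check whether the Pods can start in a lower (non-production) environment with minimal utilization. Being able to reproduce the problem, or failing to reproduce it, helps narrow down the cause. If the same YAML works in an environment with little load, the error may come from the environment (e.g., traffic may be high in production) rather than from the manifest itself.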

How Do You Troubleshoot a PVC That Appears to Still Be in Use?

Problem scenario
You are working with Kubernetes. A PVC cannot be mounted. It seems like the Persistent Volume is still in use. What should you do?

Possible Solution #1
Find out about the volume attachments with these commands:

kubectl get volumeattachment
kubectl get pv

Then find the Pods that use the volume. Check whether the volume is actually being unmounted from the previous Pod or node; the unmounting process itself may be stuck or failing.
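
As a rough aid for this step, the JSON output of kubectl can be filtered with a short Python script. This is a sketch under assumptions: the helper names pods_using_claim and nodes_attached_to are made up for this example, and so are the sample objects. The field paths (spec.volumes[].persistentVolumeClaim.claimName on Pods; spec.source.persistentVolumeName, spec.nodeName, and status.attached on VolumeAttachments) follow the Kubernetes API objects. In practice the input would be the output of kubectl get pods -o json and kubectl get volumeattachment -o json, parsed with json.loads.

```python
def pods_using_claim(pod_list, claim_name):
    """Names of Pods whose volumes reference the given PVC."""
    names = []
    for pod in pod_list.get("items", []):
        for volume in pod.get("spec", {}).get("volumes", []):
            claim = volume.get("persistentVolumeClaim") or {}
            if claim.get("claimName") == claim_name:
                names.append(pod["metadata"]["name"])
                break  # one match per Pod is enough
    return names


def nodes_attached_to(attachment_list, pv_name):
    """Nodes that still report the given PV as attached."""
    nodes = []
    for att in attachment_list.get("items", []):
        spec = att.get("spec", {})
        if spec.get("source", {}).get("persistentVolumeName") != pv_name:
            continue
        if att.get("status", {}).get("attached"):
            nodes.append(spec.get("nodeName"))
    return nodes


# Hypothetical sample data, shaped like the kubectl JSON output.
pods = {"items": [
    {"metadata": {"name": "db-0"},
     "spec": {"volumes": [{"name": "data",
                           "persistentVolumeClaim": {"claimName": "data-claim"}}]}},
    {"metadata": {"name": "web-1"},
     "spec": {"volumes": [{"name": "cfg", "configMap": {"name": "web-cfg"}}]}},
]}
attachments = {"items": [
    {"spec": {"nodeName": "node-a",
              "source": {"persistentVolumeName": "pv-0001"}},
     "status": {"attached": True}},
    {"spec": {"nodeName": "node-b",
              "source": {"persistentVolumeName": "pv-0002"}},
     "status": {"attached": True}},
]}

print(pods_using_claim(pods, "data-claim"))
print(nodes_attached_to(attachments, "pv-0001"))
```

If a node is still listed for the Persistent Volume although no running Pod uses the claim any more, the detach step is a reasonable place to look next.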

How Are Backoff Strategies (with Client Retries) Helpful?

Question
Sometimes a client attempts to connect to or use an application. Sometimes a Kubernetes Pod is being created and tries to pull down an image. Sometimes a network device tries to establish a connection to an endpoint. These attempts can initially fail. Retries can be attempted in rapid succession. To mitigate excessive attempts in a short amount of time (to not waste resources or cause a denial-of-service attack),