Kubelet Complains About Orphaned Pods

This error is derived from a P1 incident that originated from an emerging Google regional network issue. Three GKE nodes in that region were impacted: the kubelet daemon on each node was unable to update node status and other metrics, so the nodes were marked NotReady and some customer services went down.

After the root cause was mitigated and the nodes returned to the Ready state, GCP Log Explorer showed that the kubelet on one GKE node kept posting this error:

E0505 03:59:46.470928    1543 kubelet_volumes.go:154] orphaned pod "7935559a-a41b-4b44-960b-6a31fcab81f8" found, but volume paths are still present on disk : There were a total of 2 errors similar to this. Turn up verbosity to see them.
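The pod UID in the message corresponds to a leftover directory under /var/lib/kubelet/pods on that node. As a quick sketch (using the UID from the log above; the exact layout may vary by Kubernetes version), you can inspect it directly on the host:

# Leftover pod directory that kubelet keeps complaining about
ls -la /var/lib/kubelet/pods/7935559a-a41b-4b44-960b-6a31fcab81f8/
# Volume paths still present on disk for that pod
ls -la /var/lib/kubelet/pods/7935559a-a41b-4b44-960b-6a31fcab81f8/volumes/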

Concept

The solution for this error is simple, but let's first understand why orphaned pods exist. From the Kubernetes documentation on Garbage Collection:

Some Kubernetes objects are owners of other objects. For example, a ReplicaSet is the owner of a set of Pods. The owned objects are called dependents of the owner object.

If you delete an object without deleting its dependents automatically, the dependents are said to be orphaned.

In background cascading deletion, Kubernetes deletes the owner object immediately and the garbage collector then deletes the dependents in the background.

The default behavior is background cascading deletion, which applies when --cascade is omitted or explicitly set to background.
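As a minimal illustration of these deletion modes (the Deployment name nginx is only an assumption for the example), the --cascade flag on kubectl delete selects the behavior:

# Background cascading deletion (the default): the Deployment is deleted
# immediately and the garbage collector removes its Pods afterwards.
kubectl delete deployment nginx --cascade=background

# Orphan the dependents: the Deployment is deleted but its Pods are left
# behind without an owner.
kubectl delete deployment nginx --cascade=orphan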

Solution

As we can see, after the nodes became Ready again, the old pods were deleted and their controllers redeployed new ones. However, some orphaned pod directories lingered because their volume paths were still present on disk; this is a known issue.
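Before touching anything on disk, it is worth confirming that no Pod object with that UID still exists in the cluster. A minimal sketch, assuming kubectl access and that jq is installed (neither is part of the original incident write-up):

# Look for a Pod whose metadata.uid matches the orphaned UID from the log;
# empty output means only the on-disk directory remains.
kubectl get pods --all-namespaces -o json \
  | jq -r '.items[] | select(.metadata.uid == "7935559a-a41b-4b44-960b-6a31fcab81f8") | .metadata.namespace + "/" + .metadata.name'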

On the host node, cd to the /var/lib/kubelet/pods path and manually delete the orphaned pod's directory:

# Remove the leftover directory for the orphaned pod UID from the log
cd /var/lib/kubelet/pods
rm -rf ./7935559a-a41b-4b44-960b-6a31fcab81f8/
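As an extra precaution (my own assumption, not part of the original incident steps), you can check that nothing under the pod directory is still mounted before removing it, so the rm -rf cannot reach into a still-attached volume:

# Any output here means a volume is still mounted and should be unmounted first.
mount | grep 7935559a-a41b-4b44-960b-6a31fcab81f8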