Kubernetes sysctl

Kubernetes version 1.13.2

In my article <<Linux IPC>>, I mentioned that there is a workaround for setting IPC kernel parameters with sysctl in a Kubernetes cluster when the SYS_RESOURCE capability is not allowed.

Clarification

From the Kubernetes documentation:

Sysctls are grouped into safe and unsafe sysctls. This means that setting a safe sysctl for one pod:

  • must not have any influence on any other pod on the node
  • must not allow to harm the node’s health
  • must not allow to gain CPU or memory resources outside of the resource limits of a pod.

By far, most of the namespaced sysctls are not necessarily considered safe. At the time of writing, the safe set contains the following (check the latest Kubernetes documentation for the current list):

  • kernel.shm_rmid_forced,
  • net.ipv4.ip_local_port_range,
  • net.ipv4.tcp_syncookies.

This list will be extended in future Kubernetes versions when the kubelet supports better isolation mechanisms.

All safe sysctls are enabled by default (you can use them directly without any additional kubelet configuration).

All unsafe sysctls are disabled by default and must be allowed manually by the cluster admin on a per-node basis. Pods with disabled unsafe sysctls will be scheduled, but will fail to launch.

If you describe the failed pod, the events show that the kubelet rejected it because of the forbidden sysctl (reason SysctlForbidden).

A number of sysctls are namespaced in today’s Linux kernels. This means that they can be set independently for each pod on a node. Only namespaced sysctls are configurable via the pod securityContext within Kubernetes.

The following sysctls are known to be namespaced. This list could change in future versions of the Linux kernel.

  • kernel.shm*
  • kernel.msg*
  • kernel.sem
  • fs.mqueue.*
  • net.*

Sysctls with no namespace are called node-level sysctls. If you need to set them, you must manually configure them on each node’s operating system, or by using a DaemonSet with privileged containers.
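As a rough illustration of the DaemonSet approach, the sketch below writes a manifest that runs one privileged container per node and sets a node-level sysctl from inside it. The DaemonSet name, the busybox image, and the vm.max_map_count value are assumptions chosen for the example, not something taken from this cluster.

```shell
# Write a DaemonSet manifest that sets a node-level sysctl on every node.
# All names and values below are illustrative assumptions.
cat > node-sysctl-daemonset.yaml <<'EOF'
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-sysctl              # hypothetical name
spec:
  selector:
    matchLabels:
      app: node-sysctl
  template:
    metadata:
      labels:
        app: node-sysctl
    spec:
      containers:
      - name: sysctl
        image: busybox           # assumed image that ships a sysctl applet
        securityContext:
          privileged: true       # needed to write non-namespaced sysctls
        command:
        - sh
        - -c
        - sysctl -w vm.max_map_count=262144 && while true; do sleep 3600; done
EOF
```

The manifest can then be applied with kubectl apply -f node-sysctl-daemonset.yaml. The sleep loop only keeps the container alive so the DaemonSet does not restart it repeatedly.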

As with node-level sysctls, it is recommended to use the taints and tolerations feature to schedule pods that rely on special sysctl settings onto the right nodes.
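A minimal sketch of the taint and toleration idea, assuming a node called worker-1 and a taint key unsafe-sysctls (both hypothetical). The kubectl command is only printed so it can be reviewed before it is run against a real cluster.

```shell
# Hypothetical node name and taint; adjust to your cluster.
NODE="worker-1"
TAINT="unsafe-sysctls=true:NoSchedule"

# Build the taint command and print it instead of executing it.
TAINT_CMD="kubectl taint nodes ${NODE} ${TAINT}"
echo "$TAINT_CMD"

# Matching toleration, to be merged into the pod's spec.
cat > toleration.yaml <<'EOF'
tolerations:
- key: "unsafe-sysctls"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"
EOF
```

A toleration only permits a pod to land on the tainted node. To pin the pod there, it would also need something like a nodeSelector.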

Use the pod securityContext to configure namespaced sysctls. The securityContext applies to all containers in the same pod.
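For example, a pod that sets one of the safe sysctls could look roughly like the manifest written below. The pod name, container name, and image are placeholders. kernel.shm_rmid_forced is used because it is in the safe set, so no kubelet change should be needed.

```shell
# Write a pod manifest that sets a safe, namespaced sysctl.
# Pod name, container name, and image are illustrative assumptions.
cat > sysctl-safe-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-safe-example
spec:
  securityContext:
    sysctls:
    - name: kernel.shm_rmid_forced   # from the safe set
      value: "1"                     # sysctl values are always strings
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
EOF
```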

Configure kubelet

If you need unsafe sysctls, you must configure the kubelet on the target node (the node where the pod that uses unsafe sysctls will run). Edit the 10-kubeadm.conf file in /etc/systemd/system/kubelet.service.d/ and add:

Environment="KUBELET_UNSAFE_SYSCTLS=--allowed-unsafe-sysctls='kernel.shm*,kernel.sem,kernel.msg*'"

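Here I need kernel.shm*, kernel.sem and kernel.msg*. Note that the variable only takes effect if $KUBELET_UNSAFE_SYSCTLS is also referenced on the kubelet ExecStart line in the same file.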

Then reload systemd and restart the kubelet:

systemctl daemon-reload
systemctl restart kubelet

Verify the change; you should see --allowed-unsafe-sysctls in the kubelet command line:

ps aux | grep kubelet
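If the ps output is too long to read comfortably, a small helper like the one below can pull out just the flag value. This is only a sketch: the function name is made up, and the sample command line is a shortened, hypothetical one.

```shell
# Print the value of --allowed-unsafe-sysctls from a kubelet command line.
allowed_unsafe_sysctls() {
  # Put each argument on its own line, then keep only the flag's value.
  printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^--allowed-unsafe-sysctls=//p'
}

# Hypothetical, shortened kubelet command line for illustration.
sample="/usr/bin/kubelet --config=/var/lib/kubelet/config.yaml --allowed-unsafe-sysctls=kernel.shm*,kernel.sem,kernel.msg*"
allowed_unsafe_sysctls "$sample"
# prints: kernel.shm*,kernel.sem,kernel.msg*
```

On a real node, the argument could come from something like ps -o args= -C kubelet, assuming a procps-style ps.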

A brief digression: the kubelet service unit file is /etc/systemd/system/kubelet.service.

Then edit the pod YAML file to add the sysctls option under the pod securityContext.
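A sketch of what such a pod spec might look like for the IPC parameters allowed in the kubelet configuration. The pod name, the image, and the sysctl values are illustrative assumptions only, so pick values that match your workload. hostIPC is set to false explicitly, since IPC sysctls apply to the pod's own IPC namespace.

```shell
# Write a pod manifest that sets unsafe IPC sysctls.
# The node must allow them via --allowed-unsafe-sysctls.
# Name, image, and values are illustrative assumptions.
cat > sysctl-ipc-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-ipc-example
spec:
  hostIPC: false                     # keep the pod in its own IPC namespace
  securityContext:
    sysctls:
    - name: kernel.sem
      value: "250 32000 100 128"     # SEMMSL SEMMNS SEMOPM SEMMNI
    - name: kernel.msgmax
      value: "65536"
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
EOF
```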

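If the pod has hostIPC enabled, you need to disable it. A pod that shares the host IPC namespace is not allowed to set IPC sysctls (kernel.shm*, kernel.sem, kernel.msg*) and will be rejected.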

Once the pod is running, get into the container and check the kernel parameter value, for example:

sysctl -a | grep -i kernel.sem
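The kernel.sem value is four numbers in a fixed order (SEMMSL, SEMMNS, SEMOPM, SEMMNI). A small helper can label them to make the check easier to read. The function name and the sample values here are assumptions for illustration.

```shell
# Label the four fields of a kernel.sem value.
describe_sem() {
  # Unquoted on purpose: split the value on whitespace into $1..$4.
  set -- $1
  printf 'SEMMSL=%s SEMMNS=%s SEMOPM=%s SEMMNI=%s\n' "$1" "$2" "$3" "$4"
}

# Sample value; inside the container, pass "$(cat /proc/sys/kernel/sem)" instead.
describe_sem "250 32000 100 128"
# prints: SEMMSL=250 SEMMNS=32000 SEMOPM=100 SEMMNI=128
```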

Resources

kubernetes 1.4 new feature: support sysctls configure kernel parameters in k8s cluster
