Kubernetes version 1.13.2
In my article <<Linux IPC>>, I mentioned that there is a workaround to set IPC kernel parameters using sysctl in a Kubernetes cluster if the SYS_RESOURCE capability is not allowed.
Clarification
From the Kubernetes documentation:
Sysctls are grouped into safe and unsafe sysctls. This means that setting a safe sysctl for one pod:
- must not have any influence on any other pod on the node
- must not allow to harm the node’s health
- must not allow to gain CPU or memory resources outside of the resource limits of a pod.
By far, most of the namespaced sysctls are not necessarily considered safe. In this version the safe set contains the following (check the latest Kubernetes documentation for the current list):
- kernel.shm_rmid_forced,
- net.ipv4.ip_local_port_range,
- net.ipv4.tcp_syncookies.
This list will be extended in future Kubernetes versions when the kubelet supports better isolation mechanisms.
All safe sysctls are enabled by default (you can use them directly without additional kubelet configuration).
All unsafe sysctls are disabled by default and must be allowed manually by the cluster admin on a per-node basis. Pods with disabled unsafe sysctls will be scheduled, but will fail to launch. Describing the failed pod shows why it was rejected.
A number of sysctls are namespaced in today’s Linux kernels. This means that they can be set independently for each pod on a node. Only namespaced sysctls are configurable via the pod securityContext within Kubernetes.
The following sysctls are known to be namespaced. This list could change in future versions of the Linux kernel.
- kernel.shm*
- kernel.msg*
- kernel.sem
- fs.mqueue.*
- net.*
Sysctls with no namespace are called node-level sysctls. If you need to set them, you must manually configure them on each node’s operating system, or by using a DaemonSet with privileged containers.
As with node-level sysctls, it is recommended to use the taints and tolerations feature to schedule pods that need special sysctl settings onto the right nodes.
Use the pod securityContext to configure namespaced sysctls. The securityContext applies to all containers in the same pod.
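
To make the three categories above concrete, here is a minimal sketch that classifies a sysctl by name. It only mirrors the lists quoted in this article; the function name `classify_sysctl` is made up for illustration, and the real lists depend on your kernel and Kubernetes version.

```shell
#!/bin/sh
# Classify a sysctl name using only the lists quoted in this article.
# This is a name-based approximation, not what the kubelet actually runs.
classify_sysctl() {
    case "$1" in
        # The three sysctls listed as safe.
        kernel.shm_rmid_forced|net.ipv4.ip_local_port_range|net.ipv4.tcp_syncookies)
            echo "safe" ;;
        # Namespaced but not in the safe set: must be allowed on the kubelet.
        kernel.shm*|kernel.msg*|kernel.sem|fs.mqueue.*|net.*)
            echo "unsafe (namespaced)" ;;
        # Everything else cannot be set through the pod securityContext.
        *)
            echo "node-level" ;;
    esac
}

for name in kernel.shm_rmid_forced kernel.sem kernel.shmmax vm.swappiness; do
    printf '%s: %s\n' "$name" "$(classify_sysctl "$name")"
done
```

The order of the `case` branches matters: the safe names are checked first because they would otherwise match the broader namespaced patterns.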
Configure kubelet
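If you need to use unsafe sysctls, you must configure the kubelet on the target node (the node where the pod that uses the unsafe sysctls will run). Edit the 10-kubeadm.conf file in /etc/systemd/system/kubelet.service.d/ and add the line below. Note that a systemd Environment entry only defines the variable; it takes effect only if $KUBELET_UNSAFE_SYSCTLS is also referenced on the kubelet ExecStart line.
    Environment="KUBELET_UNSAFE_SYSCTLS=--allowed-unsafe-sysctls='kernel.shm*,kernel.sem,kernel.msg*'"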
Here I need kernel.shm*, kernel.sem and kernel.msg*.
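
The edit can also be scripted. The sketch below works on a scratch copy so it is safe to run anywhere; the sample drop-in contents are a simplified stand-in I made up, and your real 10-kubeadm.conf will contain more lines. It adds the Environment line and appends the variable to the ExecStart line.

```shell
#!/bin/sh
# Scratch file standing in for
# /etc/systemd/system/kubelet.service.d/10-kubeadm.conf (edit the real one as root).
conf="$(mktemp)"

# Assumed, simplified drop-in contents for illustration only.
cat > "$conf" <<'EOF'
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--kubeconfig=/etc/kubernetes/kubelet.conf"
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_EXTRA_ARGS
EOF

unsafe_env="Environment=\"KUBELET_UNSAFE_SYSCTLS=--allowed-unsafe-sysctls='kernel.shm*,kernel.sem,kernel.msg*'\""

# Insert the Environment line after [Service] and append the variable to the
# non-empty ExecStart line (the bare "ExecStart=" reset line is left alone).
awk -v env="$unsafe_env" '
    /^\[Service\]/ { print; print env; next }
    /^ExecStart=./ { print $0 " $KUBELET_UNSAFE_SYSCTLS"; next }
                   { print }
' "$conf" > "$conf.new" && mv "$conf.new" "$conf"

cat "$conf"
```

Review the result before copying anything like this onto a node, since a broken drop-in keeps the kubelet from starting.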
Then reload systemd and restart the kubelet so that the new flag is picked up:
    systemctl daemon-reload
    systemctl restart kubelet
Verify the change; --allowed-unsafe-sysctls should appear in the kubelet command line:
    ps aux | grep kubelet
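
If you only want the flag value rather than the whole process line, a small filter helps. This is a sketch: the function name and the sample command line are made up for illustration.

```shell
#!/bin/sh
# Print the value of --allowed-unsafe-sysctls from command-line text on stdin.
allowed_unsafe_sysctls() {
    # Put each argument on its own line, then keep only the flag's value.
    tr ' ' '\n' | sed -n 's/^--allowed-unsafe-sysctls=//p'
}

# Hypothetical kubelet command line, used here so the sketch runs without a cluster.
sample='/usr/bin/kubelet --kubeconfig=/etc/kubernetes/kubelet.conf --allowed-unsafe-sysctls=kernel.shm*,kernel.sem,kernel.msg*'
printf '%s\n' "$sample" | allowed_unsafe_sysctls

# On a real node, something like:
#   ps -o args= -C kubelet | allowed_unsafe_sysctls
```

An empty result means the flag did not reach the kubelet, which usually points back at the ExecStart line.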
A brief digression: the kubelet service unit file is /etc/systemd/system/kubelet.service.
Then edit the pod's YAML file to add the sysctls option under the pod securityContext.
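
A minimal sketch of such a manifest, written out from the shell. The pod name, image, file path, and the sysctl values are placeholders I chose for illustration; use the values your workload actually needs.

```shell
#!/bin/sh
# Write an example pod manifest that sets namespaced IPC sysctls.
manifest="${TMPDIR:-/tmp}/sysctl-pod.yaml"

cat > "$manifest" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: sysctl-example            # hypothetical name
spec:
  hostIPC: false                  # IPC sysctls need the pod's own IPC namespace
  securityContext:
    sysctls:                      # applies to all containers in the pod
    - name: kernel.sem
      value: "250 32000 100 128"  # placeholder values
    - name: kernel.shmmax
      value: "68719476736"        # placeholder value
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
EOF

echo "wrote $manifest"
# Then, against a cluster whose kubelet allows these sysctls:
#   kubectl apply -f "$manifest"
```

Each sysctl value is a string, so numeric and multi-field values are quoted.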
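If the pod sets hostIPC to true, you need to disable it. A pod that shares the host's IPC namespace cannot set IPC sysctls such as kernel.shm*, kernel.sem and kernel.msg*, and it will be rejected.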
Once the pod is running, get into the container and check the kernel parameter value, for example:
    sysctl -a | grep -i kernel.sem
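
kernel.sem is reported as four numbers on one line, which is easy to misread. The sketch below labels the fields in kernel order (SEMMSL, SEMMNS, SEMOPM, SEMMNI); the function name and the sample values are made up for illustration.

```shell
#!/bin/sh
# Label the four fields of a kernel.sem value.
print_sem() {
    # Intentional word splitting: the value is four whitespace-separated integers.
    set -- $1
    printf 'SEMMSL=%s SEMMNS=%s SEMOPM=%s SEMMNI=%s\n' "$1" "$2" "$3" "$4"
}

# Sample value so the sketch runs anywhere.
print_sem "250 32000 100 128"

# Inside the container, read the live value instead:
#   print_sem "$(cat /proc/sys/kernel/sem)"
```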
Resources
kubernetes 1.4 new feature: support sysctls configure kernel parameters in k8s cluster