The top command run in a container within a pod shows host-machine overview metrics but container-level process metrics. The reason is that containers inside a pod partially share /proc with the host system, including the paths that hold memory and CPU information. top reads /proc/stat (host machine) and /proc/<pid>/stat (container process), and these files are not namespace-aware.

P.S.: lxcfs, a FUSE filesystem, can provide a container-native /proc, making the container behave more like a VM.
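A quick way to see the mismatch from inside a container is to compare the host-wide numbers in /proc with the container's own cgroup limit. A minimal sketch, assuming a cgroup v1 memory controller mounted at the usual path:

```
# /proc/meminfo is not namespaced, so this reports the host's total memory
grep MemTotal /proc/meminfo

# the cgroup file reports the container's actual memory limit (cgroup v1 path; an assumption)
cat /sys/fs/cgroup/memory/memory.limit_in_bytes
```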
The two methods below collect data from different sources, and they report different metrics. For Kubernetes OOMKilled events, kubectl top is the more accurate way to predict and track memory usage.
Kubectl Top
The K8s OOMKiller uses container_memory_working_set_bytes (from cAdvisor metrics, also visible in Prometheus if deployed) as the baseline to decide whether to kill the pod or not. It is an estimate of how much memory cannot be evicted; kubectl top uses this metric as well.
After metrics-server is installed:
```
# show all containers resource usage inside a pod
kubectl top pod <pod-name> --containers
```
In the Prometheus expression browser, you can get the same value as kubectl top:
```
# value in MiB (label names may differ slightly across cAdvisor/kubelet versions)
container_memory_working_set_bytes{pod="<pod-name>", container="<container-name>"} / 1024 / 1024
```
Check where the Prometheus alerts get their container/pod memory data from, and pay attention to which metric the data source actually uses; the same applies when using Grafana.
Docker Stats
The memory figures shown by docker stats are collected from /sys/fs/cgroup/memory, with some calculations applied; see the explanation below.

On the host machine, display the container stats (CPU and memory usage):
```
# similar to top
docker stats
```
Actually, the docker CLI fetches data from the Docker API, for instance v1.41 (run docker version to see the supported API version); you can get the stats data with a curl command:
```
curl --unix-socket /var/run/docker.sock "http:/v1.41/containers/<container id>/stats?stream=false" | jq
```
1 | "memory_stats": { |
From this docker stats description:
On Linux, the Docker CLI reports memory usage by subtracting cache usage from the total memory usage. The API does not perform such a calculation but rather provides the total memory usage and the amount from the cache so that clients can use the data as needed. The cache usage is defined as the value of total_inactive_file field in the memory.stat file on cgroup v1 hosts.
On Docker 19.03 and older, the cache usage was defined as the value of cache field. On cgroup v2 hosts, the cache usage is defined as the value of inactive_file field.
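On a cgroup v2 host, that value can be read from the unified hierarchy. A minimal sketch, assuming you are inside the container (the paths are assumptions based on the unified cgroup mount):

```
# cgroup v2: total usage and the inactive file cache that the CLI subtracts from it
cat /sys/fs/cgroup/memory.current
grep -w inactive_file /sys/fs/cgroup/memory.stat
```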
memory_stats.usage is from /sys/fs/cgroup/memory/memory.usage_in_bytes.
memory_stats.stats.inactive_file is from /sys/fs/cgroup/memory/memory.stat.
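Those two values can also be read directly from inside the container; a small sketch (paths assume cgroup v1):

```
# total usage and the inactive file cache feeding the calculation below
cat /sys/fs/cgroup/memory/memory.usage_in_bytes
grep -wE '(total_)?inactive_file' /sys/fs/cgroup/memory/memory.stat
```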
So here it is:
```
80388096 - 17829888 = 62558208 bytes => 59.66 MiB
```
This perfectly matches the MEM USAGE column of docker stats.
The dockershim is deprecated in K8s! If the containerd runtime is used instead, you can explore resource usage by checking the cgroup hierarchy on the host machine, or by going into the container and checking /sys/fs/cgroup/cpu.
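For example, on a host that uses the systemd cgroup driver with cgroup v1, the pods' memory cgroup files live under the kubepods slice. A rough sketch (the exact layout depends on the cgroup driver and runtime, so treat the path as an assumption):

```
# list per-container memory usage files under the kubepods hierarchy (systemd driver, cgroup v1)
find /sys/fs/cgroup/memory/kubepods.slice -name memory.usage_in_bytes | head
```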
To calculate the container memory usage the way docker stats does, from inside the pod and without installing any third-party tool:

```
# memory in MiB: used
```
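A minimal sketch of that calculation on a cgroup v1 host, mirroring the usage-minus-inactive-file arithmetic above (paths and the awk field name are assumptions based on cgroup v1):

```
# memory used in MiB = (memory.usage_in_bytes - inactive file cache) / 1024 / 1024
usage=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
cache=$(awk '/^total_inactive_file/ {print $2}' /sys/fs/cgroup/memory/memory.stat)
echo $(( (usage - cache) / 1024 / 1024 ))
```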
To calculate the container CPU usage the way docker stats does, from inside the pod and without installing any third-party tool:

```
# cpu, cpuacct dirs are softlinks
```
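A minimal sketch on a cgroup v1 host: sample the cgroup's cumulative CPU time over one second and convert it to a per-core percentage, which is close to what docker stats reports (docker's exact formula normalizes against host CPU time, so this is a simplification):

```
# cpuacct.usage is the cgroup's cumulative CPU time in nanoseconds
start=$(cat /sys/fs/cgroup/cpuacct/cpuacct.usage)
sleep 1
end=$(cat /sys/fs/cgroup/cpuacct/cpuacct.usage)
# percent of one core used during the 1-second window
awk -v s="$start" -v e="$end" 'BEGIN { printf "%.2f%%\n", (e - s) / 10000000 }'
```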
Readings
- How much is too much? The Linux OOMKiller and “used” memory. We can see from this experiment that container_memory_usage_bytes does account for some filesystem pages that are being cached. We can also see that OOMKiller is tracking container_memory_working_set_bytes. This makes sense, as shared filesystem cache pages can be evicted from memory at any time. There’s no point in killing the process just for using disk I/O.
- Kubernetes top vs Linux top. kubectl top shows metrics for a given pod. That information is based on reports from cAdvisor, which collects real pod resource usage.
- cAdvisor: container advisor. cAdvisor (Container Advisor, a Go project) provides container users an understanding of the resource usage and performance characteristics of their running containers.