Linux Clean Memory Cache

When deploying DS, I found that the compute pod assigned to the second node was stuck in CreateContainer status. I SSHed into that node and found its memory heavily occupied by other processes, and commands responded extremely slowly. (In fact the SSH lag was caused by the high %CPU: ssh had to compete with other processes for CPU time.) %CPU was also high, consumed by the swapping daemon. (I did not record it at the time, but I guess it was the [kswapd] daemon. [kswapd] does not work only on swap; it also reclaims cache. Given that it was consuming the CPU, it was probably trying continuously to reclaim cache and failing.)

Some informative readings:

The memory usage on bad node:

free
              total        used        free      shared  buff/cache   available
Mem:        8168772      105152      295732     7491216     7767888      273448
Swap:             0           0           0

The available size is far too low. Compare with the good node:

free
              total        used        free      shared  buff/cache   available
Mem:        8168772      123504     7041388      270836     1003880     7424612
Swap:             0           0           0

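The shared column (memory used mostly by tmpfs) and the buff/cache column are huge on the bad node, so I needed to flush and clean them. (At the time I did not yet understand the exact meaning of these columns, especially shared, buff/cache, and available. I should also have used memory-leak analysis tools to find which call stack was causing the memory pressure or leak.)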
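To compare the columns between nodes, the arithmetic can be scripted. This is a minimal sketch, assuming the `free` layout shown above (seven fields on the `Mem:` line, values in KiB) and relying on the fact that tmpfs pages are accounted as page cache, so shared is included in buff/cache. The function name `summarize_free` is made up for this note.

```shell
#!/bin/sh
# summarize_free (hypothetical helper): read `free` output on stdin and print
#   - available memory as a percentage of total
#   - buff/cache minus shared, a rough estimate of the cache that is not
#     tmpfs/shared memory
# Assumes the column order: total used free shared buff/cache available.
summarize_free() {
  awk '$1 == "Mem:" {
    printf "available_pct=%.1f\n", $7 * 100 / $2
    printf "non_shared_cache_kb=%d\n", $6 - $5
  }'
}

# Run against the live system only if `free` is installed.
if command -v free >/dev/null 2>&1; then
  free | summarize_free
fi
```

Fed the bad node's numbers, this reports about 3.3% available and only 276672 KiB of cache that is not shared memory. That would be consistent with drop_caches having little effect, since the bulk of buff/cache is tmpfs data that stays in memory as long as the files exist.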

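In hindsight, I should have investigated why shared and buff/cache were abnormally high on this node, and why the [kswapd] reclaim failed. A possible cause is that the tmpfs (shared) was still in use; this needs investigation. The sections below describe how to release cache by writing to drop_caches, but as I recall it did not help much at the time. The second comment here may help: Why can’t I release memory cache by /proc/sys/vm/drop_caches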

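# -t tmpfs: list only tmpfs filesystems; --total: add a grand total; -h: human-readable sizes
df -t tmpfs --total -h
# on one tmpfs mount, list files that are deleted (+L1: link count below 1) but still held open
lsof -nP +L1 /dev/shm | grep DEL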
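To run the same check on every tmpfs mount rather than only /dev/shm, the mount points can be read from /proc/mounts. This is a sketch only; `tmpfs_mounts` is a hypothetical helper name, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# tmpfs_mounts (hypothetical helper): print the mount point of every tmpfs
# filesystem. $1 is the mounts table to read (defaults to /proc/mounts);
# its fields are "device mountpoint fstype options dump pass".
tmpfs_mounts() {
  awk '$3 == "tmpfs" { print $2 }' "${1:-/proc/mounts}"
}

# Show usage per tmpfs mount, largest first. With `df -P` each filesystem
# is printed on one line and column 3 is the used space in KiB.
if [ -r /proc/mounts ]; then
  tmpfs_mounts | while read -r mnt; do
    df -Pk "$mnt" | tail -n 1
  done | sort -k3,3 -n -r
fi
```

The mount at the top of the list is the first candidate for the `lsof +L1` check above.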

For the general approach to releasing cache intentionally, this post is a good reference. If you must clear the disk cache, the following command is the safest choice in enterprise and production environments, because it clears only the PageCache:

# modify kernel behavior through a /proc file (run as root)
sync; echo 1 > /proc/sys/vm/drop_caches

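What the sync command does: it flushes any data buffered in memory out to disk. This matters here because drop_caches only drops clean cache, so dirty pages must be written back first.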
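The effect of sync can be observed through the Dirty field of /proc/meminfo. A small sketch, in which `meminfo_kb` is a made-up helper name:

```shell
#!/bin/sh
# meminfo_kb (hypothetical helper): print the value in kB of one field
# from a meminfo-style file.
# $1 = field name without the colon, $2 = file (defaults to /proc/meminfo).
meminfo_kb() {
  awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Compare the amount of dirty data before and after sync (no root needed).
if [ -r /proc/meminfo ]; then
  echo "Dirty before sync: $(meminfo_kb Dirty) kB"
  sync
  echo "Dirty after sync:  $(meminfo_kb Dirty) kB"
fi
```

On a busy system the second number will rarely be exactly zero, because other processes keep writing while sync runs.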

More aggressively, clear dentries and inodes:

sync; echo 2 > /proc/sys/vm/drop_caches

Clear PageCache, dentries and inodes:

# clear file pages, dentries, inodes, and other caches
# (does everything the two commands above do)
sync; echo 3 > /proc/sys/vm/drop_caches

Do not use this in production unless you know what you are doing, because it clears PageCache, dentries, and inodes. Right after you run drop_caches, the server will be busy repopulating memory with inodes and dentries; the original kernel documentation recommends against running this command outside a testing or debugging environment. But what if you are a home user, or your server is under heavy load and has almost filled up its memory? You need to weigh the benefits against the risks.
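One way to weigh that trade-off is to measure how much memory a drop actually frees before relying on it. The wrapper below is a sketch, not an established tool: the function name `drop_caches_measured` and the `DRY_RUN` variable are assumptions of this note. It rejects any level other than 1, 2, or 3 and refuses to run without root.

```shell
#!/bin/sh
# drop_caches_measured (hypothetical helper): validate the level, then run
# the drop and report the change in MemFree.
# Levels: 1 = PageCache, 2 = dentries and inodes, 3 = both.
# Set DRY_RUN=1 (an assumption of this sketch) to only print the command.
drop_caches_measured() {
  level=$1
  case "$level" in
    1|2|3) ;;
    *) echo "invalid level: $level (expected 1, 2 or 3)" >&2; return 2 ;;
  esac
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "would run: sync; echo $level > /proc/sys/vm/drop_caches"
    return 0
  fi
  if [ "$(id -u)" -ne 0 ]; then
    echo "must be root to write /proc/sys/vm/drop_caches" >&2
    return 1
  fi
  before=$(awk '$1 == "MemFree:" { print $2 }' /proc/meminfo)
  sync
  echo "$level" > /proc/sys/vm/drop_caches
  after=$(awk '$1 == "MemFree:" { print $2 }' /proc/meminfo)
  echo "MemFree: $before kB -> $after kB (delta $((after - before)) kB)"
}
```

Calling `drop_caches_measured 1` as root prints MemFree before and after the drop. A small delta suggests the memory is held by something drop_caches cannot release, such as tmpfs.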

  • What is dirty cache? Dirty cache is data that has not yet been committed to the database (or disk) and is currently held in memory. In short, the copy in memory differs from what is in the database or on disk.

  • What is clean cache? Clean cache is data that has been committed to the database (or disk) and is also currently held in memory. This is the desired state, where everything is in sync.

  • What are dentries and inodes? A filesystem is represented in memory using dentries and inodes. Inodes are the objects that represent the underlying files (and also directories). A dentry is an object with a string name (d_name), a pointer to an inode (d_inode), and a pointer to the parent dentry (d_parent).

  • What is drop_caches? Writing to this file causes the kernel to drop clean caches, as well as reclaimable slab objects such as dentries and inodes. Once dropped, their memory becomes free. It will not kill any process.
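The number of dentries and inodes the kernel currently holds can be read from /proc/sys/fs before deciding whether level 2 is worth running. In this sketch `dentry_inode_counts` is a hypothetical name; the first two fields of dentry-state are the total and unused dentry counts, and the two fields of inode-nr are the allocated and free inode counts.

```shell
#!/bin/sh
# dentry_inode_counts (hypothetical helper): print the kernel's dentry and
# inode counters. Unused dentries are not actively referenced and are kept
# only for possible reuse, so they are candidates for reclaim.
# $1 = dentry-state file, $2 = inode-nr file (default to /proc/sys/fs paths).
dentry_inode_counts() {
  awk '{ printf "dentries=%d unused=%d\n", $1, $2 }' "${1:-/proc/sys/fs/dentry-state}"
  awk '{ printf "inodes=%d free=%d\n", $1, $2 }' "${2:-/proc/sys/fs/inode-nr}"
}

if [ -r /proc/sys/fs/dentry-state ] && [ -r /proc/sys/fs/inode-nr ]; then
  dentry_inode_counts
fi
```

Comparing the output before and after `echo 2 > /proc/sys/vm/drop_caches` shows how many objects were actually released.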

BTW, if you want to clean swap space (actually here we don’t have swap enabled):

swapoff -a && swapon -a
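On a node that does have swap in use, `swapoff` has to move every swapped-out page back into RAM, so it is worth checking that there is room first. A hedged sketch, with `swap_cycle_is_safe` as a made-up name and "MemAvailable greater than swap in use" as a rule of thumb rather than a guarantee:

```shell
#!/bin/sh
# swap_cycle_is_safe (hypothetical helper): print swap in use and
# MemAvailable, and succeed only if MemAvailable is larger than the swap
# in use. $1 = meminfo-style file (defaults to /proc/meminfo).
swap_cycle_is_safe() {
  awk '
    $1 == "MemAvailable:" { avail  = $2 }
    $1 == "SwapTotal:"    { total  = $2 }
    $1 == "SwapFree:"     { swfree = $2 }
    END {
      used = total - swfree
      printf "swap_used_kb=%d mem_available_kb=%d\n", used, avail
      exit (used >= avail)   # exit status 1 when swap in use does not fit
    }' "${1:-/proc/meminfo}"
}
```

It could be used as `swap_cycle_is_safe && swapoff -a && swapon -a`, so the swap cycle is skipped when memory is too tight.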