On-Call System Performance

Benchmark

First install the stress and stress-ng packages; they benchmark with multiple processes.

# stress cpu with 1 process
stress --cpu 1 --timeout 600
# stress cpu with 8 processes
stress -c 8 --timeout 600

# stress io
# stress -i 1 --timeout 600 does not work well
# because VM sync buffer is small
stress-ng --io 1 --hdd 1 --timeout 600

Install sysbench; it benchmarks with multiple threads.

# run a 5-minute benchmark with 10 threads to simulate thread context-switching pressure
sysbench --threads=10 --max-time=300 threads run

Send TCP/IP packets to check the network and firewall.

yum install hping3 -y

# -S: TCP SYN
# -p: target port
# -i: interval, u100: 100 microseconds
hping3 -S -p 80 -i u100 192.168.0.30

Another useful benchmark tool is iperf3, which measures network performance in a server-client mode. Search <<iperf3 Command>> for more details.
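
A minimal sketch, assuming a reachable server at 192.168.0.30 (placeholder) and the default iperf3 port:

# on the server side, listen for test connections
iperf3 -s
# on the client side, run a 15-second bandwidth test against the server
iperf3 -c 192.168.0.30 -t 15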

Analysis

Install perf first.

# similar to top, real-time cpu usage display
# Object: [.] user space, [k] kernel
perf top
# -g: enables call-graph (stack chain/backtrace) recording.
perf top -g -p <pid>

# record profiling and inspect later
# -g: enables call-graph (stack chain/backtrace) recording.
perf record -g
perf report
# record can also help find short-lived processes (see the sketch below)
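
For short-lived processes, a sketch that records system-wide for a fixed window (the 10-second duration is arbitrary):

# -a: all CPUs, system wide, so processes that start and exit quickly are captured
perf record -a -g -- sleep 10
perf report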

CPU

# uptime, load average (runnable/running + uninterruptible I/O), logged-in users
uptime
w

# -c: show command line
# -b: batch mode
# -n: iteration
top -c -b -n 1 | head -n 1

# -d: highlight the successive difference
watch -d "uptime"

# overall system metrics
# focus on in, cs, r, b; check man for descriptions
# check whether r far exceeds the number of CPUs
# too many in (interrupts) is also a problem
# us/sy shows whether the CPU is mostly consumed by user space or by the kernel
vmstat -w -S m 2

# cpu core number
lscpu
## press 1 to see cpus list
top

# check all cpus metrics
# tell whether high cpu usage comes from iowait or from computation
mpstat -P ALL 1

# check which process cause cpu utilization high
# -u: cpu status: usr, sys, guest, total
pidstat -u 1

# short-lived process check
perf top
execsnoop

dstat is a versatile tool for generating system resource statistics:

# combination of cpu, disk, net, system stats
# when CPU iowait is high, use it to compare
# iowait vs disk read/write vs network recv/send
dstat

Context switch check:

# process context switch metrics
# -w: context switch
# cswch: voluntary, happens when a process cannot get the resources it needs (e.g. I/O or memory)
# nvcswch: non-voluntary, happens when many processes compete for the CPU
# -t: thread
pidstat -t -w 1 -p <pid number>
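
System-wide rates can also be watched with sar, a sketch:

# proc/s: task creation rate, cswch/s: context switches per second, system wide
sar -w 1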

Interrupts check:

# hard interrupts
# RES: rescheduling interrupts
watch -d cat /proc/interrupts

# accumulated soft interrupt counts
# big change *rate* is usually from:
# RCU: kernel read-copy update lock
# NET_TX
# NET_RX
# TIMER
# SCHED
watch -d cat /proc/softirqs
# soft interrupt kernel thread
# [ksoftirqd/<CPU id>]
ps aux | grep ksoftirq

# if softirq NET_RX/NET_TX is too high
# -n DEV: statistics of network device
# PPS: rxpck/s txpck/s
# BPS: rxkB/s txkB/s
sar -n DEV 1

Memory

To check process memory usage, use top (VIRT, RES, SHR) and ps (VSZ, RSS).

# adjust oom_adj in [-17, 15]; the higher the value, the more likely the process is to be killed (-17 disables OOM killing)
echo -16 > /proc/$(pidof <process name>)/oom_adj
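
On newer kernels, oom_score_adj with range [-1000, 1000] is the preferred knob; a minimal sketch (the -500 value is illustrative):

# -1000 disables OOM killing for the process; higher values make it a more likely victim
echo -500 > /proc/$(pidof <process name>)/oom_score_adj
# check the resulting score the OOM killer uses
cat /proc/$(pidof <process name>)/oom_score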

Check OOM killed process:

dmesg | grep -iE "kill|oom|out of memory"

Check memory usage:

# check memory
# -h: human-readable
# -w: wide display
free -hw

# buffer is from /proc/meminfo Buffers
cat /proc/meminfo | grep -E "Buffers"
# cache is from /proc/meminfo Cached + SReclaimable
cat /proc/meminfo | grep -E "SReclaimable|Cached"
# understand what is buffer and cache, man proc
# --Buffers:
# Relatively temporary storage for raw disk blocks that shouldn't get tremendously large (20MB or so)
# --Cached:
# In-memory cache for files read from the disk (the page cache). Doesn't include SwapCached
# --Slab:
# In-kernel data structures cache.
# --SReclaimable:
# Part of Slab, that might be reclaimed, such as caches.

# -w: wide display
# -S m: display unit in MB
# 2: sampling interval in seconds
vmstat -w -Sm 2

Check cache hits (needs bcc-tools; an install sketch follows the block):

# system overall cache hit
cachestat
# process level cache hit
cachetop
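
An install sketch for a yum-based system; the package name and tool path may differ across distros:

yum install bcc-tools -y
# the tools usually land under /usr/share/bcc/tools
export PATH=$PATH:/usr/share/bcc/tools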

Check memory leak:

# memleak is in bcc-tools, along with cachestat and cachetop
# valgrind is an alternative
memleak -a -p $(pidof app_name)
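
A valgrind sketch for a program that can be launched under it (rather than attached to):

# memcheck prints definitely/indirectly lost blocks when the program exits
valgrind --leak-check=full ./app_name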

If swap is enabled, we can adjust the swappiness:

# [0, 100]; the higher the value, the more aggressively
# anonymous pages (e.g. heap) are swapped out
echo 90 > /proc/sys/vm/swappiness

As opposed to swapping out anonymous pages, the other reclamation path frees file-backed pages from the cache/buffer.
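
Reclaim pressure on both paths can be observed with sar, a sketch:

# -r: memory utilization
# -B: paging statistics, pgscank/s and pgsteal/s indicate reclaim activity
sar -r -B 1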

Release caches; use carefully in production:

# sync: flush dirty pages to disk first
# 1: drop page cache; 2: drop dentries and inodes; 3: drop both
sync; echo <1 or 2 or 3> > /proc/sys/vm/drop_caches

Check kernel slab details:

# man slabinfo
# pay attention to dentry and inode_cache
cat /proc/slabinfo | grep -E '^#|dentry|inode'

# real time kernel slab usage
# -s c: sort by cache size
slabtop -s c

I/O

  1. Check top for overall iowait performance
  2. Check iostat/sar for device performance
  3. Check pidstat for outstanding I/O process/thread
  4. Check strace for system read/write calls
  5. Check lsof for the files a process/thread is operating on

System proc files related to I/O:

  • /proc/slabinfo
  • /proc/meminfo
  • /proc/diskstats
  • /proc/pid/io

Check disk space usage and inode usage:

# -T: file system type
df -hT
df -hT /dev/sda2

# -i: inode
df -ih

# directory storage size
# -s: summary
# -c: total
du -sc * | sort -nr

Check overall device statistics:

# -d: device report
# -x: extended fields
iostat -dx 1
# pay attention to
# %util: disk utilization
# r/s,w/s: IOPS
# rKB/s, wKB/s: throughput
# r_await,w_await: delay

# -d: disk report
# -p: pretty print
# tps != IOPS
sar -dp 1

Check process I/O status:

# -d: io status
# -p: pid
# -t: thread
pidstat -d [-t] -p <pid number> 1

# simple top-like I/O monitor
# -b: batch mode
# -n: iteration number
# -o: only show actually doing I/O processes/threads
# -P: only show process
iotop -b -n 1 -o [-P]

Check system calls on I/O to locate files:

# -f: follow threads and child processes
# -T: time spent in each syscall
# -tt: system timestamp
# any read/write operations?
strace [-f] [-T] [-tt] -p <pid>

# check files opened by process
lsof -p <pid>
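
To map a file descriptor number seen in strace output back to a path, a sketch (fd 3 is a placeholder):

# the symlink target is the file behind that descriptor
ls -l /proc/<pid>/fd/3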

Also see <<Linux Check Disk Space>> for lsof usage, <<Linux Storage System>> for managing disk storage, and <<Linux Make Big Files>> for creating big files for testing.

Other useful BCC tools:

# trace file read/write
filetop
# trace kernel open system call
opensnoop

Network

Check the kernel log for errors:

# not network-specific; useful for general kernel messages
# -e: show local, human-readable timestamps
dmesg -e | tail

sar network-related commands:

# -n: statistics of network device
# DEV: network devices statistic
sar -n DEV 1

# see man sar for details
sar -n UDP 1
# ETCP: statistics about TCPv4 network errors
sar -n ETCP 1
# EDEV: statistics on failures (errors) from the network devices
sar -n EDEV 1

Network stack statistics:

# show tcp/udp listening sockets with numeric addresses
# -p: PID and name of the program to which each socket belongs.
netstat -tunlp

# check tcp connection status
# LISTEN/ESTAB/TIME-WAIT, etc
# -a: display all sockets
# -n: do not resolve service names
# -t: tcp sockets
ss -ant | awk 'NR>1 {++s[$1]} END {for(k in s) print k,s[k]}'

# check interface statistics
ip -s -s link

Network sniffing: see another blog <<Logstash UDP Input Data Lost>> for more tcpdump usage. It is a last resort and expensive; check whether mirrored traffic is available in production.

# -i: interface
# -nn: no resolution
# tcp port 80 and src 192.168.1.4: filter to reduce kernel overhead
tcpdump -i eth0 -nn -w log.pcap tcp port 80 and src 192.168.1.4

For example, tcpdump can write to a pcap file for later analysis in Wireshark; use rotated or timed files to control the file size. Don’t force-kill the tcpdump process, because that can corrupt the pcap file.
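
A rotation sketch; the sizes, counts, and intervals below are arbitrary examples:

# -C 100: rotate roughly every 100 MB, -W 5: keep at most 5 files (ring buffer)
tcpdump -i eth0 -nn -C 100 -W 5 -w log.pcap tcp port 80
# -G 300: rotate every 300 seconds into timestamped files
tcpdump -i eth0 -nn -G 300 -w 'log-%Y%m%d-%H%M%S.pcap' tcp port 80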

UDP statistics:

# -s: summary statistics for each protocol
# -u: UDP statistics
# for example 'receive buffer errors' usually indicates UDP packet dropping
watch -n1 -d netstat -su

For example, if the receive buffer errors counter increases frequently, UDP packets are usually being dropped, and the socket receive buffer size or the app-level buffer/queue size needs to be increased.
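
A sketch of the kernel-level knobs; the values are illustrative, not recommendations:

# maximum and default socket receive buffer sizes, in bytes
sysctl -w net.core.rmem_max=26214400
sysctl -w net.core.rmem_default=26214400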

Simulate packet loss for inbound (iptables) and outbound (tc-netem) traffic; check this post for details.
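
A sketch of both directions; eth0, port 80, and the 10% loss rates are placeholders:

# outbound: inject 10% random packet loss with tc-netem
tc qdisc add dev eth0 root netem loss 10%
# undo
tc qdisc del dev eth0 root netem

# inbound: randomly drop 10% of incoming packets on port 80 with iptables
iptables -I INPUT -p tcp --dport 80 -m statistic --mode random --probability 0.1 -j DROP
# undo (same rule with -D)
iptables -D INPUT -p tcp --dport 80 -m statistic --mode random --probability 0.1 -j DROP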
