Benchmark
First install the stress and stress-ng packages; both benchmark with multiple processes.
# stress the CPU with 1 process
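As a minimal sketch of the CPU stress case, the wrapper below spins a given number of CPU-bound workers for a fixed time. The function name cpu_burn and the DRY_RUN switch are hypothetical helpers added for illustration; only the --cpu and --timeout options come from stress itself.

```shell
# cpu_burn WORKERS SECONDS
# Spin WORKERS CPU-bound processes for SECONDS seconds using stress.
# Set DRY_RUN=1 to print the command instead of running it.
# (cpu_burn and DRY_RUN are illustrative names, not part of stress.)
cpu_burn() {
    ${DRY_RUN:+echo} stress --cpu "$1" --timeout "$2"
}

# Usage: run in one terminal, watch uptime or mpstat in another.
#   cpu_burn 1 600
```

Running with a single worker first makes it easy to see one core saturate before scaling the worker count up.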
Install sysbench to benchmark with multiple threads.
# run a 5-minute benchmark with 10 threads to simulate multi-thread context switching
Use hping3 to send TCP/IP packets for network and firewall checks.
yum install hping3 -y
Another useful benchmark tool is iperf3, which measures network performance in a server-client mode. See iperf3 Command for more details.
Analysis
Install perf first.
# similar to top, real-time CPU usage display
CPU
# boot time, load average (runnable/running + uninterruptible I/O), users
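A load average only means something relative to the number of CPUs, so a quick normalization helps when reading it. The sketch below assumes the /proc/loadavg layout (1, 5 and 15 minute averages first); the helper name load_per_cpu is hypothetical.

```shell
# load_per_cpu LOADAVG_LINE NCPU
# Print the 1-minute load average divided by the CPU count.
# LOADAVG_LINE is expected in /proc/loadavg format.
load_per_cpu() {
    printf '%s\n' "$1" | awk -v ncpu="$2" '{ printf "%.2f\n", $1 / ncpu }'
}

# Usage on a live Linux system:
#   load_per_cpu "$(cat /proc/loadavg)" "$(nproc)"
```

A value that stays well above 1.00 suggests more runnable or uninterruptible tasks than the CPUs can serve.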
dstat is a versatile tool for generating system resource statistics:
# combination of cpu, disk, net, system
Context switch check:
# process context switch metrics
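The same per-process counters are also exposed in /proc/<pid>/status. As a rough sketch (the helper name ctxt_switches is hypothetical), the function below extracts them: voluntary switches happen when a task blocks waiting for a resource, nonvoluntary ones when the scheduler preempts it.

```shell
# ctxt_switches: read /proc/<pid>/status text on stdin and print
# "VOLUNTARY NONVOLUNTARY" context switch counts.
ctxt_switches() {
    awk '/^voluntary_ctxt_switches:/    { v = $2 }
         /^nonvoluntary_ctxt_switches:/ { n = $2 }
         END { print v + 0, n + 0 }'
}

# Usage:
#   ctxt_switches < /proc/self/status
```

A fast-growing nonvoluntary count usually points at CPU contention, while a fast-growing voluntary count points at waiting on I/O or locks.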
Interrupts check:
# hard interrupts
Memory
To check process memory usage, use top (VIRT, RES, SHR) and ps (VSZ, RSS).
# adjust the OOM score in [-17, 15] (the legacy oom_adj range); the higher the value, the more likely the process is killed
Check for OOM-killed processes:
dmesg | grep -iE "kill|oom|out of memory"
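To go from matching lines to the actual victims, a small filter can pull out the PID and process name. This is a sketch that assumes the kernel logs a line containing "Killed process PID (NAME)"; the exact wording varies between kernel versions, and the helper name oom_victims is hypothetical.

```shell
# oom_victims: read kernel log text on stdin and print "PID NAME" for
# every line that contains "Killed process PID (NAME)".
# The message format is an assumption and may differ on your kernel.
oom_victims() {
    sed -n 's/.*Killed process \([0-9][0-9]*\) (\([^)]*\)).*/\1 \2/p'
}

# Usage:
#   dmesg | oom_victims
```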
# check memory
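One way to turn the raw numbers into a single signal is to compute how much memory is still available as a share of the total. The sketch below reads the MemTotal and MemAvailable fields of /proc/meminfo; the helper name mem_available_pct is hypothetical.

```shell
# mem_available_pct: read /proc/meminfo text on stdin and print
# MemAvailable as an integer percentage of MemTotal.
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", avail * 100 / total }'
}

# Usage:
#   mem_available_pct < /proc/meminfo
```

MemAvailable already accounts for reclaimable cache, so it is a better indicator than the free column alone.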
Check cache hits (the tools need to be installed from BCC):
# system overall cache hit
Check memory leak:
# or valgrind
If swap is enabled, we can adjust the swappiness:
# range [0, 100]; the higher the value, the more readily the kernel swaps
As opposed to swapping out anonymous pages, the other form of reclamation frees file-backed pages from the cache/buffer.
Release caches (use carefully in production):
# sync: flush dirty pages to disk
Check kernel slab details:
# man slabinfo
I/O
- Check top for overall iowait performance
- Check iostat/sar for device performance
- Check pidstat to find the processes/threads doing the most I/O
- Check strace for read/write system calls
- Check lsof for the files a process/thread is operating on
System proc files related to I/O:
- /proc/slabinfo
- /proc/meminfo
- /proc/diskstats
- /proc/pid/io
Check disk space usage and inode usage:
# -T: file system type
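For a quick alert-style check, the df output can be filtered for file systems above a usage threshold. This sketch assumes the POSIX output layout of df -P (usage percentage in column 5, mount point in column 6) and mount points without spaces; the helper name fs_over_threshold is hypothetical.

```shell
# fs_over_threshold PCT
# Read `df -P` output on stdin and print "MOUNTPOINT USE%" for every
# file system whose usage is at or above PCT percent.
fs_over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Usage (space, then inodes):
#   df -P  | fs_over_threshold 80
#   df -Pi | fs_over_threshold 80
```

Checking inodes separately matters because a file system can run out of inodes while still showing free space.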
Check overall device statistics:
# -d: device report
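The device report is derived from the cumulative counters in /proc/diskstats. As a rough sketch, the helper below (disk_totals is a hypothetical name) prints the total data read and written per device since boot, relying on the documented layout where field 3 is the device name and fields 6 and 10 are sectors read and written in 512-byte units.

```shell
# disk_totals: read /proc/diskstats text on stdin and print
# "DEVICE MIB_READ MIB_WRITTEN" (cumulative since boot).
# Fields 6 and 10 are sectors read/written; a sector here is 512 bytes.
disk_totals() {
    awk '{ printf "%s %d %d\n", $3, $6 * 512 / 1048576, $10 * 512 / 1048576 }'
}

# Usage:
#   disk_totals < /proc/diskstats
```

Sampling this twice and subtracting gives a throughput figure for the interval between the two samples.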
Check process I/O status:
# -d: I/O status
Check system calls on I/O to locate files:
# -f: threads
Also see <<Linux Check Disk Space>> for lsof usage, <<Linux Storage System>> for managing disk storage, and <<Linux Make Big Files>> for making big files for testing.
Other useful BCC tools:
# trace file read/write
Network
System-level errors:
# not only used for network but for general purposes
sar network-related commands:
# -n: statistics of network devices
Network stack statistics:
# show listening TCP and UDP sockets in numeric form
Network sniffing: see another post, <<Logstash UDP Input Data Lost>>, for more tcpdump usage. It is a last resort and expensive, so check whether mirrored traffic is available in production.
# -i: interface
For example, capture with tcpdump to a pcap file and analyze it in Wireshark later, using rotated or timed files to control the file size. Don't force kill the tcpdump process, because that can corrupt the pcap file.
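A size-bounded capture can be sketched with tcpdump's rotation options. In the wrapper below, -C sets the per-file size in units of 1,000,000 bytes and -W caps the number of files, so the capture behaves like a ring buffer. The function name capture_ring, the DRY_RUN switch, and the 100 MB by 5 files sizing are illustrative assumptions.

```shell
# capture_ring IFACE PREFIX
# Capture on IFACE into a ring of 5 files of about 100 MB each, so disk
# usage stays bounded. Set DRY_RUN=1 to print the command instead of
# running it. (capture_ring and DRY_RUN are illustrative names.)
capture_ring() {
    ${DRY_RUN:+echo} tcpdump -i "$1" -nn -C 100 -W 5 -w "$2.pcap"
}

# Usage:
#   capture_ring eth0 /tmp/cap &
#   kill -INT $!    # stop with SIGINT so tcpdump can flush and close the file
```

For time-based rotation, tcpdump also offers -G with a number of seconds, typically combined with a strftime pattern in the file name.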
UDP statistics:
# -s: summary statistics for each protocol
For example, a receive buffer errors counter that keeps increasing usually means UDP packets are being dropped; increase the socket receive buffer size or the application-level buffer/queue size.
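The same counter can be read directly from /proc/net/snmp, where it appears as RcvbufErrors on the Udp lines. The sketch below looks the column up by name from the header line rather than hard-coding its position; the helper name udp_rcvbuf_errors is hypothetical.

```shell
# udp_rcvbuf_errors: read /proc/net/snmp text on stdin and print the
# Udp RcvbufErrors counter. The first "Udp:" line is the header, the
# second holds the values in the same column order.
udp_rcvbuf_errors() {
    awk '$1 == "Udp:" && !seen {
             for (i = 2; i <= NF; i++) if ($i == "RcvbufErrors") col = i
             seen = 1
             next
         }
         $1 == "Udp:" && col { print $col }'
}

# Usage: sample twice and compare to see whether drops are ongoing.
#   udp_rcvbuf_errors < /proc/net/snmp
```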
To simulate packet loss for inbound (iptables) and outbound (tc-netem) traffic, check this post for details.