Linux LPIC-1 Training

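//TODO This summary comes from the Essentials chapter of the LPIC-1 course on Pluralsight. Note: in April 2020 Pluralsight ran a promotion with free registration and learning. This lockdown is a good chance to catch up.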

Environment: CentOS 7 Enterprise Linux or RedHat.

Essentials

Reading OS data

# system version
# on CentOS, system-release and redhat-release are symlinks to centos-release
cat /etc/os-release
cat /etc/system-release
cat /etc/redhat-release

# kernel release number
uname -r
cat /proc/version
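
When a script only needs one value from /etc/os-release, reading the whole file is overkill. Below is a minimal sketch that pulls a single KEY=value field out of an os-release style file. The helper name os_field is made up for illustration, and it assumes values are either unquoted or wrapped in double quotes.

```shell
# os_field FILE KEY: print the value of KEY from an os-release style file.
# Hypothetical helper, not a standard command.
os_field() {
    # First sed keeps only the matching line and drops "KEY=";
    # second sed strips one pair of surrounding double quotes, if present.
    sed -n "s/^$2=//p" "$1" | sed 's/^"\(.*\)"$/\1/'
}

# Usage: print the distribution name if the file exists on this machine.
if [ -r /etc/os-release ]; then
    os_field /etc/os-release NAME
fi
```

The pattern is anchored at the start of the line, so asking for ID does not accidentally match VERSION_ID.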

Shutdown

Send message to others

# send a message to an individual user's terminal
# type the message, then press Ctrl+D to finish
write dsadm
> xxx

# send a message to the terminals of all logged-in users
wall < message.txt

Shutdown system and prompt

# reboot now
shutdown -r now
# halt/power off in 10 minutes; the message is broadcast to logged-in users, as wall does
shutdown -h 10 "The system is going down in 10 min"
# cancel shutdown
shutdown -c

Changing runlevels

What is a runlevel in Linux? See https://www.liquidweb.com/kb/linux-runlevels-explained/

For example, runlevel 1 allows only the root user and has no networking enabled; it is also known as rescue.target and is useful for operations that need isolation. Runlevel 3 is the default multi-user mode with networking enabled (the most common state). Runlevel 5 is runlevel 3 plus a desktop interface.

# show current runlevel
who -r
runlevel

# different systemd units can be tied to different targets (runlevels)
# show the default target
systemctl get-default
# set the default target
systemctl set-default multi-user.target
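
To keep the old runlevel numbers and the systemd target names straight, here is a small lookup sketch. The function name runlevel_to_target is invented for this note; the mapping itself follows the runlevelN.target aliases that systemd ships.

```shell
# runlevel_to_target N: print the systemd target that corresponds to SysV runlevel N.
# Hypothetical helper for illustration only.
runlevel_to_target() {
    case "$1" in
        0)     echo poweroff.target ;;
        1)     echo rescue.target ;;
        2|3|4) echo multi-user.target ;;
        5)     echo graphical.target ;;
        6)     echo reboot.target ;;
        *)     echo "unknown runlevel: $1" >&2; return 1 ;;
    esac
}

# Usage: which target would runlevel 3 be?
runlevel_to_target 3
```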

More about systemd, see my systemd blog.

Manage processes

# show processes of the current shell
# dash-prefixed options are UNIX style
ps -f
# -e selects all processes
ps -ef --forest
# -F shows the extra full format columns
ps -F -p $(pgrep sshd)
# kill all sleep processes
pkill sleep

# BSD options
ps aux

$$ expands to the PID of the current shell

cd /proc/$$

# we can interrogate this directory
# current dir
ls -l cwd
# current exe
ls -l exe
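
The cwd and exe entries are symlinks, so a script can read them directly instead of parsing ls output. A minimal sketch, assuming a Linux /proc filesystem and the readlink utility; proc_cwd and proc_exe are names made up for this note.

```shell
# proc_cwd PID: print the working directory of a process.
# proc_exe PID: print the path of the executable it is running.
# Both are hypothetical helpers that just resolve the /proc symlinks.
proc_cwd() { readlink "/proc/$1/cwd"; }
proc_exe() { readlink "/proc/$1/exe"; }

# Usage: inspect the current shell.
proc_cwd $$
proc_exe $$
```

Reading another user's process this way usually needs root, the same as ls -l on those links.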

Do you still remember the options of the top command? For example, switching the memory display unit, or choosing the sort key (CPU/MEM usage)…

Process priority

If something runs in the foreground and prevents you from doing anything else, use Ctrl+Z to suspend it (it stays in memory but takes no CPU time), then put it in the background.

sleep 10000
^Z
[1]+ Stopped sleep 10000

# use the jobs command; `+` marks the current job
jobs
[1]+ Stopped sleep 10000

# use bg to resume the current job in the background
bg
[1]+ sleep 10000 &

# check that it is running in the background
jobs
[1]+ Running sleep 10000 &

# fg brings the current job back to the foreground

If you run sleep 1000 & in a bash shell and then exit that shell, the sleep process is handed over to the init process. You can check with ps -F -p $(pgrep sleep): the PPID is now 1. Running jobs in another bash shell does not show the background processes of the previous shell.
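
The parent PID can also be read straight from /proc, which is handy in scripts where the ps column layout gets in the way. A minimal sketch, assuming a Linux /proc filesystem; ppid_of is a made-up helper name.

```shell
# ppid_of PID: print the parent PID of a process, read from /proc/PID/stat.
# Hypothetical helper for illustration.
ppid_of() {
    # The line looks like "pid (comm) state ppid ...". comm may contain spaces,
    # so strip everything up to the last ") " first; PPID is then field 2.
    sed 's/^.*) //' "/proc/$1/stat" | cut -d' ' -f2
}

# Usage: the parent of the current shell.
ppid_of $$

# To check the hand-over described above, start "sleep 1000 &" in one shell,
# exit it, and run this from another shell:
#   ppid_of "$(pgrep -n sleep)"
```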

# show PRI(priority) and NI(nice) number
ps -l

F S UID PID PPID C PRI NI ADDR SZ WCHAN TTY TIME CMD
4 S 0 23785 23781 0 80 0 - 28891 do_wai pts/1 00:00:00 bash
0 S 0 24859 23785 0 80 0 - 26987 hrtime pts/1 00:00:00 sleep
0 S 0 24861 23785 0 80 0 - 26987 hrtime pts/1 00:00:00 sleep
...

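In the kernel's own numbering, real-time tasks use priorities [0,99] and normal user tasks use [100,139]; a lower number means a higher priority. ps -l shows normal tasks on a shifted scale, which is why a process with NI 0 appears as PRI 80 in the output above. The NI value ranges over [-20,19]: the higher the value, the nicer the process is, so the less CPU time it takes. At the same PRI, NI decides how much of the resource a process gets.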

For example, if you have a build task that is not urgent and you do not want it to take too many resources in the background, you can set its nice value.

# start a command with nice value 19
nice -n 19 sleep 1000 &
# change the nice value of a running process
renice -n 10 -p <pid>

Note that only root can set a negative nice value or lower an existing nice value. root can edit /etc/security/limits.conf to configure the nice value allowed for different users/groups.
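
To confirm that a nice or renice call took effect without reading the ps -l table by eye, the nice value can be pulled from /proc. A minimal sketch, assuming a Linux /proc filesystem; nice_of is a made-up helper name.

```shell
# nice_of PID: print the nice value of a process, read from /proc/PID/stat.
# Hypothetical helper for illustration.
nice_of() {
    # nice is field 19 of the stat line; after stripping "pid (comm) "
    # (comm may contain spaces) it becomes field 17.
    sed 's/^.*) //' "/proc/$1/stat" | cut -d' ' -f17
}

# Usage: nice value of the current shell (normally 0).
nice_of $$

# After "nice -n 19 sleep 1000 &" in a shell with nice value 0,
#   nice_of $!
# should report 19 once the command has started.
```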

Monitor linux performance

This is important; the usual focus is on network, disk, and CPU.

List the contents of the procps-ng package. procps is the package that provides a bunch of small, useful utilities that report information about processes using the /proc filesystem. It includes the programs ps, top, vmstat, w, kill, free, slabtop, and skill.

# list the executables in the procps-ng package via rpm
rpm -ql procps-ng | grep "^/usr/bin/"

/usr/bin/free
/usr/bin/pgrep
/usr/bin/pkill
/usr/bin/pmap
...

# find which package provides the top command
rpm -qf $(which top)

procps-ng-3.3.10-17.el7_5.2.x86_64

Introduce 2 new commands: pmap and pwdx

# pmap shows the memory map of a process
# for example, the current shell
pmap $$
# the output also lists the shared libraries used by the process

# show the current working directory of a process
pwdx $$
pwdx $(pgrep sshd)
# the output actually comes from /proc/<pid>/cwd, which is a symlink

Load average analysis

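# check how long the system has been running
# load average is not normalized by the number of CPUs; if you know how many
# CPUs there are, the load average tells you how busy the system is
# if the load average exceeds the number of CPUs, processes have to queue or wait
# the data actually comes from /proc/uptime and /proc/loadavg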
uptime
18:53:14 up 39 days, 3:50, 1 user, load average: 0.00, 0.01, 0.05

# check how many CPUs there are
# the CPU count equals the number of processors,
# but there may be fewer physical cores, see /proc/cpuinfo
lscpu

# w prints the same header line as uptime, plus the logged-in users
w
18:59:29 up 12 days, 23:40, 3 users, load average: 0.04, 0.26, 0.26
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
root pts/0 9.160.1.111 08:47 6:46m 0.03s 0.03s -bash
...
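
Since the load average is not normalized, dividing it by the CPU count gives a number that is easier to judge: above 1.00 means work is queuing. A minimal sketch; load_per_cpu is a made-up helper name, and using nproc for the CPU count is my assumption (lscpu or /proc/cpuinfo would work too).

```shell
# load_per_cpu LOAD NCPU: print LOAD divided by NCPU with two decimals.
# Hypothetical helper for illustration.
load_per_cpu() {
    awk -v load="$1" -v n="$2" 'BEGIN { printf "%.2f\n", load / n }'
}

# Usage: normalize the 1-minute load average (first field of /proc/loadavg).
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r load1 rest < /proc/loadavg
    load_per_cpu "$load1" "$(nproc)"
fi
```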

Monitoring load or output

# execute a program periodically, showing output fullscreen
# this example runs uptime every 4 seconds
watch -n 4 uptime

# graphic representation of system load average
# if you run a tar at this point, you will see loadavg change noticeably
tload
# -b: batch mode, prints the state of all processes
# -n2: run for 2 iterations
top -b -n2 > file.txt

# 3 reports, 5 seconds apart
# reports information about processes, memory, paging, block IO, traps, disks and cpu activity
vmstat 5 3
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
1 0 520 90576 4176 1601932 0 0 4 188 18 19 0 1 93 4 2
0 0 520 90460 4176 1601956 0 0 0 46 514 348 0 0 98 2 1
0 0 520 88972 4176 1603692 0 0 0 542 707 589 0 1 97 2 1
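
The first vmstat report is an average since boot, so for a quick "how idle is it right now" number it is better to skip it. A minimal sketch that averages the id column over the remaining reports; avg_idle is a made-up helper name, and it assumes the default 17-column layout shown above, where id is column 15.

```shell
# avg_idle: read vmstat output on stdin and print the mean of the "id" column,
# ignoring the two header lines and the first (since-boot) report.
# Hypothetical helper for illustration.
avg_idle() {
    awk 'NR > 3 { sum += $15; n++ } END { if (n) printf "%.1f\n", sum / n }'
}

# Usage (takes about 10 seconds because of the sampling interval):
#   vmstat 5 3 | avg_idle
```

Fed the sample output above, it averages 98 and 97 and prints 97.5.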

sysstat toolkit

The package contains many performance measurement tools. Install sysstat (a bunch of commands: iostat, mpstat, sar, etc.).

yum install -y sysstat

# then list the executables it installed
rpm -ql sysstat | grep "^/usr/bin"

/usr/bin/cifsiostat
/usr/bin/iostat
/usr/bin/mpstat
/usr/bin/nfsiostat-sysstat
/usr/bin/pidstat
/usr/bin/sadf
/usr/bin/sar
/usr/bin/tapestat

The config file for sysstat can be found by:

# -q: query
# -c: config file
rpm -qc sysstat

After installation, cron is what actually collects the data behind the scenes. The configuration is in /etc/sysconfig/sysstat, where you can set how long records are kept; the default is 28 days.

# cron config for sysstat
cat /etc/cron.d/sysstat

# Run system activity accounting tool every 10 minutes
*/10 * * * * root /usr/lib64/sa/sa1 1 1
# 0 * * * * root /usr/lib64/sa/sa1 600 6 &
# Generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib64/sa/sa2 -A

start and enable:

systemctl start sysstat
systemctl enable sysstat

Let's look at the tools that come with sysstat:

# show values in megabytes
# 3 reports, 5 seconds apart
iostat -m 5 3
# others
pidstat
mpstat

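Let’s look at sar (system activity report), which gathers statistics and historical data. By analyzing a day's bottlenecks (cpu/memory/disk/network/loadavg) you can schedule tasks better, for example after finding that CPU and memory usage is heavy at a certain time. The course does not go deep into how to interpret this data; you need to understand what each part of the data means and what kind of values may be abnormal.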

sar data is stored in /var/log/sa, one file per day, overwritten periodically.

# report on a specific processor, e.g. cpu 0 (use -P 1 for cpu 1)
# check %idle
sar -P 0

# CPU utilization is shown by default
# %user: time spent in user space
# %system: time spent in kernel (system) space
sar -u
# 1-second interval, 5 reports
sar -u ALL 1 5

# show memory utilization
sar -r

# show I/O and transfer rate statistics (use -d for per-device disk activity)
sar -b

# network activity
sar -n DEV

# load average
# 5-second interval, 2 reports
sar -q 5 2

# read the file for day 23 (sa23), from 18:00:00 to 19:00:00
sar -n DEV -s 18:00:00 -e 19:00:00 -f /var/log/sa/sa23
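
Because the daily files are named after the day of the month, a script can build the path instead of hard-coding sa23. A minimal sketch, assuming the default /var/log/sa location and two-digit saDD names; sa_file is a made-up helper name.

```shell
# sa_file DAY: print the path of the sar data file for a day of the month (1-31).
# Hypothetical helper for illustration.
sa_file() {
    # ${1#0} drops one leading zero so that "08" is not rejected as bad octal.
    printf '/var/log/sa/sa%02d\n' "${1#0}"
}

# Usage: path of today's file.
sa_file "$(date +%d)"

# It can then be passed to sar, for example:
#   sar -q -f "$(sa_file 23)"
```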

To graph sar data, you can use ksar: https://www.cyberciti.biz/tips/identifying-linux-bottlenecks-sar-graphs-with-ksar.html

Log and logrotate

Auditing login events. This is quite useful for seeing which user logged in and when; w shows which users are currently logged in.

# see user login info
lastlog | grep -v "Never"

Username Port From Latest
root pts/0 9.65.239.28 Fri Apr 24 17:51:48 -0700 2020
fyre pts/0 Fri Apr 24 17:52:00 -0700 2020

# check system reboot info
# The last command reads data from the wtmp log and displays it in a terminal window.
last reboot
# check users who are still logged in, similar to `w`
last | grep still

Auditing root access: to see how su/sudo has been used, look in the /var/log/secure file. There are actually several secure files, distinguished by date.

# /var/log holds several secure and audit files
cd /var/log
# secure file
# grep works too; this pulls out the events where sudo was used
awk '/sudo/ { print $5, $6, $14 }' secure
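
Building on the same field positions, awk can also summarize who used sudo and how often. A minimal sketch, assuming the usual syslog layout where field 5 is "sudo:" and field 6 is the invoking user; sudo_counts is a made-up helper name.

```shell
# sudo_counts FILE: print "user count" for every user that ran a command via sudo.
# Hypothetical helper for illustration.
sudo_counts() {
    # Only lines that record a command are counted; PAM session lines
    # from sudo have no COMMAND= part and are skipped.
    awk '$5 == "sudo:" && /COMMAND=/ { n[$6]++ }
         END { for (u in n) print u, n[u] }' "$1" | sort
}

# Usage (needs root to read the log):
#   sudo_counts /var/log/secure
```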

I will write up separate notes on awk; it is quite useful.

journalctl is a commonly used tool for querying system logs. When I was looking at some docker logs, they showed up there as well.

# show last 10 lines
journalctl -n 10
# follow new entries in real time
journalctl -f
# -u: systemd unit
journalctl -u sshd
# timestamp
journalctl --since "10 minutes ago"
journalctl --since "2020-04-26 13:00:00"

Selinux

O’Reilly had a related course; the link is still in my work email. For now it is enough to know what SELinux is and how to turn it on and off.

SELINUX= can take one of these three values:
enforcing - SELinux security policy is enforced.
permissive - SELinux prints warnings instead of enforcing.
disabled - No SELinux policy is loaded.

# see if selinux is permissive, enforcing or disabled
getenforce
# more detailed
sestatus

If SELinux starts out disabled, set permissive in the config file /etc/selinux/config and then reboot. setenforce cannot disable SELinux either; that also has to be done in the config file, followed by a reboot.
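
getenforce reports the running mode, while the config file decides the mode after the next reboot, and the two can differ. A minimal sketch for reading the configured value; selinux_config_mode is a made-up helper name, and the default path is the one mentioned above.

```shell
# selinux_config_mode [FILE]: print the SELINUX= value from the config file.
# Hypothetical helper for illustration.
selinux_config_mode() {
    # "^SELINUX=" does not match the SELINUXTYPE= line or commented-out lines.
    sed -n 's/^SELINUX=//p' "${1:-/etc/selinux/config}"
}

# Usage: compare the configured mode with the running one.
#   selinux_config_mode
#   getenforce
```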

# setenforce [ Enforcing | Permissive | 1 | 0 ]
# once in permissive mode you can switch with setenforce, but the change does not survive a reboot
setenforce 0
setenforce 1

Show SELinux labels; the -Z flag also works with other commands.

# show the SELinux context of the current user
id -Z
# user, role, type
unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023
# show the SELinux context of files
/bin/ls -Z
# show the SELinux context of processes
ps -Zp $(pgrep sshd)
LABEL PID TTY STAT TIME COMMAND
system_u:system_r:kernel_t:s0 968 ? Ss 0:00 /usr/sbin/sshd -D
unconfined_u:unconfined_r:unconfined_t:s0 1196 ? Ss 0:00 sshd: root@pts/0
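
A label is a colon-separated string, so the shell can split it into its parts. A minimal sketch; split_context is a made-up helper name, and treating everything after the third colon as the level is my reading of the labels shown above.

```shell
# split_context LABEL: print the user, role, type and level of an SELinux label.
# Hypothetical helper for illustration.
split_context() {
    printf '%s\n' "$1" | {
        # The level can contain ':' itself (s0-s0:c0.c1023), so the last
        # variable receives the whole remainder of the line.
        IFS=: read -r se_user se_role se_type se_level
        printf 'user=%s role=%s type=%s level=%s\n' \
            "$se_user" "$se_role" "$se_type" "$se_level"
    }
}

# Usage: split the label printed by id -Z above.
split_context 'unconfined_u:unconfined_r:unconfined_t:s0-s0:c0.c1023'
```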